============
Installation
============
* Install the `current PyPI release `_::
$ pip install metacluster==1.3.0
* Install directly from source code::
$ git clone https://github.com/thieu1995/metacluster.git
$ cd metacluster
$ python setup.py install
* In case, you want to install the development version from Github::
$ pip install git+https://github.com/thieu1995/permetrics
After installation, you can import MetaCluster as any other Python module::
$ python
>>> import metacluster
>>> metacluster.__version__
========
Examples
========
Let's go through some examples.
1. First, load dataset. You can use the available datasets from MetaCluster::
# Load available dataset from MetaCluster
from metacluster import get_dataset
# Try unknown data
get_dataset("unknown")
# Enter: 1 -> This wil list all of avaialble dataset
data = get_dataset("Arrhythmia")
Load your own dataset if you want::
import pandas as pd
from metacluster import Data
# load X and y
# NOTE MetaCluster accepts numpy arrays only, hence use the .values attribute
dataset = pd.read_csv('examples/dataset.csv', index_col=0).values
X, y = dataset[:, 0:-1], dataset[:, -1]
data = Data(X, y, name="my-dataset") # Set up the name for dataset as saved path of model
2. Next, scale your features::
# MinMaxScaler
data.X, scaler = data.scale(data.X, method="MinMaxScaler", feature_range=(0, 1))
# StandardScaler
data.X, scaler = data.scale(data.X, method="StandardScaler")
# MaxAbsScaler
data.X, scaler = data.scale(data.X, method="MaxAbsScaler")
# RobustScaler
data.X, scaler = data.scale(data.X, method="RobustScaler")
# Normalizer
data.X, scaler = data.scale(data.X, method="Normalizer", norm="l2") # "l1" or "l2" or "max"
3. Next, select Metaheuristic Algorithm, Its parameters, list of objectives, and list of performance metrics::
list_optimizer = ["BaseFBIO", "OriginalGWO", "OriginalSMA"]
list_paras = [
{"name": "FBIO", "epoch": 10, "pop_size": 30},
{"name": "GWO", "epoch": 10, "pop_size": 30},
{"name": "SMA", "epoch": 10, "pop_size": 30}
]
list_obj = ["SI", "RSI"]
list_metric = ["BHI", "DBI", "DI", "CHI", "SSEI", "NMIS", "HS", "CS", "VMS", "HGS"]
You can check all supported metaheuristic algorithms from: `Mealpy Link `_.
All supported clustering objectives and metrics from: `Permetrics Link `_.
If you don't want to read the documents, you can print out all of the supported information by::
from metacluster import MetaCluster
# Get all supported methods and print them out
MetaCluster.get_support(name="all")
4. Next, create an instance of MetaCluster class and run it::
model = MetaCluster(list_optimizer=list_optimizer, list_paras=list_paras, list_obj=list_obj, n_trials=3)
model.execute(data=data, cluster_finder="elbow", list_metric=list_metric, save_path="history", verbose=False)
model.save_boxplots()
model.save_convergences()
As you can see, you can define different datasets and using the same model to run it.
**Remember to set the name to your dataset**, because the folder that hold your results is the name of your dataset.
=============
Visualization
=============
If you set `save_figures=True` in the 4th step, you can get a lots of figures that automatically saved in the `save_path`.
.. image:: /_static/images/boxplot-BHI-BRI.png
:width: 49 %
.. image:: /_static/images/boxplot-BHI-DBI.png
:width: 49 %
.. image:: /_static/images/boxplot-BHI-DI.png
:width: 49 %
.. image:: /_static/images/boxplot-BHI-DRI.png
:width: 49 %
.. image:: /_static/images/convergence-BHI-1.png
:width: 49 %
.. image:: /_static/images/convergence-MIS-1.png
:width: 49 %
Also you will get a lots of csv file like this.
.. image:: /_static/images/result_convergences.png
.. image:: /_static/images/result_labels.png
.. image:: /_static/images/result_metrics.png
.. image:: /_static/images/result_metrics_mean.png
.. image:: /_static/images/result_metrics_std.png
Note that, there are two special files which are `result_convergences.csv` and `result_labels.csv`. You will see a lots of symbol `=` in these files.
We did that intentionally because we need to save all fitness value after N epochs as `fitness` column and all labels of predicted X as `y_pred` column.
So it will be easier for users when they read these csv files.
Here is an simple example how to read these files::
import pandas as pd
df = pd.read_csv("path_save/result_convergences.csv")
# I want to get the loss convergence of model FBIO, with objective BHI and at the 1st trial
res = df[(df["optimizer"]=="FBIO") & (df["obj"]=="BHI")]["fitness"].values
list_convergences = np.array(res.split("="), dtype=float)
.. toctree::
:maxdepth: 4
.. toctree::
:maxdepth: 4
.. toctree::
:maxdepth: 4