Installation¶

Install the current PyPI release:
```
$ pip install metacluster==1.3.0
```

Install directly from source code:

$ git clone https://github.com/thieu1995/metacluster.git
$ cd metacluster
$ python setup.py install

In case, you want to install the development version from Github:
```
$ pip install git+https://github.com/thieu1995/permetrics
```

After installation, you can import MetaCluster as any other Python module:

$ python
>>> import metacluster
>>> metacluster.__version__

Examples¶

Let’s go through some examples.

First, load dataset. You can use the available datasets from MetaCluster:

# Load available dataset from MetaCluster
from metacluster import get_dataset

# Try unknown data
get_dataset("unknown")
# Enter: 1      -> This wil list all of avaialble dataset

data = get_dataset("Arrhythmia")

Load your own dataset if you want:

import pandas as pd
from metacluster import Data

# load X and y
# NOTE MetaCluster accepts numpy arrays only, hence use the .values attribute
dataset = pd.read_csv('examples/dataset.csv', index_col=0).values
X, y = dataset[:, 0:-1], dataset[:, -1]
data = Data(X, y, name="my-dataset")        # Set up the name for dataset as saved path of model

Next, scale your features:

# MinMaxScaler
data.X, scaler = data.scale(data.X, method="MinMaxScaler", feature_range=(0, 1))

# StandardScaler
data.X, scaler = data.scale(data.X, method="StandardScaler")

# MaxAbsScaler
data.X, scaler = data.scale(data.X, method="MaxAbsScaler")

# RobustScaler
data.X, scaler = data.scale(data.X, method="RobustScaler")

# Normalizer
data.X, scaler = data.scale(data.X, method="Normalizer", norm="l2")   # "l1" or "l2" or "max"

Next, select Metaheuristic Algorithm, Its parameters, list of objectives, and list of performance metrics:

list_optimizer = ["BaseFBIO", "OriginalGWO", "OriginalSMA"]
list_paras = [
    {"name": "FBIO", "epoch": 10, "pop_size": 30},
    {"name": "GWO", "epoch": 10, "pop_size": 30},
    {"name": "SMA", "epoch": 10, "pop_size": 30}
]
list_obj = ["SI", "RSI"]
list_metric = ["BHI", "DBI", "DI", "CHI", "SSEI", "NMIS", "HS", "CS", "VMS", "HGS"]

You can check all supported metaheuristic algorithms from: Mealpy Link. All supported clustering objectives and metrics from: Permetrics Link.

If you don’t want to read the documents, you can print out all of the supported information by:

from metacluster import MetaCluster

# Get all supported methods and print them out
MetaCluster.get_support(name="all")

Next, create an instance of MetaCluster class and run it:

model = MetaCluster(list_optimizer=list_optimizer, list_paras=list_paras, list_obj=list_obj, n_trials=3)

model.execute(data=data, cluster_finder="elbow", list_metric=list_metric, save_path="history", verbose=False)

model.save_boxplots()
model.save_convergences()

As you can see, you can define different datasets and using the same model to run it. Remember to set the name to your dataset, because the folder that hold your results is the name of your dataset.

Visualization¶

If you set save_figures=True in the 4th step, you can get a lots of figures that automatically saved in the save_path.

Also you will get a lots of csv file like this.

Note that, there are two special files which are result_convergences.csv and result_labels.csv. You will see a lots of symbol = in these files.

We did that intentionally because we need to save all fitness value after N epochs as fitness column and all labels of predicted X as y_pred column. So it will be easier for users when they read these csv files.

Here is an simple example how to read these files:

import pandas as pd
df = pd.read_csv("path_save/result_convergences.csv")

# I want to get the loss convergence of model FBIO, with objective BHI and at the 1st trial
res = df[(df["optimizer"]=="FBIO") & (df["obj"]=="BHI")]["fitness"].values
list_convergences = np.array(res.split("="), dtype=float)