Installation

  • Install the current PyPI release:

    $ pip install metacluster==1.3.0
    
  • Install directly from source code:

    $ git clone https://github.com/thieu1995/metacluster.git
    $ cd metacluster
    $ python setup.py install
    
  • To install the development version from GitHub:

    $ pip install git+https://github.com/thieu1995/metacluster
    

After installation, you can import MetaCluster like any other Python module:

$ python
>>> import metacluster
>>> metacluster.__version__
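
If you would rather check the installed version without importing the package, the standard library can read it from the package metadata. The following is a minimal sketch; the helper name `installed_version` is our own and is not part of MetaCluster.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the installed version string, or None when the package is not installed.
    print(installed_version("metacluster"))
```

Returning None instead of raising keeps the check usable in setup scripts that need to decide whether to install the package.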

Examples

Let’s go through some examples.

  1. First, load a dataset. You can use one of the datasets bundled with MetaCluster:

    # Load available dataset from MetaCluster
    from metacluster import get_dataset
    
    # Try unknown data
    get_dataset("unknown")
    # Enter: 1      -> This will list all available datasets
    
    data = get_dataset("Arrhythmia")
    

You can also load your own dataset:

import pandas as pd
from metacluster import Data

# load X and y
# NOTE MetaCluster accepts numpy arrays only, hence use the .values attribute
dataset = pd.read_csv('examples/dataset.csv', index_col=0).values
X, y = dataset[:, 0:-1], dataset[:, -1]
data = Data(X, y, name="my-dataset")        # The dataset name is used as the folder name for saved results

  2. Next, scale your features:

    # MinMaxScaler
    data.X, scaler = data.scale(data.X, method="MinMaxScaler", feature_range=(0, 1))
    
    # StandardScaler
    data.X, scaler = data.scale(data.X, method="StandardScaler")
    
    # MaxAbsScaler
    data.X, scaler = data.scale(data.X, method="MaxAbsScaler")
    
    # RobustScaler
    data.X, scaler = data.scale(data.X, method="RobustScaler")
    
    # Normalizer
    data.X, scaler = data.scale(data.X, method="Normalizer", norm="l2")   # "l1" or "l2" or "max"
    
  3. Next, select the metaheuristic algorithms, their parameters, the list of objectives, and the list of performance metrics:

    list_optimizer = ["BaseFBIO", "OriginalGWO", "OriginalSMA"]
    list_paras = [
        {"name": "FBIO", "epoch": 10, "pop_size": 30},
        {"name": "GWO", "epoch": 10, "pop_size": 30},
        {"name": "SMA", "epoch": 10, "pop_size": 30}
    ]
    list_obj = ["SI", "RSI"]
    list_metric = ["BHI", "DBI", "DI", "CHI", "SSEI", "NMIS", "HS", "CS", "VMS", "HGS"]
    

You can find all supported metaheuristic algorithms at: Mealpy Link. All supported clustering objectives and metrics are listed at: Permetrics Link.

If you don’t want to read the documentation, you can print all of the supported options with:

from metacluster import MetaCluster

# Get all supported methods and print them out
MetaCluster.get_support(name="all")
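
In the selection step, the optimizer names and the parameter dictionaries appear to be paired by position (the first entry of `list_paras` belongs to the first entry of `list_optimizer`, and so on). Assuming that pairing, a quick sanity check before building the model can catch a mismatched configuration early. The helper `check_optimizer_config` below is hypothetical and not part of MetaCluster.

```python
def check_optimizer_config(list_optimizer, list_paras):
    """Raise ValueError if the two lists cannot be paired one-to-one."""
    if len(list_optimizer) != len(list_paras):
        raise ValueError(
            f"{len(list_optimizer)} optimizers but {len(list_paras)} parameter sets"
        )
    for optimizer, paras in zip(list_optimizer, list_paras):
        # Assumption: every parameter set carries a "name" key, as in the example lists.
        if "name" not in paras:
            raise ValueError(f"parameter set for {optimizer} has no 'name' key")
    return list(zip(list_optimizer, list_paras))


list_optimizer = ["BaseFBIO", "OriginalGWO", "OriginalSMA"]
list_paras = [
    {"name": "FBIO", "epoch": 10, "pop_size": 30},
    {"name": "GWO", "epoch": 10, "pop_size": 30},
    {"name": "SMA", "epoch": 10, "pop_size": 30},
]

pairs = check_optimizer_config(list_optimizer, list_paras)
for optimizer, paras in pairs:
    print(optimizer, "->", paras["name"])
```

The check only looks at list lengths and the presence of a name; it does not verify that an optimizer is actually supported.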

  4. Next, create an instance of the MetaCluster class and run it:

    model = MetaCluster(list_optimizer=list_optimizer, list_paras=list_paras, list_obj=list_obj, n_trials=3)
    
    model.execute(data=data, cluster_finder="elbow", list_metric=list_metric, save_path="history", verbose=False)
    
    model.save_boxplots()
    model.save_convergences()
    

As you can see, you can define different datasets and run them with the same model. Remember to set a name for each dataset, because the folder that holds your results is named after the dataset.
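
Because results are grouped by dataset name, it can help to locate the output folder programmatically. The sketch below assumes the results land in `<save_path>/<dataset name>` (for example `history/my-dataset`); that layout is inferred from the description above rather than confirmed, and the helper `list_result_files` is hypothetical. The demo runs on a throwaway directory, so it does not need any real MetaCluster output.

```python
import tempfile
from pathlib import Path


def list_result_files(save_path, dataset_name, suffix=".csv"):
    """Return sorted file names with the given suffix in the dataset's result folder."""
    folder = Path(save_path) / dataset_name  # assumed layout: save_path/dataset_name
    if not folder.is_dir():
        return []
    return sorted(
        p.name for p in folder.iterdir() if p.is_file() and p.suffix == suffix
    )


# Demo: build a fake result folder, then list only its CSV files.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "my-dataset"
    folder.mkdir()
    for name in ("result_labels.csv", "result_convergences.csv", "notes.txt"):
        (folder / name).touch()
    found = list_result_files(tmp, "my-dataset")
    missing = list_result_files(tmp, "other-dataset")

print(found)
```

Returning an empty list for a missing folder makes it easy to loop over several dataset names without extra error handling.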

Visualization

If you set save_figures=True in the 4th step, many figures are automatically saved in save_path.

[Figures: box plots (BHI-BRI, BHI-DBI, BHI-DI, BHI-DRI) and convergence curves (BHI-1, MIS-1)]

You will also get several CSV files like these:

[Figures: screenshots of result_convergences, result_labels, result_metrics, result_metrics_mean, and result_metrics_std]

Note that there are two special files, result_convergences.csv and result_labels.csv. You will see many = symbols in these files.

We did that intentionally: all fitness values after N epochs are stored together in the fitness column, and all predicted labels for X are stored together in the y_pred column, with = separating the individual values. This makes these CSV files easier to read back.

Here is a simple example of how to read these files:

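import numpy as np
import pandas as pd

df = pd.read_csv("path_save/result_convergences.csv")

# Get the loss convergence of model FBIO with objective BHI at the 1st trial.
# The filter can match one row per trial; take the first match (assumed to be the 1st trial).
res = df[(df["optimizer"]=="FBIO") & (df["obj"]=="BHI")]["fitness"].values[0]
# Each fitness cell is a single string of values separated by "="
list_convergences = np.array(res.split("="), dtype=float)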
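
The same decoding works without pandas. The sketch below uses only the standard library and a small in-memory CSV whose numbers are invented purely for illustration; the column names `optimizer`, `obj`, and `fitness` follow the example above, and the helper `decode_series` is hypothetical.

```python
import csv
import io

# Made-up sample rows; real files come from MetaCluster's save_path.
SAMPLE = """optimizer,obj,fitness
FBIO,BHI,0.91=0.85=0.80
GWO,BHI,0.95=0.90=0.88
"""


def decode_series(cell, cast=float):
    """Split an '='-joined cell into a list of values."""
    return [cast(item) for item in cell.split("=")]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Keep only the rows for optimizer FBIO with objective BHI.
fbio_rows = [r for r in rows if r["optimizer"] == "FBIO" and r["obj"] == "BHI"]

# Decode the first matching row into a list of floats.
convergence = decode_series(fbio_rows[0]["fitness"])
print(convergence)
```

For the y_pred column of result_labels.csv, passing `cast=int` should work if the labels are stored as integers.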