MetaCluster Library

metacluster.metacluster module

class metacluster.metacluster.MetaCluster(list_optimizer=None, list_paras=None, list_obj=None, n_trials=5, seed=20)[source]

Bases: object

Defines the MetaCluster class, which holds all metaheuristic-based K-center clustering methods.

Parameters
  • list_optimizer (list, tuple, default = None) – List of strings naming optimizer classes, or list of instances of the Optimizer class from the Mealpy library. For the currently supported optimizers, see https://github.com/thieu1995/mealpy If you pass a custom optimizer, make sure it is an instance of the Optimizer class. To list the supported optimizers, call MetaCluster.get_support(name="optimizer")

  • list_paras (list, tuple, default=None) – List of dictionaries that hold the parameters of each Optimizer class. Set it to None to use the default parameters of the Mealpy library.

  • list_obj (list, tuple, default=None) – List of strings naming the objectives. For the currently supported objectives, see https://github.com/thieu1995/permetrics To list the supported objectives, call MetaCluster.get_support(name="obj")

  • n_trials (int, default=5) – The number of runs for each optimizer for each objective

  • seed (int, default=20) – Determines random number generation for the whole program. Use an int to make the randomness deterministic.
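
Because n_trials applies to every optimizer for every objective, the total number of optimization runs grows multiplicatively. The sketch below only illustrates that arithmetic, assuming the runs form a plain Cartesian product; the helper count_runs is hypothetical and is not part of MetaCluster.

```python
from itertools import product


def count_runs(list_optimizer, list_obj, n_trials):
    """Hypothetical helper: enumerate (optimizer, objective, trial) triples."""
    return list(product(list_optimizer, list_obj, range(1, n_trials + 1)))


runs = count_runs(["OriginalGWO", "OriginalSMA"], ["BHI", "XBI"], n_trials=3)
print(len(runs))  # 2 optimizers * 2 objectives * 3 trials = 12
```

Keeping epoch, pop_size, and n_trials small while experimenting keeps this product manageable.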

Examples

The following example runs three optimizers with three objectives on the "aniso" dataset, then saves the boxplot and convergence figures

>>> from metacluster import get_dataset, MetaCluster
>>> from sklearn.preprocessing import MinMaxScaler
>>>
>>> scaler = MinMaxScaler(feature_range=(0, 1))
>>> data = get_dataset("aniso")
>>> data.X = scaler.fit_transform(data.X)
>>>
>>> # Get all supported methods and print them out
>>> MetaCluster.get_support(name="all")
>>>
>>> list_optimizer = ["OriginalFBIO", "OriginalGWO", "OriginalSMA"]
>>> list_paras = [
...     {"name": "FBIO", "epoch": 10, "pop_size": 30},
...     {"name": "GWO", "epoch": 10, "pop_size": 30},
...     {"name": "SMA", "epoch": 10, "pop_size": 30},
... ]
>>> list_obj = ["BHI", "MIS", "XBI"]
>>> list_metric = ["BRI", "DBI", "DRI", "DI", "KDI"]
>>> model = MetaCluster(list_optimizer=list_optimizer, list_paras=list_paras, list_obj=list_obj, n_trials=3)
>>> model.execute(data=data, cluster_finder="elbow", list_metric=list_metric, save_path="history", verbose=False)
>>> model.save_boxplots()
>>> model.save_convergences()
FILENAME_CONVERGENCES = 'result_convergences'
FILENAME_LABELS = 'result_labels'
FILENAME_METRICS = 'result_metrics'
FILENAME_METRICS_MEAN = 'result_metrics_mean'
FILENAME_METRICS_STD = 'result_metrics_std'
HYPHEN_SYMBOL = '='
SUPPORT = {'cluster_finder': {'all_majority': 'get_clusters_all_majority', 'all_max': 'get_clusters_all_max', 'all_mean': 'get_clusters_all_mean', 'all_min': 'get_clusters_all_min', 'bayesian_ìnormation': 'get_clusters_by_bic', 'calinski_harabasz': 'get_clusters_by_calinski_harabasz', 'davies_bouldin': 'get_clusters_by_davies_bouldin', 'elbow': 'get_clusters_by_elbow', 'gap': 'get_clusters_by_gap_statistic', 'silhouette': 'get_clusters_by_silhouette_score'}, 'metrics': {'ARS': 'max', 'BHI': 'min', 'BI': 'min', 'BRI': 'min', 'CDS': 'max', 'CHI': 'max', 'CS': 'max', 'DBCVI': 'min', 'DBI': 'min', 'DHI': 'min', 'DI': 'max', 'DRI': 'max', 'ES': 'min', 'FMS': 'max', 'FmS': 'max', 'GAS': 'max', 'GPS': 'min', 'HGS': 'max', 'HI': 'min', 'HS': 'max', 'JS': 'max', 'KDI': 'max', 'KS': 'max', 'LDRI': 'max', 'LSRI': 'max', 'MIS': 'max', 'MNS': 'max', 'MSEI': 'min', 'NMIS': 'max', 'PhS': 'max', 'PrS': 'max', 'PuS': 'max', 'RRS': 'max', 'RSI': 'max', 'RTS': 'max', 'RaS': 'max', 'ReS': 'max', 'SI': 'max', 'SS1S': 'max', 'SS2S': 'max', 'SSEI': 'min', 'TS': 'max', 'VMS': 'max', 'XBI': 'min'}, 'obj': {'ARS': 'max', 'BHI': 'min', 'BI': 'min', 'BRI': 'min', 'CDS': 'max', 'CHI': 'max', 'CS': 'max', 'DBCVI': 'min', 'DBI': 'min', 'DHI': 'min', 'DI': 'max', 'DRI': 'max', 'ES': 'min', 'FMS': 'max', 'FmS': 'max', 'GAS': 'max', 'GPS': 'min', 'HGS': 'max', 'HI': 'min', 'HS': 'max', 'JS': 'max', 'KDI': 'max', 'KS': 'max', 'LDRI': 'max', 'LSRI': 'max', 'MIS': 'max', 'MNS': 'max', 'MSEI': 'min', 'NMIS': 'max', 'PhS': 'max', 'PrS': 'max', 'PuS': 'max', 'RRS': 'max', 'RSI': 'max', 'RTS': 'max', 'RaS': 'max', 'ReS': 'max', 'SI': 'max', 'SS1S': 'max', 'SS2S': 'max', 'SSEI': 'min', 'TS': 'max', 'VMS': 'max', 'XBI': 'min'}, 'optimizer': ['OriginalABC', 'OriginalACOR', 'AugmentedAEO', 'EnhancedAEO', 'ImprovedAEO', 'ModifiedAEO', 'OriginalAEO', 'OriginalAFT', 'MGTO', 'OriginalAGTO', 'DevALO', 'OriginalALO', 'AAO', 'OriginalAO', 'OriginalAOA', 'IARO', 'LARO', 'OriginalARO', 'OriginalASO', 'OriginalAVOA', 
'OriginalArchOA', 'AdaptiveBA', 'DevBA', 'OriginalBA', 'DevBBO', 'OriginalBBO', 'OriginalBBOA', 'OriginalBCO', 'OriginalBES', 'ABFO', 'OriginalBFO', 'OriginalBMO', 'DevBRO', 'OriginalBRO', 'OriginalBSA', 'ImprovedBSO', 'OriginalBSO', 'CleverBookBeesA', 'OriginalBeesA', 'ProbBeesA', 'OriginalCA', 'OriginalCDDO', 'OriginalCDO', 'OriginalCEM', 'OriginalCGO', 'DevCHIO', 'OriginalCHIO', 'OriginalCOA', 'OCRO', 'OriginalCRO', 'OriginalCSA', 'OriginalCSO', 'OriginalCircleSA', 'OriginalCoatiOA', 'JADE', 'OriginalDE', 'SADE', 'SAP_DE', 'DevDMOA', 'OriginalDMOA', 'OriginalDO', 'DevEFO', 'OriginalEFO', 'OriginalEHO', 'AdaptiveEO', 'ModifiedEO', 'OriginalEO', 'OriginalEOA', 'LevyEP', 'OriginalEP', 'DevEPC', 'CMA_ES', 'LevyES', 'OriginalES', 'Simple_CMA_ES', 'OriginalESO', 'OriginalESOA', 'OriginalEVO', 'OriginalFA', 'DevFBIO', 'OriginalFBIO', 'OriginalFDO', 'OriginalFFA', 'OriginalFFO', 'OriginalFLA', 'DevFOA', 'OriginalFOA', 'WhaleFOA', 'DevFOX', 'OriginalFOX', 'OriginalFPA', 'BaseGA', 'EliteMultiGA', 'EliteSingleGA', 'MultiGA', 'OriginalGA', 'SingleGA', 'OriginalGBO', 'DevGCO', 'OriginalGCO', 'OriginalGJO', 'OriginalGOA', 'DevGSKA', 'OriginalGSKA', 'Matlab101GTO', 'Matlab102GTO', 'OriginalGTO', 'CG_GWO', 'ChaoticGWO', 'DS_GWO', 'ER_GWO', 'ExGWO', 'FuzzyGWO', 'GWO_WOA', 'IGWO', 'IOBL_GWO', 'IncrementalGWO', 'OGWO', 'OriginalGWO', 'RW_GWO', 'OriginalHBA', 'OriginalHBO', 'OriginalHC', 'SwarmHC', 'OriginalHCO', 'OriginalHGS', 'OriginalHGSO', 'OriginalHHO', 'DevHS', 'OriginalHS', 'OriginalICA', 'OriginalIMODE', 'OriginalINFO', 'OriginalIWO', 'DevJA', 'LevyJA', 'OriginalJA', 'DevLCO', 'ImprovedLCO', 'OriginalLCO', 'OriginalLSHADEcnEpSin', 'OriginalMA', 'OriginalMFO', 'OriginalMGO', 'OriginalMPA', 'OriginalMRFO', 'WMQIMRFO', 'OriginalMSA', 'DevMVO', 'OriginalMVO', 'OriginalNGO', 'ImprovedNMRA', 'OriginalNMRA', 'OriginalNRO', 'OriginalOOA', 'OriginalPFA', 'OriginalPOA', 'AIW_PSO', 'CL_PSO', 'C_PSO', 'HPSO_TVAC', 'LDW_PSO', 'OriginalPSO', 'P_PSO', 'OriginalPSS', 'DevQSA', 
'ImprovedQSA', 'LevyQSA', 'OppoQSA', 'OriginalQSA', 'OriginalRIME', 'OriginalRUN', 'GaussianSA', 'OriginalSA', 'SwarmSA', 'DevSARO', 'OriginalSARO', 'DevSBO', 'OriginalSBO', 'DevSCA', 'OriginalSCA', 'QleSCA', 'OriginalSCSO', 'ImprovedSFO', 'OriginalSFO', 'L_SHADE', 'OriginalSHADE', 'OriginalSHIO', 'OriginalSHO', 'ImprovedSLO', 'ModifiedSLO', 'OriginalSLO', 'DevSMA', 'OriginalSMA', 'DevSMO', 'DevSOA', 'OriginalSOA', 'OriginalSOS', 'DevSPBO', 'OriginalSPBO', 'OriginalSRSR', 'DevSSA', 'OriginalSSA', 'OriginalSSDO', 'OriginalSSO', 'OriginalSSpiderA', 'OriginalSSpiderO', 'OriginalSTO', 'OriginalSeaHO', 'OriginalServalOA', 'OriginalSquirrelSA', 'OriginalTDO', 'DevTLO', 'ImprovedTLO', 'OriginalTLO', 'OriginalTOA', 'DevTPO', 'OriginalTS', 'OriginalTSA', 'OriginalTSO', 'EnhancedTWO', 'LevyTWO', 'OppoTWO', 'OriginalTWO', 'DevVCS', 'OriginalVCS', 'OriginalWCA', 'OriginalWDO', 'OriginalWHO', 'HI_WOA', 'OriginalWOA', 'OriginalWaOA', 'OriginalWarSO', 'OriginalZOA']}
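
Each entry in SUPPORT['metrics'] and SUPPORT['obj'] maps a name to 'min' or 'max'. The sketch below assumes that 'min' means lower values are better and 'max' means higher values are better, and uses that direction to pick the best optimizer for a metric. The helper pick_best and the scores are made up for illustration; only the four name-to-direction pairs are copied from the mapping above.

```python
# Subset of the SUPPORT["metrics"] mapping shown above.
DIRECTIONS = {"DBI": "min", "DI": "max", "XBI": "min", "KDI": "max"}


def pick_best(metric, scores):
    """Hypothetical helper: return the optimizer name with the best score.

    Assumes "min" means lower is better and "max" means higher is better.
    """
    chooser = min if DIRECTIONS[metric] == "min" else max
    return chooser(scores, key=scores.get)


# Made-up scores, for illustration only.
dbi_scores = {"OriginalGWO": 0.82, "OriginalSMA": 0.67}
di_scores = {"OriginalGWO": 0.41, "OriginalSMA": 0.35}
print(pick_best("DBI", dbi_scores))  # OriginalSMA (lowest DBI)
print(pick_best("DI", di_scores))    # OriginalGWO (highest DI)
```
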
execute(data=None, cluster_finder='elbow', list_metric=None, save_path='history', verbose=True, mode='single', n_workers=None, termination=None)[source]
Parameters
  • data (instance of Data class, default=None) – An instance of the Data class. It must contain at least the feature matrix X; the target labels y are optional. Make sure the matrix X is normalized or standardized.

  • cluster_finder (str, default="elbow") –

    The method used to find the optimal number of clusters in the data. The supported methods are: ["elbow", "gap", "silhouette", "davies_bouldin", "calinski_harabasz", "bayesian_ìnormation", "all_min", "all_max", "all_mean", "all_majority"]. A method with the prefix all runs every other method and combines the numbers of clusters they find. For example, all_min takes the minimum K found across the tried methods, and all_mean takes the average K.

    This parameter is only used when data.y is None. If you pass labels y with the data, the cluster finder is turned off and the number of clusters is set to the number of unique labels in y.

  • list_metric (list, default=None) – List of performance metrics supported by the library: https://github.com/thieu1995/permetrics To get the supported metrics, call MetaCluster.get_support(); the supported objectives are also the supported metrics.

  • save_path (str, default="history") – The path to the folder that holds the results

  • verbose (bool, default = True) – Controls the verbosity of the output for the training process of each optimizer.

  • mode (str, default = 'single') –

    The mode in which the Mealpy Optimizer runs. Parallel: "process", "thread"; sequential: "swarm", "single".

    • "process": parallel mode, tasks run on multiple cores

    • "thread": parallel mode, tasks run on multiple threads

    • "swarm": sequential mode that does not affect the updating phase of other agents

    • "single": sequential mode that affects the updating phase of other agents (default)

  • n_workers (int or None, default = None) – The number of workers (cores or threads) used by the Optimizer. It only takes effect in the parallel modes.

  • termination (dict or None, default = None) – A termination dictionary or an instance of the Termination class. It is passed to the Mealpy Optimizer.
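
The cluster_finder rules described above can be sketched in a few lines. This is not the library's implementation: resolve_n_clusters is a hypothetical helper, the K values are made up, and rounding the mean and breaking majority ties by first occurrence are assumptions. Only the behaviour of all_min, all_mean, and the use of unique labels when y is given come from the description.

```python
from collections import Counter


def resolve_n_clusters(y, candidate_ks, rule="all_majority"):
    """Hypothetical helper mirroring the documented behaviour."""
    if y is not None:
        # Labels are available: K is the number of unique labels.
        return len(set(y))
    if rule == "all_min":
        return min(candidate_ks)
    if rule == "all_max":
        return max(candidate_ks)
    if rule == "all_mean":
        # Assumption: round the average to the nearest integer.
        return round(sum(candidate_ks) / len(candidate_ks))
    if rule == "all_majority":
        # Assumption: ties resolve to the value seen first.
        return Counter(candidate_ks).most_common(1)[0][0]
    raise ValueError(f"Unsupported rule: {rule}")


ks = [3, 4, 3, 5, 3, 6]  # made-up K values from six finder methods
print(resolve_n_clusters(None, ks, "all_min"))       # 3
print(resolve_n_clusters(None, ks, "all_mean"))      # 4
print(resolve_n_clusters(None, ks, "all_majority"))  # 3
print(resolve_n_clusters([0, 1, 1, 2, 0], ks))       # 3 unique labels
```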

static get_support(name='all', verbose=True)[source]
save_boxplots(figure_size=None, xlabel='Optimizer', list_ylabel=None, title='Boxplot of comparison models', show_legend=True, show_mean_only=False, exts=('.png', '.pdf'), file_name='boxplot')[source]

All boxplot figures are saved in the folder: {save_path}/{dataset_name}/

Parameters
  • figure_size (list, tuple, np.ndarray, None, default=None) – The size of the saved figures. None means the size is set automatically. Otherwise, pass the (width, height) of the figure in pixels (100px to 1500px).

  • xlabel (str, default="Optimizer") – The label for x coordinate of boxplot figures.

  • list_ylabel (list, tuple, np.ndarray, None, default=None) – The labels for the y coordinate of the boxplot figures. Each boxplot corresponds to one metric in the list_metric parameter. Therefore, to change the y labels, pass a list of strings covering all metrics, in the order of list_metric. None means the metric names are used as the labels.

  • title (str, default="Boxplot of comparison models") – The title of the figures. It should be the same for all figures, since their y-coordinate labels already differ.

  • show_legend (bool, default=True) – Whether to show the legend. This option can be turned on or off for boxplots, but not for convergence charts.

  • show_mean_only (bool, default=False) – Set to True to show only the mean value; otherwise the mean, std, and median of each box are shown.

  • exts (list, tuple, np.ndarray, default=(".png", ".pdf")) – List of file extensions for the figures. Several formats serve different purposes, such as LaTeX (needs ".pdf") and Word (needs ".png").

  • file_name (str, default="boxplot") – The prefix for filenames that will be saved.
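
The list_ylabel fallback described above can be illustrated as follows. This is a sketch, not the library's code: resolve_ylabels is a hypothetical helper, and raising ValueError on a length mismatch is an assumption about how a wrong-sized list might reasonably be handled.

```python
def resolve_ylabels(list_ylabel, list_metric):
    """Hypothetical helper: fall back to metric names when no labels are given."""
    if list_ylabel is None:
        return list(list_metric)
    if len(list_ylabel) != len(list_metric):
        # Assumption: a label list must cover every metric, in the same order.
        raise ValueError("list_ylabel needs one entry per metric, in the same order")
    return list(list_ylabel)


list_metric = ["DBI", "DI"]
print(resolve_ylabels(None, list_metric))  # ['DBI', 'DI']
print(resolve_ylabels(["Davies-Bouldin index", "Dunn index"], list_metric))
```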

save_convergences(figure_size=None, xlabel='Epoch', list_ylabel=None, title='Convergence chart of comparison models', exts=('.png', '.pdf'), file_name='convergence')[source]

All convergence figures are saved in the folder: {save_path}/{dataset_name}/

Parameters
  • figure_size (list, tuple, np.ndarray, None, default=None) – The size of the saved figures. None means the size is set automatically. Otherwise, pass the (width, height) of the figure in pixels (100px to 1500px).

  • xlabel (str, default="Epoch") – The label for x coordinate of convergence figures.

  • list_ylabel (list, tuple, np.ndarray, None, default=None) – The labels for the y coordinate of the convergence figures. Each convergence chart corresponds to one objective in list_obj. Therefore, to change the y labels, pass a list of strings covering all objectives, in the order of list_obj. None means the objective names are used as the labels.

  • title (str, default="Convergence chart of comparison models") – The title of the figures. It should be the same for all objectives, since their y-coordinate labels already differ.

  • exts (list, tuple, np.ndarray, default=(".png", ".pdf")) – List of file extensions for the figures. Several formats serve different purposes, such as LaTeX (needs ".pdf") and Word (needs ".png").

  • file_name (str, default="convergence") – The prefix for filenames that will be saved.
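
To make the output layout concrete, the sketch below lists where figures could end up for a given save_path, dataset name, file_name prefix, and exts. Only the {save_path}/{dataset_name}/ folder comes from the documentation. The helper figure_paths, the "<file_name>-<name><ext>" naming pattern, and the use of "aniso" as the dataset name are illustrative assumptions, so check the actual files the library writes.

```python
from pathlib import Path


def figure_paths(save_path, dataset_name, file_name, names, exts=(".png", ".pdf")):
    """Hypothetical helper: list one target path per name and extension.

    Only the {save_path}/{dataset_name}/ folder is documented; the
    "<file_name>-<name><ext>" pattern is an illustrative assumption.
    """
    folder = Path(save_path) / dataset_name
    return [folder / f"{file_name}-{name}{ext}" for name in names for ext in exts]


paths = figure_paths("history", "aniso", "convergence", ["BHI", "XBI"])
for path in paths:
    print(path.as_posix())
```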