
cluster module

Clustering algorithm

The cluster module implements the concept of clustering models, a kind of unsupervised machine learning algorithm whose goal is to group data into clusters. Where possible, these models should also be able to predict the cluster of new data, as well as the probability of belonging to each cluster.

This concept is implemented through the MLClusteringAlgo class, which inherits from the MLUnsupervisedAlgo class.
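To make the idea of grouping data and then predicting the cluster of new data concrete, here is a minimal NumPy sketch based on a basic k-means loop. It does not use GEMSEO and is not how MLClusteringAlgo is implemented; the helper names assign and fit_centroids, the sample values and the choice of initial centroids are all assumptions made for illustration.

```python
import numpy as np


def assign(data, centroids):
    """Return the index of the nearest centroid for each row of `data`."""
    distances = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)


def fit_centroids(data, initial_centroids, n_iter=10):
    """Group the rows of `data` into clusters with a basic k-means loop."""
    centroids = np.array(initial_centroids, dtype=float)  # work on a copy
    for _ in range(n_iter):
        labels = assign(data, centroids)
        for k in range(len(centroids)):
            members = data[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[k] = members.mean(axis=0)
    return centroids


# Hypothetical learning data: two well-separated groups of 2D points.
learning_data = np.array(
    [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
)

# Assumption: the first and last samples serve as initial centroids.
centroids = fit_centroids(learning_data, learning_data[[0, -1]])
labels = assign(learning_data, centroids)

# Predict the cluster of data that were not used for learning.
new_labels = assign(np.array([[0.05, 0.05], [5.05, 5.0]]), centroids)
```

The labels computed on the learning data play the role of the clusters found during learning, while the last call mirrors the prediction step for unseen data.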

class gemseo.mlearning.cluster.cluster.MLClusteringAlgo(data, transformer=None, var_names=None, **parameters)

Bases: gemseo.mlearning.core.unsupervised.MLUnsupervisedAlgo

Clustering algorithm.

Inheriting classes should override the MLUnsupervisedAlgo._fit() method and, if possible, the MLClusteringAlgo._predict() and MLClusteringAlgo._predict_proba() methods.


Parameters

  • data (Dataset) – learning dataset.

  • transformer (dict(str)) – transformation strategy for data groups. If None, do not scale data. Default: None.

  • var_names (list(str)) – names of the variables to consider.

  • parameters – algorithm parameters.


Override the learn method to ensure that labels are defined, then identify the number of clusters.
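The division of labour described above, where a subclass supplies the fitting logic and the learning step checks the labels and counts the clusters, can be sketched as follows. The classes ClusteringSketch and SignClustering are hypothetical stand-ins written for this example; they do not inherit from MLUnsupervisedAlgo, and their attribute names are assumptions, not the GEMSEO API.

```python
import numpy as np


class ClusteringSketch:
    """Hypothetical stand-in for a clustering base class (not the GEMSEO API)."""

    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)
        self.labels = None
        self.n_clusters = None

    def _fit(self, data):
        """To be overridden: must store one label per sample in self.labels."""
        raise NotImplementedError

    def learn(self):
        """Fit the model, check that labels exist, then count the clusters."""
        self._fit(self.data)
        if self.labels is None:
            raise ValueError("_fit() must assign one label per sample to self.labels.")
        self.n_clusters = np.unique(self.labels).size


class SignClustering(ClusteringSketch):
    """Toy subclass: cluster 0 for a negative first coordinate, 1 otherwise."""

    def _fit(self, data):
        self.labels = (data[:, 0] >= 0).astype(int)


algo = SignClustering([[-1.0, 2.0], [3.0, 0.5], [-0.2, 1.0]])
algo.learn()
```

After learn() returns, algo.labels holds one cluster index per learning sample and algo.n_clusters holds the number of distinct labels. A subclass whose _fit() forgets to set the labels fails early with a ValueError instead of producing a half-initialized model.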


Predict cluster of data.


Parameters

data (dict(ndarray) or ndarray) – data (1D or 2D).

Returns

clusters of data (“0D” or 1D).

Return type

int or ndarray(int)
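The shape convention above (a single 1D sample gives an integer, a 2D array of samples gives a 1D integer array) can be illustrated with a small NumPy sketch. The nearest-centroid rule, the centroids array and the standalone predict function are assumptions made for the example; the dict(ndarray) input form is not covered here.

```python
import numpy as np

# Hypothetical fitted model: one centroid per cluster.
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])


def predict(data):
    """Return an int for a 1D sample and a 1D integer array for 2D samples."""
    data = np.asarray(data, dtype=float)
    single_sample = data.ndim == 1
    samples = np.atleast_2d(data)  # always work on a (n_samples, n_inputs) array
    distances = np.linalg.norm(samples[:, None, :] - centroids[None, :, :], axis=2)
    clusters = distances.argmin(axis=1)
    return int(clusters[0]) if single_sample else clusters


one_cluster = predict([0.1, 0.2])
many_clusters = predict([[0.1, 0.2], [4.8, 5.1]])
```

Promoting the input to 2D first keeps a single code path for the computation; only the final return statement restores the "0D" result for a single sample.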

predict_proba(data, hard=True)

Predict probability of belonging to each cluster.

Parameters

  • data (dict(ndarray) or ndarray) – data (1D or 2D).

  • hard (bool) – indicator for hard or soft clustering. Default: True.
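The difference between hard and soft clustering can be sketched with NumPy. This example assumes that hard clustering yields a one-hot indicator of the predicted cluster and that soft clustering yields a softmax of the negative distances to hypothetical centroids; the soft rule in particular is an illustrative choice, since each concrete algorithm defines its own probabilities.

```python
import numpy as np

# Hypothetical fitted model: one centroid per cluster.
centroids = np.array([[0.0, 0.0], [4.0, 0.0]])


def predict_proba(data, hard=True):
    """Return membership probabilities with the same dimension as `data`."""
    data = np.asarray(data, dtype=float)
    single_sample = data.ndim == 1
    samples = np.atleast_2d(data)
    distances = np.linalg.norm(samples[:, None, :] - centroids[None, :, :], axis=2)
    if hard:
        # Hard clustering: probability 1 for the nearest cluster, 0 elsewhere.
        probas = np.zeros_like(distances)
        probas[np.arange(len(samples)), distances.argmin(axis=1)] = 1.0
    else:
        # Soft clustering (assumed rule): softmax of the negative distances.
        shifted = distances - distances.min(axis=1, keepdims=True)
        weights = np.exp(-shifted)
        probas = weights / weights.sum(axis=1, keepdims=True)
    return probas[0] if single_sample else probas


hard_probas = predict_proba([1.0, 0.0])
soft_probas = predict_proba([1.0, 0.0], hard=False)
batch_probas = predict_proba([[1.0, 0.0], [3.5, 0.0]], hard=False)
```

In both modes each row sums to one; the hard mode simply concentrates all the probability on the predicted cluster.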


Returns

probabilities of belonging to each cluster (1D or 2D, same as data).

Return type