cluster module

Clustering algorithm

The cluster module implements clustering models, a family of unsupervised machine learning algorithms whose goal is to group data into clusters. Where the underlying method allows it, a clustering model should also be able to predict the cluster of new data, as well as the probability of belonging to each cluster.

This concept is implemented by the MLClusteringAlgo class, which inherits from the MLUnsupervisedAlgo class.

class gemseo.mlearning.cluster.cluster.MLClusteringAlgo(data, transformer=None, var_names=None, **parameters)

Bases: gemseo.mlearning.core.unsupervised.MLUnsupervisedAlgo

Clustering algorithm.

Inheriting classes should override the MLUnsupervisedAlgo._fit() method and, where possible, the MLClusteringAlgo._predict() and MLClusteringAlgo._predict_proba() methods.
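The override contract for inheriting classes can be illustrated with a minimal sketch. It does not use the GEMSEO classes: ToyClusteringAlgo and MeanSplitClusterer are hypothetical stand-ins, written in plain Python on scalar data, that only mirror the pattern of public methods delegating to the protected _fit(), _predict() and _predict_proba() hooks.

```python
# Toy stand-in for the override pattern; these are NOT the GEMSEO classes.
class ToyClusteringAlgo:
    """Hypothetical base class: public methods delegate to protected hooks."""

    def __init__(self, data):
        self.data = list(data)  # learning samples (plain floats here)
        self.labels = None  # must be set by _fit()

    def learn(self):
        self._fit(self.data)
        if self.labels is None:
            raise ValueError("_fit() must define self.labels")

    def predict(self, value):
        return self._predict(value)

    def predict_proba(self, value):
        return self._predict_proba(value)

    # Hooks to override in subclasses.
    def _fit(self, data):
        raise NotImplementedError

    def _predict(self, value):
        raise NotImplementedError

    def _predict_proba(self, value):
        raise NotImplementedError


class MeanSplitClusterer(ToyClusteringAlgo):
    """Two clusters: below the mean (0) or at/above the mean (1)."""

    def _fit(self, data):
        self.threshold = sum(data) / len(data)
        self.labels = [int(x >= self.threshold) for x in data]

    def _predict(self, value):
        return int(value >= self.threshold)

    def _predict_proba(self, value):
        # Hard assignment: probability 1 for the predicted cluster.
        cluster = self._predict(value)
        return [1.0 if k == cluster else 0.0 for k in range(2)]


algo = MeanSplitClusterer([0.0, 1.0, 9.0, 10.0])
algo.learn()
print(algo.labels)              # [0, 0, 1, 1]
print(algo.predict(8.0))        # 1
print(algo.predict_proba(2.0))  # [1.0, 0.0]
```

A subclass that cannot predict for new data would simply leave _predict() and _predict_proba() unimplemented, which is one way to read the "where possible" qualifier.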

Constructor.

Parameters

data (Dataset) – learning dataset.
transformer (dict(str)) – transformation strategy for data groups. If None, do not scale data. Default: None.
var_names (list(str)) – names of the variables to consider.
parameters – algorithm parameters.

learn(samples=None)

Override of the learn method that ensures labels are defined and identifies the number of clusters.
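The two checks mentioned for learn can be sketched as follows. This is an illustration only: count_clusters is a hypothetical helper, and it assumes that the number of clusters is taken to be the number of distinct labels assigned during fitting, which the text above does not state explicitly.

```python
def count_clusters(labels):
    """Check that labels exist, then count the distinct clusters."""
    if labels is None:
        raise ValueError("Fitting must assign a label to each learning sample.")
    # Assumption: one cluster per distinct label value.
    return len(set(labels))


labels = [0, 2, 1, 0, 2]  # hypothetical labels produced by a fitting step
n_clusters = count_clusters(labels)
print(n_clusters)  # 3
```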

predict(data)

Predict cluster of data.

Parameters: data (dict(ndarray) or ndarray) – data (1D or 2D).
Returns: clusters of data (“0D” or 1D).
Return type: int or ndarray(int)
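The relation between input and output shapes (one sample gives a single integer, several samples give one integer per sample) can be sketched with a nearest-centroid rule. Both the rule and the names below are assumptions made for illustration; actual clustering algorithms may assign clusters differently, and plain Python lists stand in for ndarray so that the sketch has no dependencies.

```python
import math


def predict(data, centroids):
    """Return an int for one sample, a list of ints for several samples."""

    def nearest(sample):
        # Index of the closest centroid in Euclidean distance.
        distances = [math.dist(sample, c) for c in centroids]
        return distances.index(min(distances))

    # A "1D" input is a flat sequence of numbers; a "2D" input nests samples.
    is_single_sample = not isinstance(data[0], (list, tuple))
    if is_single_sample:
        return nearest(data)
    return [nearest(sample) for sample in data]


centroids = [(0.0, 0.0), (10.0, 10.0)]  # hypothetical cluster centers
single = predict([1.0, 2.0], centroids)
batch = predict([[1.0, 2.0], [9.0, 8.0]], centroids)
print(single)  # 0
print(batch)   # [0, 1]
```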

predict_proba(data, hard=True)

Predict probability of belonging to each cluster.

Parameters

data (dict(ndarray) or ndarray) – data (1D or 2D).
hard (bool) – indicator for hard or soft clustering. Default: True.

Returns

probabilities of belonging to each cluster (1D or 2D, same as data).

Return type

ndarray
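The difference between hard and soft clustering can be sketched as below. With hard=True, the sketch returns an indicator vector with probability 1 for the assigned cluster. With hard=False, it returns graded memberships; here they come from a normalized exponential of negative distances to hypothetical centroids, a weighting chosen purely for illustration and not taken from the library.

```python
import math


def predict_proba(sample, centroids, hard=True):
    """Return one membership probability per cluster for a single sample."""
    distances = [math.dist(sample, c) for c in centroids]
    if hard:
        # Hard clustering: all the probability mass on the nearest cluster.
        winner = distances.index(min(distances))
        return [1.0 if k == winner else 0.0 for k in range(len(centroids))]
    # Soft clustering (illustrative weighting): closer centroids weigh more.
    weights = [math.exp(-d) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


centroids = [(0.0,), (4.0,)]  # hypothetical cluster centers
hard_probs = predict_proba((1.0,), centroids, hard=True)
soft_probs = predict_proba((1.0,), centroids, hard=False)
print(hard_probs)  # [1.0, 0.0]
print(soft_probs)  # two positive values summing to 1, the first one larger
```

In both modes the returned values sum to 1 over the clusters, so the hard case can be read as a degenerate probability distribution.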