gemseo.mlearning.clustering.algos.gaussian_mixture module

The Gaussian mixture algorithm for clustering.

The Gaussian mixture algorithm groups the data into clusters. The number of clusters \(k\) is fixed in advance. Each cluster \(i=1, \cdots, k\) is defined by a mean \(\mu_i\) and a covariance matrix \(\Sigma_i\).

The cluster predicted for a point is the one whose Gaussian distribution, defined by its mean and covariance matrix, has the highest probability density at that point:

\[\begin{split}\operatorname{cluster}(x) = \underset{i=1,\cdots,k}{\operatorname{argmax}} \ \mathcal{N}(x; \mu_i, \Sigma_i)\end{split}\]

where \(\mathcal{N}(x; \mu_i, \Sigma_i)\) is the value at the point \(x\) of the probability density function of a Gaussian random variable \(X \sim \mathcal{N}(\mu_i, \Sigma_i)\). This density depends on \(x\) only through the Mahalanobis distance \(\|x-\mu_i\|_{\Sigma_i^{-1}} = \sqrt{(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i)}\) between \(x\) and \(\mu_i\) weighted by \(\Sigma_i\): \(\mathcal{N}(x; \mu_i, \Sigma_i) = (2\pi)^{-d/2} |\Sigma_i|^{-1/2} \exp\left(-\frac{1}{2}\|x-\mu_i\|_{\Sigma_i^{-1}}^2\right)\), where \(d\) is the dimension of \(x\). Likewise, the probability of belonging to a cluster \(i=1, \cdots, k\) may be determined through

\[\begin{split}\mathbb{P}(x \in C_i) = \frac{\mathcal{N}(x; \mu_i, \Sigma_i)} {\sum_{j=1}^k \mathcal{N}(x; \mu_j, \Sigma_j)},\end{split}\]

where \(C_i = \{x\, | \, \operatorname{cluster}(x) = i \}\).
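The two formulas above can be sketched directly with NumPy. This is a minimal illustration, not the library's implementation: the helper names `gaussian_pdf` and `cluster_probabilities`, and the numerical values of the means and covariance matrices, are hypothetical.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Evaluate the multivariate Gaussian density N(x; mean, cov) at x."""
    d = mean.shape[0]
    diff = x - mean
    # Squared Mahalanobis distance (x - mean)^T cov^{-1} (x - mean).
    maha_sq = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * maha_sq) / norm


def cluster_probabilities(x, means, covs):
    """Return P(x in C_i) for each cluster: densities normalized to sum to 1."""
    densities = np.array([gaussian_pdf(x, m, c) for m, c in zip(means, covs)])
    return densities / densities.sum()


# Hypothetical two-cluster model in 2D (illustrative values only).
means = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]
covs = [np.eye(2), 2.0 * np.eye(2)]

x = np.array([0.5, -0.2])
probabilities = cluster_probabilities(x, means, covs)
# Normalizing does not change the ordering, so this is the argmax of the densities.
cluster = int(np.argmax(probabilities))
```

Because the normalization divides every density by the same positive sum, taking the argmax of the probabilities gives the same cluster as taking the argmax of the raw densities.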

When fitting the algorithm, the cluster centers \(\mu_i\) and the covariance matrices \(\Sigma_i\) are computed using the expectation-maximization algorithm.
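For intuition, a basic expectation-maximization loop might look like the sketch below. It is a simplified illustration under several assumptions, not the code used by the library: the function name `em_gaussian_mixture`, the initialization, the fixed iteration count and the regularization term `reg` are all hypothetical choices. It also estimates mixing weights, as standard EM for Gaussian mixtures does, even though the formulas above do not show them.

```python
import numpy as np


def em_gaussian_mixture(data, n_clusters, n_iter=50, seed=0, reg=1e-6):
    """Fit means, covariances and mixing weights with a basic EM loop."""
    rng = np.random.default_rng(seed)
    n_samples, dim = data.shape
    # Initialization (an arbitrary choice): random samples as means,
    # the global covariance of the data for every cluster, equal weights.
    means = data[rng.choice(n_samples, size=n_clusters, replace=False)]
    covs = np.array([np.cov(data, rowvar=False) + reg * np.eye(dim)] * n_clusters)
    weights = np.full(n_clusters, 1.0 / n_clusters)

    for _ in range(n_iter):
        # E-step: responsibilities = weighted densities, normalized per sample.
        # (Computed in linear space for brevity; log-space is more robust.)
        resp = np.empty((n_samples, n_clusters))
        for i in range(n_clusters):
            diff = data - means[i]
            maha_sq = np.einsum("nj,nj->n", diff, np.linalg.solve(covs[i], diff.T).T)
            norm = np.sqrt((2.0 * np.pi) ** dim * np.linalg.det(covs[i]))
            resp[:, i] = weights[i] * np.exp(-0.5 * maha_sq) / norm
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate parameters from responsibility-weighted samples.
        totals = resp.sum(axis=0)
        weights = totals / n_samples
        means = (resp.T @ data) / totals[:, None]
        for i in range(n_clusters):
            diff = data - means[i]
            covs[i] = (resp[:, i, None] * diff).T @ diff / totals[i] + reg * np.eye(dim)

    return means, covs, weights


# Synthetic data (illustrative): two blobs in 2D.
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(100, 2)),
    rng.normal(loc=6.0, scale=1.0, size=(100, 2)),
])
means, covs, weights = em_gaussian_mixture(data, n_clusters=2)
```

The E-step computes, for each sample, how much each cluster is responsible for it; the M-step then updates each mean and covariance matrix as a responsibility-weighted average over the samples.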

This concept is implemented through the GaussianMixture class, which inherits from the BasePredictiveClusterer class.

Dependence

This clustering algorithm relies on the GaussianMixture class of the scikit-learn library.
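As a point of reference, the underlying scikit-learn estimator can be used on its own as sketched below. Note that the `GaussianMixture` imported here is scikit-learn's class, not the GEMSEO wrapper documented on this page, and that the synthetic data and the chosen settings are illustrative assumptions. scikit-learn also estimates mixing weights (`weights_`) and uses them in `predict` and `predict_proba`, so its outputs match the unweighted formulas above only when the weights are equal.

```python
import numpy as np
from sklearn.mixture import GaussianMixture  # scikit-learn class, not the GEMSEO wrapper

# Synthetic data (illustrative): two well-separated blobs in 2D.
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(100, 2)),
    rng.normal(loc=6.0, scale=1.0, size=(100, 2)),
])

model = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
model.fit(data)  # parameters are estimated by expectation-maximization

labels = model.predict(data)               # hard cluster assignment per sample
probabilities = model.predict_proba(data)  # membership probabilities per sample
means = model.means_                       # one mean vector per cluster
covariances = model.covariances_           # one covariance matrix per cluster
```

Here `n_components` plays the role of the number of clusters \(k\), and `means_` and `covariances_` hold the fitted \(\mu_i\) and \(\Sigma_i\).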

class GaussianMixture(data, settings_model=None, **settings)

Bases: BasePredictiveClusterer

The Gaussian mixture clustering algorithm.

Parameters:
  • data (Dataset) -- The learning dataset.

  • settings_model (BaseMLAlgoSettings | None) -- The machine learning algorithm settings as a Pydantic model. If None, use **settings.

  • **settings (Any) -- The machine learning algorithm settings. These arguments are ignored when settings_model is not None.

Raises:

ValueError -- When both the variable and the group it belongs to have a transformer.

Settings

alias of GaussianMixture_Settings

LIBRARY: ClassVar[str] = 'scikit-learn'

The name of the library of the wrapped machine learning algorithm.

SHORT_ALGO_NAME: ClassVar[str] = 'GMM'

The short name of the machine learning algorithm, often an acronym.

Typically used for composite names, e.g. f"{algo.SHORT_ALGO_NAME}_{dataset.name}" or f"{algo.SHORT_ALGO_NAME}_{discipline.name}".