gemseo.mlearning.clustering.algos.gaussian_mixture module
The Gaussian mixture algorithm for clustering.
The Gaussian mixture algorithm groups the data into clusters. The number of clusters \(k\) is fixed. Each cluster \(i=1, \cdots, k\) is defined by a mean \(\mu_i\) and a covariance matrix \(\Sigma_i\).
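The predicted cluster of a point \(x\) is simply the cluster whose Gaussian distribution, defined by the corresponding mean and covariance matrix, has the highest probability density at \(x\):
\[\operatorname{cluster}(x) = \underset{i=1,\cdots,k}{\operatorname{argmax}} \ \mathcal{N}(x; \mu_i, \Sigma_i) = \underset{i=1,\cdots,k}{\operatorname{argmax}} \ \frac{1}{\sqrt{(2\pi)^d \det(\Sigma_i)}} \exp\left(-\frac{1}{2}\|x-\mu_i\|_{\Sigma_i^{-1}}^2\right)\]
where \(\mathcal{N}(x; \mu_i, \Sigma_i)\) is the value of the probability density function of a Gaussian random variable \(X \sim \mathcal{N}(\mu_i, \Sigma_i)\) at the point \(x \in \mathbb{R}^d\) and \(\|x-\mu_i\|_{\Sigma_i^{-1}} = \sqrt{(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i)}\) is the Mahalanobis distance between \(x\) and \(\mu_i\) weighted by \(\Sigma_i\). Likewise, the probability that \(x\) belongs to cluster \(i=1, \cdots, k\) may be determined through
\[\mathbb{P}(x \in C_i) = \frac{\mathcal{N}(x; \mu_i, \Sigma_i)}{\sum_{j=1}^k \mathcal{N}(x; \mu_j, \Sigma_j)},\]
where \(C_i = \{x\, | \, \operatorname{cluster}(x) = i \}\).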
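The two formulas above can be evaluated directly once the means and covariance matrices are known. The following is a minimal NumPy sketch of them as written (no mixture weights); the function names `gaussian_pdf`, `predict_cluster` and `cluster_probabilities` and the example means and covariances are hypothetical and are not part of GEMSEO.

```python
import numpy as np


def gaussian_pdf(x: np.ndarray, mean: np.ndarray, cov: np.ndarray) -> float:
    """Value of the Gaussian density N(x; mean, cov) at a single point x in R^d."""
    d = mean.shape[0]
    diff = x - mean
    # Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu), without forming the inverse.
    mahalanobis_sq = diff @ np.linalg.solve(cov, diff)
    norm_const = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return float(np.exp(-0.5 * mahalanobis_sq) / norm_const)


def predict_cluster(x, means, covs) -> int:
    """cluster(x): index of the component with the highest density at x."""
    densities = [gaussian_pdf(x, m, c) for m, c in zip(means, covs)]
    return int(np.argmax(densities))


def cluster_probabilities(x, means, covs) -> np.ndarray:
    """P(x in C_i) for every i: the densities normalised so that they sum to one."""
    densities = np.array([gaussian_pdf(x, m, c) for m, c in zip(means, covs)])
    return densities / densities.sum()


# Two hypothetical clusters in R^2 (illustrative values, not fitted ones).
means = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
covs = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])]
print(predict_cluster(np.array([0.5, 0.2]), means, covs))
print(cluster_probabilities(np.array([0.5, 0.2]), means, covs))
```

Solving a linear system instead of inverting \(\Sigma_i\) is the usual, numerically safer way to obtain the Mahalanobis term.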
When fitting the algorithm, the cluster centers \(\mu_i\) and the covariance matrices \(\Sigma_i\) are estimated using the expectation-maximization (EM) algorithm.
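To show what the EM iterations do, here is a minimal NumPy sketch, not the implementation GEMSEO or scikit-learn uses. Note that standard EM also estimates mixture weights \(\pi_i\) alongside \(\mu_i\) and \(\Sigma_i\), so the sketch returns them too. The function names, the farthest-point initialisation and the small diagonal regularisation `reg` are assumptions made for this illustration (scikit-learn, for instance, initialises with k-means by default).

```python
import numpy as np


def log_gaussian_pdf(X: np.ndarray, mean: np.ndarray, cov: np.ndarray) -> np.ndarray:
    """Log-density of N(mean, cov) at each row of X."""
    d = mean.shape[0]
    diff = X - mean
    # Squared Mahalanobis distance of every row, via one linear solve.
    mahalanobis_sq = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + mahalanobis_sq)


def fit_gmm_em(X: np.ndarray, k: int, n_iter: int = 100, reg: float = 1e-6, seed: int = 0):
    """Fit a k-component Gaussian mixture with EM; returns (weights, means, covs)."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Farthest-point initialisation of the means (a simple heuristic).
    idx = [int(rng.integers(n))]
    for _ in range(1, k):
        dist_sq = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(np.argmax(dist_sq)))
    means = X[idx].copy()
    covs = np.array([np.diag(X.var(axis=0)) + reg * np.eye(d) for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: resp[m, i] = posterior probability that point m comes from component i.
        log_resp = np.column_stack(
            [np.log(weights[i]) + log_gaussian_pdf(X, means[i], covs[i]) for i in range(k)]
        )
        log_resp -= log_resp.max(axis=1, keepdims=True)  # avoid underflow in exp
        resp = np.exp(log_resp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update weights, means and covariances from the responsibilities.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for i in range(k):
            diff = X - means[i]
            covs[i] = (resp[:, i, None] * diff).T @ diff / nk[i] + reg * np.eye(d)
    return weights, means, covs
```

Each iteration cannot decrease the likelihood, but EM only reaches a local optimum, which is why the initialisation matters in practice.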
This concept is implemented through the GaussianMixture class, which inherits from the BaseClusterer class through BasePredictiveClusterer.
Dependence
This clustering algorithm relies on the GaussianMixture class of the scikit-learn library.
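For reference, the underlying scikit-learn estimator can be used on its own as sketched below; the synthetic data and the chosen `n_components`, `covariance_type` and `random_state` values are assumptions for illustration, and how GEMSEO forwards its settings to this estimator is not shown. Be aware that scikit-learn also estimates mixture weights (`weights_`) and includes them when computing `predict` and `predict_proba`, in addition to the means and covariances described above.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two well-separated 2-D blobs (synthetic data for illustration only).
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(8.0, 1.0, (100, 2))])

model = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)
labels = model.predict(X)        # hard cluster assignments, shape (200,)
proba = model.predict_proba(X)   # membership probabilities, shape (200, 2)
print(model.means_)              # estimated means mu_i, shape (2, 2)
print(model.covariances_.shape)  # estimated Sigma_i, shape (2, 2, 2) for "full"
```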
- class GaussianMixture(data, settings_model=None, **settings)
Bases: BasePredictiveClusterer
The Gaussian mixture clustering algorithm.
- Parameters:
data (Dataset) -- The learning dataset.
settings_model (BaseMLAlgoSettings | None) -- The machine learning algorithm settings as a Pydantic model. If None, use **settings.
**settings (Any) -- The machine learning algorithm settings. These arguments are ignored when settings_model is not None.
- Raises:
ValueError -- When both the variable and the group it belongs to have a transformer.
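The rule behind this ValueError can be illustrated with a small check. This is a hypothetical sketch, not GEMSEO code: the function `check_transformers`, the mapping of variable or group names to transformers, and the mapping of each variable to its group are all assumptions made for the example.

```python
from __future__ import annotations


def check_transformers(transformers: dict[str, object], variable_to_group: dict[str, str]) -> None:
    """Raise ValueError if a variable and the group it belongs to both have a transformer.

    Hypothetical helper: `transformers` maps variable or group names to transformers,
    and `variable_to_group` maps each variable name to the name of its group.
    """
    for variable, group in variable_to_group.items():
        if variable in transformers and group in transformers:
            raise ValueError(
                f"The variable {variable!r} and its group {group!r} cannot both have a transformer."
            )
```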
- Settings
alias of GaussianMixture_Settings
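The precedence between settings_model and **settings stated in the parameter list can be sketched in plain Python. The dataclass `HypotheticalSettings`, its fields `n_clusters` and `random_state`, and the function `resolve_settings` are hypothetical stand-ins; they are not the fields of GaussianMixture_Settings nor GEMSEO's actual code path, which uses Pydantic models.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class HypotheticalSettings:
    """Stand-in for a settings model; the field names are hypothetical."""

    n_clusters: int = 5
    random_state: int | None = 0


def resolve_settings(settings_model: HypotheticalSettings | None = None, **settings: Any) -> HypotheticalSettings:
    # A settings model, when given, takes precedence: the keyword arguments are ignored.
    if settings_model is not None:
        return settings_model
    # Otherwise, the keyword arguments are used to build the settings.
    return HypotheticalSettings(**settings)
```

With this design, passing a fully validated model object and passing loose keyword arguments are interchangeable entry points, and the model always wins when both are given.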