gemseo.mlearning.clustering.algos.gaussian_mixture module
The Gaussian mixture algorithm for clustering.
The Gaussian mixture algorithm groups the data into a fixed number \(k\) of clusters. Each cluster \(i=1, \cdots, k\) is defined by a mean \(\mu_i\) and a covariance matrix \(\Sigma_i\).
A point \(x\) is assigned to the cluster whose Gaussian distribution, defined by that cluster's mean and covariance matrix, has the highest probability density at \(x\):
\[\operatorname{cluster}(x) = \underset{i=1,\cdots,k}{\operatorname{argmax}} \ \mathcal{N}(x; \mu_i, \Sigma_i)\]
where \(\mathcal{N}(x; \mu_i, \Sigma_i)\) is the value of the probability density function of a Gaussian random variable \(X \sim \mathcal{N}(\mu_i, \Sigma_i)\) at the point \(x\) and \(\|x-\mu_i\|_{\Sigma_i^{-1}} = \sqrt{(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i)}\) is the Mahalanobis distance between \(x\) and \(\mu_i\) weighted by \(\Sigma_i\).
Likewise, the probability of belonging to a cluster \(i=1, \cdots, k\) may be determined through
\[\mathbb{P}(x \in C_i) = \frac{\mathcal{N}(x; \mu_i, \Sigma_i)}{\sum_{j=1}^k \mathcal{N}(x; \mu_j, \Sigma_j)},\]
where \(C_i = \{x\, | \, \operatorname{cluster}(x) = i \}\).
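The prediction rule described above can be sketched with NumPy alone. This is a minimal illustration, not GEMSEO's implementation: the helper names (`gaussian_density`, `predict_cluster`, `predict_probabilities`) and the numerical values are hypothetical, and the membership probability is assumed to be the density of cluster \(i\) divided by the sum of the densities over all clusters. Indices are zero-based here, whereas the text counts clusters from 1.

```python
import numpy as np


def gaussian_density(x, mean, cov):
    """Evaluate the multivariate Gaussian density N(x; mean, cov) at x."""
    d = mean.shape[0]
    diff = x - mean
    # Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu).
    squared_distance = diff @ np.linalg.solve(cov, diff)
    normalization = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * squared_distance) / normalization


def predict_cluster(x, means, covs):
    """Return the (zero-based) index of the cluster with the highest density at x."""
    densities = np.array([gaussian_density(x, m, c) for m, c in zip(means, covs)])
    return int(np.argmax(densities))


def predict_probabilities(x, means, covs):
    """Return the densities at x, normalized so that they sum to one (assumed formula)."""
    densities = np.array([gaussian_density(x, m, c) for m, c in zip(means, covs)])
    return densities / densities.sum()


# Illustrative values only: two well-separated clusters in dimension 2.
means = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
covs = [np.eye(2), 2.0 * np.eye(2)]
x = np.array([0.5, -0.2])

label = predict_cluster(x, means, covs)
probabilities = predict_probabilities(x, means, covs)
```

Because the density depends on \(x\) only through the Mahalanobis distance and the determinant of \(\Sigma_i\), a point close to \(\mu_i\) in that distance gets a probability close to one for cluster \(i\) when the other clusters are far away.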
When fitting the algorithm, the cluster centers \(\mu_i\) and the covariance matrices \(\Sigma_i\) are computed using the expectation-maximization algorithm.
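For intuition, the expectation-maximization iterations can be sketched as follows. This is a simplified textbook version under stated assumptions, not the code used by GEMSEO or scikit-learn: the function names, the `initial_means` argument, the regularization term `reg` and the fixed number of iterations are all hypothetical choices, and the sketch also estimates mixing weights, as standard EM for Gaussian mixtures does.

```python
import numpy as np


def log_gaussian_density(X, mean, cov):
    """Return log N(x; mean, cov) for each row x of X."""
    d = X.shape[1]
    diff = X - mean
    # Squared Mahalanobis distance of each sample to the mean.
    squared_distances = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    _, log_det = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + squared_distances)


def fit_gaussian_mixture(X, initial_means, n_iterations=50, reg=1e-6):
    """Estimate weights, means and covariances with a fixed number of EM iterations."""
    n, d = X.shape
    means = np.array(initial_means, dtype=float)
    k = means.shape[0]
    # Start every cluster from the covariance of the whole sample.
    global_cov = np.atleast_2d(np.cov(X, rowvar=False)) + reg * np.eye(d)
    covs = np.tile(global_cov, (k, 1, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iterations):
        # E-step: responsibility of each cluster for each sample.
        log_r = np.column_stack([
            np.log(weights[i]) + log_gaussian_density(X, means[i], covs[i])
            for i in range(k)
        ])
        log_r -= log_r.max(axis=1, keepdims=True)  # avoid underflow in exp
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted weights, means and covariances.
        counts = resp.sum(axis=0)
        weights = counts / n
        means = (resp.T @ X) / counts[:, None]
        for i in range(k):
            diff = X - means[i]
            covs[i] = (resp[:, i, None] * diff).T @ diff / counts[i] + reg * np.eye(d)
    return weights, means, covs


# Synthetic data for illustration: two clusters centered at (0, 0) and (5, 5).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(200, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(200, 2)),
])
weights, means, covs = fit_gaussian_mixture(X, initial_means=[[1.0, 1.0], [4.0, 4.0]])
```

A production implementation would typically add a convergence test on the log-likelihood and a more careful initialization. The sketch omits both for brevity.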
This concept is implemented through the GaussianMixture class, which inherits from the BaseClusterer class.
Dependence
This clustering algorithm relies on the GaussianMixture class of the scikit-learn library.
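As a point of reference, the underlying scikit-learn class can be used directly as in the sketch below. It shows scikit-learn on its own, with synthetic data chosen for illustration, and makes no claim about how GEMSEO calls it internally. Note that scikit-learn also estimates mixing weights (`weights_`) alongside the means and covariances.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic data for illustration: two clusters centered at (0, 0) and (5, 5).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(100, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(100, 2)),
])

# Fit a mixture with a fixed number of components using EM.
model = GaussianMixture(n_components=2, random_state=0).fit(X)

labels = model.predict(X)               # hard cluster assignment per sample
probabilities = model.predict_proba(X)  # membership probability per cluster

print(model.means_)              # estimated cluster centers
print(model.covariances_.shape)  # one full covariance matrix per cluster
```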
- class GaussianMixture(data, settings_model=None, **settings)
Bases: BasePredictiveClusterer
The Gaussian mixture clustering algorithm.
- Parameters:
data (Dataset) -- The training dataset.
settings_model (BaseMLAlgoSettings | None) -- The machine learning algorithm settings as a Pydantic model. If None, use **settings.
**settings (Any) -- The machine learning algorithm settings. These arguments are ignored when settings_model is not None.
- Raises:
ValueError -- When both the variable and the group it belongs to have a transformer.
- Settings
alias of GaussianMixture_Settings