gaussian_mixture module
Gaussian mixture clustering algorithm
The Gaussian mixture algorithm groups the data into clusters. The number of clusters \(k\) is fixed before fitting. Each cluster \(i=1, \cdots, k\) is defined by a mean \(\mu_i\) and a covariance matrix \(\Sigma_i\).
A point is assigned to the cluster whose Gaussian distribution, defined by the corresponding mean and covariance matrix, has the highest probability density at that point:
\[\operatorname{cluster}(x) = \underset{i=1,\cdots,k}{\operatorname{argmax}} \ \mathcal{N}(x; \mu_i, \Sigma_i)\]
where \(\mathcal{N}(x; \mu_i, \Sigma_i)\) is the value of the probability density function of a Gaussian random variable \(X \sim \mathcal{N}(\mu_i, \Sigma_i)\) at the point \(x\). This density depends on \(x\) only through \(\|x-\mu_i\|_{\Sigma_i^{-1}} = \sqrt{(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i)}\), the Mahalanobis distance between \(x\) and \(\mu_i\) weighted by \(\Sigma_i\). Likewise, the probability of belonging to a cluster \(i=1, \cdots, k\) may be determined through
\[\mathbb{P}(x \in C_i) = \frac{\mathcal{N}(x; \mu_i, \Sigma_i)}{\sum_{j=1}^k \mathcal{N}(x; \mu_j, \Sigma_j)},\]
where \(C_i = \{x\, | \, \operatorname{cluster}(x) = i \}\).
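The prediction rule and the membership probability described above can be sketched with NumPy alone. This is a minimal illustration and not the GEMSEO implementation: the helper names `gaussian_pdf`, `predict_cluster` and `predict_proba`, as well as the two example clusters, are hypothetical, and cluster indices start at 0 rather than 1.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Density of N(mean, cov) evaluated at each row of x."""
    diff = np.atleast_2d(x) - mean
    # Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu), row by row.
    maha_sq = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
    dim = mean.shape[0]
    norm = np.sqrt((2.0 * np.pi) ** dim * np.linalg.det(cov))
    return np.exp(-0.5 * maha_sq) / norm


def component_densities(x, means, covs):
    """Array of shape (n_points, k) holding the density of each component."""
    return np.stack([gaussian_pdf(x, m, c) for m, c in zip(means, covs)], axis=1)


def predict_cluster(x, means, covs):
    """0-based index of the component with the highest density."""
    return component_densities(x, means, covs).argmax(axis=1)


def predict_proba(x, means, covs):
    """Densities normalised over the components, one row per point."""
    densities = component_densities(x, means, covs)
    return densities / densities.sum(axis=1, keepdims=True)


# Hypothetical example: two clusters in 2D (assumed values, for illustration).
means = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]
covs = [np.eye(2), 2.0 * np.eye(2)]
points = np.array([[0.5, -0.2], [3.5, 4.1]])

labels = predict_cluster(points, means, covs)
probas = predict_proba(points, means, covs)
```

Note that scikit-learn's own `GaussianMixture.predict` and `predict_proba` additionally weight each density by the estimated mixture weight of its component, so their results can differ from this unweighted sketch when the weights are unequal.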
When fitting the algorithm, the cluster centers \(\mu_i\) and the covariance matrices \(\Sigma_i\) are computed using the expectation-maximization algorithm.
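To make the fitting step more concrete, here is a minimal sketch of expectation-maximization for a Gaussian mixture, written with NumPy only. It is not the code used by GEMSEO or scikit-learn: the function name `fit_gaussian_mixture`, the farthest-point initialisation, the fixed number of iterations and the small regularisation term `reg` are all assumptions made for illustration.

```python
import numpy as np


def fit_gaussian_mixture(x, n_components, n_iter=50, reg=1e-6):
    """Estimate mixture weights, means and covariances with plain EM."""
    n_samples, dim = x.shape
    # Assumed initialisation: farthest-point means, spherical covariances.
    means = [x[0]]
    for _ in range(1, n_components):
        sq_dist = np.min([np.sum((x - m) ** 2, axis=1) for m in means], axis=0)
        means.append(x[np.argmax(sq_dist)])
    means = np.array(means)
    covs = np.array([np.var(x, axis=0).mean() * np.eye(dim)] * n_components)
    weights = np.full(n_components, 1.0 / n_components)

    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        log_prob = np.empty((n_samples, n_components))
        for i in range(n_components):
            diff = x - means[i]
            maha_sq = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(covs[i]), diff)
            _, log_det = np.linalg.slogdet(covs[i])
            log_prob[:, i] = np.log(weights[i]) - 0.5 * (
                maha_sq + log_det + dim * np.log(2.0 * np.pi)
            )
        log_prob -= log_prob.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_prob)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate the parameters from the responsibilities.
        counts = resp.sum(axis=0)
        weights = counts / n_samples
        means = (resp.T @ x) / counts[:, None]
        for i in range(n_components):
            diff = x - means[i]
            covs[i] = (resp[:, i, None] * diff).T @ diff / counts[i]
            covs[i] += reg * np.eye(dim)  # keep the covariance invertible
    return weights, means, covs


# Synthetic data: two well-separated 2D clusters (assumed, for illustration).
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.5, (200, 2)), rng.normal(5.0, 0.5, (200, 2))])
weights, means, covs = fit_gaussian_mixture(data, n_components=2)
```

The E-step works with log-densities and subtracts the row-wise maximum before exponentiating, which avoids underflow for points far from every cluster center.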
This concept is implemented through the GaussianMixture class which inherits from the MLClusteringAlgo class.
Dependence
This clustering algorithm relies on the GaussianMixture class of the scikit-learn library.
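For reference, the underlying scikit-learn class can be used directly as sketched below. The example calls `sklearn.mixture.GaussianMixture`, not the GEMSEO wrapper documented on this page, and the synthetic data and the choice `n_components=2` are assumptions made for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic data: two well-separated 2D clusters (assumed, for illustration).
rng = np.random.default_rng(0)
samples = np.vstack(
    [rng.normal(0.0, 0.5, (100, 2)), rng.normal(5.0, 0.5, (100, 2))]
)

# Fit the mixture with expectation-maximization.
model = GaussianMixture(n_components=2, random_state=0)
model.fit(samples)

# Fitted cluster centers and covariance matrices.
centers = model.means_
covariances = model.covariances_

# Hard cluster assignments and membership probabilities.
labels = model.predict(samples)
probabilities = model.predict_proba(samples)
```

The fitted attributes `means_` and `covariances_` play the role of \(\mu_i\) and \(\Sigma_i\), while `predict` and `predict_proba` return the cluster index and the membership probabilities of each sample.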
class gemseo.mlearning.cluster.gaussian_mixture.GaussianMixture(data, transformer=None, var_names=None, n_components=5, **parameters)
Bases: gemseo.mlearning.cluster.cluster.MLClusteringAlgo
Gaussian mixture clustering algorithm.
Constructor.
- Parameters
data (Dataset) – learning dataset.
transformer (dict(str)) – transformation strategy for data groups. If None, do not transform data. Default: None.
var_names (list(str)) – names of the variables to consider. Default: None.
n_components (int) – number of Gaussian mixture components. Default: 5.
parameters – additional scikit-learn algorithm parameters, given as keyword arguments.
ABBR = 'GaussMix'