silhouette module

The silhouette coefficient to assess a clustering.

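The silhouette coefficient \(s_i\) measures how similar a point \(x_i\) is to its own cluster \(C_{k_i}\) (cohesion, through the mean intra-cluster distance \(a_i\)) compared to the other clusters (separation, through the smallest mean distance \(b_i\) to the points of another cluster):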

\[s_i = \frac{b_i-a_i}{\max(a_i,b_i)}\]

with \(a_i=\frac{1}{|C_{k_i}|-1} \sum_{j\in C_{k_i}\setminus\{i\} } \|x_i-x_j\|\) and \(b_i = \underset{\ell=1,\cdots,K\atop{\ell\neq k_i}}{\min} \frac{1}{|C_\ell|} \sum_{j\in C_\ell} \|x_i-x_j\|\)

where

\(K\) is the number of clusters,
\(C_k\) are the indices of the points belonging to the cluster \(k\),
\(|C_k|\) is the size of \(C_k\).
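
The formula above can be checked by hand on a small dataset. The following is a minimal NumPy sketch, not the GEMSEO implementation: the function name `silhouette_coefficients` is hypothetical, the norm is assumed to be Euclidean, at least two clusters are assumed, and a point alone in its cluster is given \(s_i = 0\) as an assumed convention because \(a_i\) is undefined in that case.

```python
import numpy as np


def silhouette_coefficients(points, labels):
    """Return s_i for each point (hypothetical helper, Euclidean norm assumed)."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    # Pairwise distances ||x_i - x_j||, shape (n, n).
    distances = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    clusters = np.unique(labels)
    coefficients = np.zeros(len(points))
    for i, k_i in enumerate(labels):
        own = labels == k_i
        n_own = own.sum()
        if n_own == 1:
            # a_i is undefined for a singleton cluster; assumed convention: s_i = 0.
            continue
        # distances[i, i] is 0, so summing over the whole cluster
        # and dividing by |C| - 1 excludes the point itself.
        a_i = distances[i, own].sum() / (n_own - 1)
        # Smallest mean distance to the points of another cluster.
        b_i = min(distances[i, labels == k].mean() for k in clusters if k != k_i)
        coefficients[i] = (b_i - a_i) / max(a_i, b_i)
    return coefficients


# Two well-separated clusters on a line.
points = [[0.0], [1.0], [10.0], [11.0]]
labels = [0, 0, 1, 1]
s = silhouette_coefficients(points, labels)
mean_silhouette = s.mean()
```

Since \(|b_i - a_i| \leq \max(a_i, b_i)\), each \(s_i\) lies in \([-1, 1]\). For the first point of this example, \(a_i = 1\) and \(b_i = 10.5\), hence \(s_i = 9.5/10.5 \approx 0.90\), which reflects a well-separated clustering.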

class gemseo.mlearning.qual_measure.silhouette.SilhouetteMeasure(algo, fit_transformers=True)

Bases: MLPredictiveClusteringMeasure

The silhouette coefficient to assess a clustering.

Parameters:

algo (MLPredictiveClusteringAlgo) – A clustering algorithm.
fit_transformers (bool) –
Whether to re-fit the transformers when using resampling techniques. If False, use the transformers of the algorithm fitted from the whole learning dataset.

By default it is set to True.

evaluate(method='learn', samples=None, multioutput=True, **options)

Evaluate the quality measure.

Parameters:

method (str) –
The name of the method to evaluate the quality measure.

By default it is set to “learn”.
samples (Sequence[int] | None) – The indices of the learning samples. If None, use the whole learning dataset.
multioutput (bool) –
If True, return the quality measure for each output component. Otherwise, average these measures.

By default it is set to True.
**options (OptionType | None) – The options of the estimation method (e.g. test_data for the test method, n_replicates for the bootstrap one, …).

Returns:

The value of the quality measure.

Raises:

ValueError – When the name of the method is unknown.

Return type:

MeasureType
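
The relation between `method`, `**options` and the `evaluate_*` methods can be pictured with a small dispatcher. This is only a sketch under assumptions: `ToyMeasure` is a hypothetical stand-in, and whether GEMSEO dispatches internally in this way is not stated here. It shows the method name selecting an `evaluate_<method>` function, the extra options being forwarded to it, and a `ValueError` being raised for an unknown name.

```python
class ToyMeasure:
    """Hypothetical stand-in for a quality measure; not the GEMSEO class."""

    LEARN = "learn"
    TEST = "test"

    def evaluate_learn(self, samples=None, multioutput=True):
        return "learn result"

    def evaluate_test(self, test_data, samples=None, multioutput=True):
        return f"test result on {len(test_data)} samples"

    def evaluate(self, method=LEARN, samples=None, multioutput=True, **options):
        # Look up evaluate_<method>; an unknown name yields None.
        evaluator = getattr(self, f"evaluate_{method}", None)
        if evaluator is None:
            raise ValueError(f"The method '{method}' is not available.")
        # Method-specific options (e.g. test_data) are forwarded as keywords.
        return evaluator(samples=samples, multioutput=multioutput, **options)


measure = ToyMeasure()
on_learning_data = measure.evaluate()
on_test_data = measure.evaluate("test", test_data=[1, 2, 3])
```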

evaluate_bootstrap(n_replicates=100, samples=None, multioutput=True, seed=None)

Evaluate the quality measure using the bootstrap technique.

Parameters:

n_replicates (int) –
The number of bootstrap replicates.

By default it is set to 100.
samples (Sequence[int] | None) – The indices of the learning samples. If None, use the whole learning dataset.
multioutput (bool) –
If True, return the quality measure for each output component. Otherwise, average these measures.

By default it is set to True.
seed (int | None) – The seed of the pseudo-random number generator. If None, then an unpredictable generator will be used.

Returns:

The value of the quality measure.

Return type:

MeasureType
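
To illustrate the role of `n_replicates` and `seed`, here is a generic sketch of bootstrap resampling, not the code used by this class. The helper `bootstrap_splits` is hypothetical: each replicate draws as many indices as there are samples, with replacement, and the samples never drawn (the out-of-bag samples) are assumed to serve for the evaluation.

```python
import numpy as np


def bootstrap_splits(n_samples, n_replicates=100, seed=None):
    """Yield (train, test) index arrays for each bootstrap replicate."""
    # seed=None gives an unpredictable generator, as in the parameter description.
    rng = np.random.default_rng(seed)
    all_indices = np.arange(n_samples)
    for _ in range(n_replicates):
        # Draw n_samples indices with replacement.
        train = rng.choice(n_samples, size=n_samples, replace=True)
        # Out-of-bag samples: the indices that were never drawn.
        test = np.setdiff1d(all_indices, train)
        yield train, test


splits = list(bootstrap_splits(10, n_replicates=3, seed=0))
```

Fixing `seed` makes the replicates reproducible from one call to the next.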

evaluate_kfolds(n_folds=5, samples=None, multioutput=True, randomize=True, seed=None)

Evaluate the quality measure using the k-folds technique.

Parameters:

n_folds (int) –
The number of folds.

By default it is set to 5.
samples (Sequence[int] | None) – The indices of the learning samples. If None, use the whole learning dataset.
multioutput (bool) –
If True, return the quality measure for each output component. Otherwise, average these measures.

By default it is set to True.
randomize (bool) –
Whether to shuffle the samples before dividing them in folds.

By default it is set to True.
seed (int | None) – The seed of the pseudo-random number generator. If None, then an unpredictable generator will be used.

Returns:

The value of the quality measure.

Return type:

MeasureType
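
The effect of `n_folds`, `randomize` and `seed` can be sketched with a generic k-folds split. This is not the code used by this class, and the helper `kfold_splits` is hypothetical: the indices are optionally shuffled, divided into `n_folds` nearly equal parts, and each part is used once for evaluation while the others are used for learning. At least two folds are assumed.

```python
import numpy as np


def kfold_splits(n_samples, n_folds=5, randomize=True, seed=None):
    """Yield (train, test) index arrays for each fold (n_folds >= 2 assumed)."""
    indices = np.arange(n_samples)
    if randomize:
        # Shuffle the samples before dividing them into folds.
        indices = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(indices, n_folds)
    for i, test in enumerate(folds):
        # All folds except the i-th one form the learning part.
        train = np.concatenate(folds[:i] + folds[i + 1:])
        yield train, test


splits = list(kfold_splits(10, n_folds=5, randomize=False))
first_train, first_test = splits[0]
```

With `n_folds` equal to the number of samples, each fold contains a single sample, which corresponds to the leave-one-out technique.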

evaluate_learn(samples=None, multioutput=True)

Evaluate the quality measure from the learning dataset.

Parameters:

samples (Sequence[int] | None) – The indices of the learning samples. If None, use the whole learning dataset.
multioutput (bool) –
If True, return the quality measure for each output component. Otherwise, average these measures.

By default it is set to True.

Returns:

The value of the quality measure.

Return type:

MeasureType

evaluate_loo(samples=None, multioutput=True)

Evaluate the quality measure using the leave-one-out technique.

Parameters:

samples (Sequence[int] | None) – The indices of the learning samples. If None, use the whole learning dataset.
multioutput (bool) –
If True, return the quality measure for each output component. Otherwise, average these measures.

By default it is set to True.

Returns:

The value of the quality measure.

Return type:

MeasureType

evaluate_test(test_data, samples=None, multioutput=True)

Evaluate the quality measure using a test dataset.

Parameters:

test_data (Dataset) – The test dataset.
samples (Sequence[int] | None) – The indices of the learning samples. If None, use the whole learning dataset.
multioutput (bool) –
If True, return the quality measure for each output component. Otherwise, average these measures.

By default it is set to True.

Returns:

The value of the quality measure.

Return type:

MeasureType

classmethod is_better(val1, val2)

Compare the quality between two values.

This method returns True if the first one is better than the second one.

For most measures, a smaller value is better than a larger one (e.g. the MSE). For others, such as the R2 measure, a larger value is better. This comparison handles both cases, whatever the type of measure. For the silhouette coefficient, SMALLER_IS_BETTER is False, so a larger value is better.

Parameters:

val1 (float) – The value of the first quality measure.
val2 (float) – The value of the second quality measure.

Returns:

Whether val1 is of better quality than val2.

Return type:

bool
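
One way such a comparison can be driven by a class-level flag is sketched below. The classes `ToyErrorMeasure` and `ToySilhouetteMeasure` are hypothetical, and the link between `is_better` and `SMALLER_IS_BETTER` is an assumption made for illustration, not a description of the GEMSEO source.

```python
class ToyErrorMeasure:
    """Hypothetical error-like measure: a smaller value is better."""

    SMALLER_IS_BETTER = True

    @classmethod
    def is_better(cls, val1, val2):
        # The direction of the comparison depends on the class flag.
        if cls.SMALLER_IS_BETTER:
            return val1 < val2
        return val1 > val2


class ToySilhouetteMeasure(ToyErrorMeasure):
    """Hypothetical silhouette-like measure: a larger value is better."""

    SMALLER_IS_BETTER = False


error_improved = ToyErrorMeasure.is_better(0.1, 0.2)
silhouette_improved = ToySilhouetteMeasure.is_better(0.8, 0.3)
```

Because the flag is read on the class, the caller can compare two values without knowing whether the measure is to be minimized or maximized.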

BOOTSTRAP: ClassVar[str] = 'bootstrap'

The name of the method to evaluate the measure by bootstrap.

KFOLDS: ClassVar[str] = 'kfolds'

The name of the method to evaluate the measure by cross-validation.

LEARN: ClassVar[str] = 'learn'

The name of the method to evaluate the measure on the learning dataset.

LOO: ClassVar[str] = 'loo'

The name of the method to evaluate the measure by leave-one-out.

SMALLER_IS_BETTER: ClassVar[bool] = False

Whether a smaller value of the measure is better. It is False here: the silhouette coefficient is to be maximized.

TEST: ClassVar[str] = 'test'

The name of the method to evaluate the measure on a test dataset.

algo: MLAlgo

The machine learning algorithm, usually already trained.

Examples using SilhouetteMeasure

Advanced mixture of experts