

silhouette_measure module

The silhouette coefficient to assess a clustering.

The silhouette coefficient \(s_i\) is a measure of how similar a point \(x_i\) is to its own cluster \(C_{k_i}\) (cohesion) compared to other clusters (separation):

\[s_i = \frac{b_i-a_i}{\max(a_i,b_i)}\]

with \(a_i=\frac{1}{|C_{k_i}|-1} \sum_{j\in C_{k_i}\setminus\{i\}} \|x_i-x_j\|\) and \(b_i = \min_{\substack{\ell=1,\dots,K\\ \ell\neq k_i}} \frac{1}{|C_\ell|} \sum_{j\in C_\ell} \|x_i-x_j\|\)

where

  • \(K\) is the number of clusters,

  • \(C_k\) is the set of indices of the points belonging to the cluster \(k\),

  • \(|C_k|\) is the size of \(C_k\),

  • \(k_i\) is the index of the cluster containing \(x_i\).
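The definitions of \(a_i\), \(b_i\) and \(s_i\) can be checked with a small, dependency-free sketch. It is not the gemseo implementation: the names `silhouette_coefficients` and `mean_silhouette` are hypothetical, the norm is assumed to be Euclidean, and a point alone in its cluster gets \(s_i=0\) (the convention used by scikit-learn), since \(a_i\) is undefined when \(|C_{k_i}|=1\).

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Point = Sequence[float]


# Hypothetical helper, not part of gemseo; assumes the Euclidean norm.
def silhouette_coefficients(points: Sequence[Point], labels: Sequence[int]) -> List[float]:
    """Return the silhouette coefficient s_i of every point."""
    clusters: Dict[int, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        clusters[label].append(index)  # C_k: indices of the points of cluster k
    if len(clusters) < 2:
        raise ValueError("The silhouette coefficient requires at least two clusters.")

    coefficients = []
    for i, x_i in enumerate(points):
        own = clusters[labels[i]]  # C_{k_i}
        if len(own) == 1:
            # Assumed convention: a_i is undefined for a singleton cluster, so s_i = 0.
            coefficients.append(0.0)
            continue
        # a_i: mean distance to the other points of the same cluster (cohesion).
        a_i = sum(math.dist(x_i, points[j]) for j in own if j != i) / (len(own) - 1)
        # b_i: smallest mean distance to the points of another cluster (separation).
        b_i = min(
            sum(math.dist(x_i, points[j]) for j in members) / len(members)
            for label, members in clusters.items()
            if label != labels[i]
        )
        denominator = max(a_i, b_i)
        # Guard against 0/0 when all the involved points coincide.
        coefficients.append((b_i - a_i) / denominator if denominator > 0 else 0.0)
    return coefficients


# Hypothetical helper, not part of gemseo.
def mean_silhouette(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Average s_i over all the points: a scalar summary of the clustering."""
    values = silhouette_coefficients(points, labels)
    return sum(values) / len(values)
```

With the points `(0, 0)`, `(0, 1)`, `(10, 0)`, `(10, 1)` and the labels `[0, 0, 1, 1]`, each \(s_i\) is about 0.90; with the labels `[0, 1, 0, 1]`, which mix the two groups, the mean becomes negative.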

class gemseo.mlearning.quality_measures.silhouette_measure.SilhouetteMeasure(algo, fit_transformers=True)

Bases: MLPredictiveClusteringMeasure

The silhouette coefficient to assess a clustering.

Parameters:
  • algo (BaseMLPredictiveClusteringAlgo) – A clustering algorithm.

  • fit_transformers (bool) –

    Whether to re-fit the transformers when using resampling techniques. If False, use the transformers of the algorithm fitted on the whole learning dataset.

    By default it is set to True.

compute_bootstrap_measure(n_replicates=100, samples=None, multioutput=True, seed=None)

Evaluate the quality measure using the bootstrap technique.

Parameters:
  • n_replicates (int) –

    The number of bootstrap replicates.

    By default it is set to 100.

  • samples (Sequence[int] | None) – The indices of the learning samples. If None, use the whole learning dataset.

  • multioutput (bool) –

    Whether to return the quality measure for each output component. If False, return the quality measure averaged over the components.

    By default it is set to True.

  • seed (int | None) – The seed of the pseudo-random number generator. If None, an unpredictable generator will be used.

Returns:

The value of the quality measure.

Return type:

MeasureType
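The bootstrap technique repeatedly resamples the learning indices with replacement. The sketch below, based only on Python's `random` module, illustrates how `n_replicates`, `samples` and `seed` could drive that resampling. The helper `bootstrap_replicates` is hypothetical, and returning the indices that were not drawn (out-of-bag) as evaluation indices is an assumption about the evaluation scheme, not a statement about gemseo's internals.

```python
import random
from typing import List, Optional, Sequence, Tuple


# Hypothetical helper, not part of gemseo; the out-of-bag evaluation set is an assumption.
def bootstrap_replicates(
    samples: Sequence[int], n_replicates: int = 100, seed: Optional[int] = None
) -> List[Tuple[List[int], List[int]]]:
    """Return (drawn indices, out-of-bag indices) for each bootstrap replicate."""
    # seed=None lets random.Random seed itself from an unpredictable source.
    generator = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        # Draw as many indices as there are samples, with replacement.
        drawn = generator.choices(samples, k=len(samples))
        drawn_set = set(drawn)
        # Indices never drawn in this replicate.
        out_of_bag = [index for index in samples if index not in drawn_set]
        replicates.append((drawn, out_of_bag))
    return replicates
```

The measure computed on each replicate would then typically be averaged; passing the same `seed` reproduces the same replicates.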

compute_cross_validation_measure(n_folds=5, samples=None, multioutput=True, randomize=True, seed=None)

Evaluate the quality measure using the k-fold cross-validation technique.

Parameters:
  • n_folds (int) –

    The number of folds.

    By default it is set to 5.

  • samples (Sequence[int] | None) – The indices of the learning samples. If None, use the whole learning dataset.

  • multioutput (bool) –

    Whether to return the quality measure for each output component. If False, return the quality measure averaged over the components.

    By default it is set to True.

  • randomize (bool) –

    Whether to shuffle the samples before dividing them into folds.

    By default it is set to True.

  • seed (int | None) – The seed of the pseudo-random number generator. If None, an unpredictable generator is used.

Returns:

The value of the quality measure.

Return type:

MeasureType
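The k-fold technique partitions the learning indices into `n_folds` disjoint folds; each fold serves once as the validation set while the other folds are used for training. A minimal sketch of the splitting step follows. `k_fold_indices` and `train_test_splits` are hypothetical helpers, and giving the extra indices to the first folds mirrors NumPy's `array_split` rather than a documented gemseo behavior.

```python
import random
from typing import Iterator, List, Optional, Sequence, Tuple


# Hypothetical helper, not part of gemseo.
def k_fold_indices(
    samples: Sequence[int],
    n_folds: int = 5,
    randomize: bool = True,
    seed: Optional[int] = None,
) -> List[List[int]]:
    """Split the learning indices into n_folds folds of nearly equal size."""
    if not 2 <= n_folds <= len(samples):
        raise ValueError("n_folds must be between 2 and the number of samples.")
    indices = list(samples)
    if randomize:
        # Shuffle before splitting; seed=None gives an unpredictable order.
        random.Random(seed).shuffle(indices)
    # Assumption: the first len(indices) % n_folds folds get one extra index.
    size, remainder = divmod(len(indices), n_folds)
    folds, start = [], 0
    for fold in range(n_folds):
        stop = start + size + (1 if fold < remainder else 0)
        folds.append(indices[start:stop])
        start = stop
    return folds


# Hypothetical helper, not part of gemseo.
def train_test_splits(folds: List[List[int]]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training indices, validation indices), one pair per fold."""
    for k, validation in enumerate(folds):
        training = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield training, validation
```

With `randomize=False`, ten samples and three folds, the folds are `[0, 1, 2, 3]`, `[4, 5, 6]` and `[7, 8, 9]`.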

compute_test_measure(test_data, samples=None, multioutput=True)

Evaluate the quality measure using a test dataset.

Parameters:
  • test_data (Dataset) – The test dataset.

  • samples (Sequence[int] | None) – The indices of the learning samples. If None, use the whole learning dataset.

  • multioutput (bool) –

    Whether to return the quality measure for each output component. If False, return the quality measure averaged over the components.

    By default it is set to True.

Returns:

The value of the quality measure.

Return type:

MeasureType

SMALLER_IS_BETTER: ClassVar[bool] = False

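Whether the measure has to be minimized (True) or maximized (False). The silhouette coefficient lies in \([-1,1]\) and a larger value indicates a better clustering, so it is maximized.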
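Since the flag tells in which direction the measure improves, generic code can select the best of several candidate clusterings without knowing which measure it handles. In this minimal sketch, `best_candidate` is a hypothetical helper, not part of gemseo, and any scores passed to it are placeholders.

```python
from typing import Dict

# Value declared by SilhouetteMeasure: larger silhouette values are better.
SMALLER_IS_BETTER = False


# Hypothetical helper, not part of gemseo.
def best_candidate(scores: Dict[str, float], smaller_is_better: bool) -> str:
    """Return the key of the best score according to the measure's direction."""
    choose = min if smaller_is_better else max
    return choose(scores, key=scores.get)
```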

algo: BaseMLPredictiveClusteringAlgo

The machine learning algorithm whose quality we want to measure.

Examples using SilhouetteMeasure

Advanced mixture of experts
