me_measure module

The maximum error measure to assess the quality of a regression algorithm.

The maximum error (ME) is defined by

\[\operatorname{ME}(\hat{y})=\max_{1\leq i \leq n}\|\hat{y}_i-y_i\|,\]

where \(\hat{y}\) are the predictions, \(y\) are the data points and \(n\) is the number of samples.
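As a worked illustration of the definition above, here is a minimal sketch for scalar outputs, where the norm reduces to an absolute value. The helper name `maximum_error` is hypothetical and is not part of gemseo_mlearning.

```python
def maximum_error(predictions, observations):
    """Return the largest absolute deviation between paired scalar values."""
    if len(predictions) != len(observations):
        raise ValueError("predictions and observations must have the same length")
    return max(abs(p - o) for p, o in zip(predictions, observations))


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.0, 4.25]

# Absolute errors are 0.5, 0.0, 1.0 and 0.25, so the maximum error is 1.0.
me = maximum_error(y_pred, y_true)
```

Unlike an averaged measure such as the MSE, a single badly predicted sample fully determines the value.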

class gemseo_mlearning.quality_measures.me_measure.MEMeasure(algo, fit_transformers=False)

Bases: MLErrorMeasure

The maximum error measure for machine learning.

Parameters:
  • algo (MLRegressionAlgo) – A machine learning algorithm for supervised learning.

  • fit_transformers (bool) –

    Whether to re-fit the transformers when using resampling techniques. If False, use the transformers of the algorithm fitted from the whole learning dataset.

    By default it is set to False.

evaluate(method='learn', samples=None, **options)

Evaluate the quality measure.

Parameters:
  • method (str) –

    The name of the method to evaluate the quality measure.

    By default it is set to “learn”.

  • samples (Sequence[int] | None) – The indices of the learning samples. If None, use the whole learning dataset.

  • **options (OptionType | None) – The options of the estimation method (e.g. test_data for the test method, n_replicates for the bootstrap one, …).

Returns:

The value of the quality measure.

Raises:

ValueError – When the name of the method is unknown.

Return type:

float | ndarray
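The name-based selection described above can be pictured as a lookup from the method name to the matching `evaluate_*` method. The sketch below uses a hypothetical `ToyMeasure` class with placeholder return values. It only illustrates the dispatch pattern and the `ValueError` on an unknown name, and is not the library's implementation.

```python
class ToyMeasure:
    """Hypothetical stand-in that mimics name-based dispatch."""

    LEARN = "learn"
    TEST = "test"

    def evaluate_learn(self, samples=None, multioutput=True):
        return 0.0  # placeholder value

    def evaluate_test(self, test_data, samples=None, multioutput=True):
        return 1.0  # placeholder value

    def evaluate(self, method=LEARN, samples=None, **options):
        evaluators = {
            self.LEARN: self.evaluate_learn,
            self.TEST: self.evaluate_test,
        }
        try:
            evaluator = evaluators[method]
        except KeyError:
            raise ValueError(f"The method '{method}' is not available.") from None
        # Method-specific options (e.g. test_data) are forwarded as keywords.
        return evaluator(samples=samples, **options)


measure = ToyMeasure()
on_learning_data = measure.evaluate()
on_test_data = measure.evaluate("test", test_data=[])
```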

evaluate_bootstrap(n_replicates=100, samples=None, multioutput=True, seed=None)

Evaluate the quality measure using the bootstrap technique.

Parameters:
  • n_replicates (int) –

    The number of bootstrap replicates.

    By default it is set to 100.

  • samples (Sequence[int] | None) – The indices of the learning samples. If None, use the whole learning dataset.

  • multioutput (bool) –

    If True, return the quality measure for each output component. Otherwise, average these measures.

    By default it is set to True.

  • seed (int | None) – The seed of the pseudo-random number generator. If None, then an unpredictable generator will be used.

Returns:

The value of the quality measure.

Return type:

float | ndarray
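To make the bootstrap idea concrete, the following sketch refits a toy model on samples drawn with replacement and scores it on the samples left out of each draw. Everything here is an assumption made for illustration: the one-dimensional least-squares model `fit_line`, the helper `bootstrap_maximum_error`, the data, and the choice to average the maximum error over the replicates. The library may aggregate differently.

```python
import random


def fit_line(xs, ys):
    """Fit y = a * x + b by ordinary least squares (toy model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        return 0.0, mean_y  # degenerate draw: fall back to a constant prediction
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return a, mean_y - a * mean_x


def bootstrap_maximum_error(xs, ys, n_replicates=100, seed=None):
    """Average, over replicates, the maximum error on out-of-bag samples."""
    rng = random.Random(seed)
    n = len(xs)
    values = []
    for _ in range(n_replicates):
        train = [rng.randrange(n) for _ in range(n)]  # draw with replacement
        drawn = set(train)
        test = [i for i in range(n) if i not in drawn]  # out-of-bag samples
        if not test:
            continue  # nothing left to test on for this replicate
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        values.append(max(abs(a * xs[i] + b - ys[i]) for i in test))
    if not values:
        raise ValueError("no replicate had out-of-bag samples")
    return sum(values) / len(values)


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.1, 4.9, 7.2, 9.0, 10.8]
estimate = bootstrap_maximum_error(xs, ys, n_replicates=50, seed=0)
```

Passing an integer seed makes the resampling, and therefore the estimate, reproducible.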

evaluate_kfolds(n_folds=5, samples=None, multioutput=True, randomize=False, seed=None)

Evaluate the quality measure using the k-folds technique.

Parameters:
  • n_folds (int) –

    The number of folds.

    By default it is set to 5.

  • samples (Sequence[int] | None) – The indices of the learning samples. If None, use the whole learning dataset.

  • multioutput (bool) –

    If True, return the quality measure for each output component. Otherwise, average these measures.

    By default it is set to True.

  • randomize (bool) –

    Whether to shuffle the samples before dividing them in folds.

    By default it is set to False.

  • seed (int | None) – The seed of the pseudo-random number generator. If None, then an unpredictable generator will be used.

Returns:

The value of the quality measure.

Return type:

float | ndarray
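The roles of `n_folds`, `randomize` and `seed` can be sketched with a small index-splitting routine. The helpers `split_into_folds` and `kfold_pairs` are hypothetical, and the contiguous splitting shown here is one common convention rather than a statement about how the library partitions the samples.

```python
import random


def split_into_folds(n_samples, n_folds=5, randomize=False, seed=None):
    """Partition sample indices into n_folds chunks of nearly equal size."""
    indices = list(range(n_samples))
    if randomize:
        random.Random(seed).shuffle(indices)  # shuffle before splitting
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for k in range(n_folds):
        size = base + (1 if k < extra else 0)  # first folds absorb the remainder
        folds.append(indices[start:start + size])
        start += size
    return folds


def kfold_pairs(n_samples, n_folds=5, randomize=False, seed=None):
    """Yield (train_indices, test_indices), each fold serving once as test set."""
    folds = split_into_folds(n_samples, n_folds, randomize, seed)
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


pairs = list(kfold_pairs(7, n_folds=3))
```

With 7 samples and 3 folds, the folds have sizes 3, 2 and 2, and each sample is used for testing exactly once.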

evaluate_learn(samples=None, multioutput=True)

Evaluate the quality measure from the learning dataset.

Parameters:
  • samples (Sequence[int] | None) – The indices of the learning samples. If None, use the whole learning dataset.

  • multioutput (bool) –

    Whether to return the quality measure for each output component. If not, average these measures.

    By default it is set to True.

Returns:

The value of the quality measure.

Return type:

float | ndarray
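The effect of `multioutput` can be illustrated with a model that has two output components. This sketch assumes that the maximum error is computed separately for each component and then optionally averaged, as the parameter description suggests. The helper name and the data are hypothetical.

```python
def maximum_error_per_output(predictions, observations, multioutput=True):
    """Compute the maximum error per output component, or their average.

    Both arguments are lists of rows, one row per sample.
    """
    n_outputs = len(observations[0])
    per_output = [
        max(abs(p[j] - o[j]) for p, o in zip(predictions, observations))
        for j in range(n_outputs)
    ]
    if multioutput:
        return per_output
    return sum(per_output) / n_outputs


y_true = [[0.0, 10.0], [1.0, 20.0], [2.0, 30.0]]
y_pred = [[0.5, 10.0], [1.0, 22.0], [2.25, 29.0]]

per_component = maximum_error_per_output(y_pred, y_true)  # [0.5, 2.0]
averaged = maximum_error_per_output(y_pred, y_true, multioutput=False)  # 1.25
```

Averaging hides which output drives the error, so the per-component form is more informative when the outputs have different scales.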

evaluate_loo(samples=None, multioutput=True)

Evaluate the quality measure using the leave-one-out technique.

Parameters:
  • samples (Sequence[int] | None) – The indices of the learning samples. If None, use the whole learning dataset.

  • multioutput (bool) –

    If True, return the quality measure for each output component. Otherwise, average these measures.

    By default it is set to True.

Returns:

The value of the quality measure.

Return type:

float | ndarray
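Leave-one-out is the limiting case of k-folds where each fold holds a single sample. The sketch below applies it to a deliberately simple model, a constant predictor equal to the mean of the training outputs. The model, the helper name and the data are assumptions chosen to keep the arithmetic easy to follow.

```python
def loo_maximum_error(ys):
    """Leave-one-out maximum error of a constant (mean) predictor."""
    errors = []
    for i in range(len(ys)):
        rest = ys[:i] + ys[i + 1:]  # train on every sample except the i-th
        prediction = sum(rest) / len(rest)
        errors.append(abs(prediction - ys[i]))  # test on the held-out sample
    return max(errors)


ys = [1.0, 2.0, 3.0, 6.0]

# Holding out 6.0 leaves [1.0, 2.0, 3.0], whose mean is 2.0, hence an error of 4.0.
loo_me = loo_maximum_error(ys)
```

The model is refitted once per sample, so the cost grows linearly with the size of the learning dataset.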

evaluate_test(test_data, samples=None, multioutput=True)

Evaluate the quality measure using a test dataset.

Parameters:
  • test_data (Dataset) – The test dataset.

  • samples (Sequence[int] | None) – The indices of the learning samples. If None, use the whole learning dataset.

  • multioutput (bool) –

    If True, return the quality measure for each output component. Otherwise, average these measures.

    By default it is set to True.

Returns:

The value of the quality measure.

Return type:

float | ndarray

classmethod is_better(val1, val2)

Compare the quality between two values.

This method returns True if the first value is better than the second one.

For most measures, such as the MSE, a smaller value is “better” than a larger one. For some, such as the R2 measure, higher values are better. This comparison method handles both cases correctly, regardless of the type of measure.

Parameters:
  • val1 (float) – The value of the first quality measure.

  • val2 (float) – The value of the second quality measure.

Returns:

Whether val1 is of better quality than val2.

Return type:

bool
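The comparison logic can be sketched as a function whose direction depends on a flag playing the role of `SMALLER_IS_BETTER`. The standalone function below is a hypothetical illustration, not the classmethod itself.

```python
def is_better(val1, val2, smaller_is_better=True):
    """Return whether val1 is of better quality than val2."""
    if smaller_is_better:
        return val1 < val2  # error-like measures, e.g. the maximum error
    return val1 > val2  # score-like measures, e.g. R2


error_improved = is_better(0.1, 0.2)
score_improved = is_better(0.9, 0.8, smaller_is_better=False)
```

For the maximum error, the flag is True, so a lower value wins.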

BOOTSTRAP: ClassVar[str] = 'bootstrap'

The name of the method to evaluate the measure by bootstrap.

KFOLDS: ClassVar[str] = 'kfolds'

The name of the method to evaluate the measure by cross-validation.

LEARN: ClassVar[str] = 'learn'

The name of the method to evaluate the measure on the learning dataset.

LOO: ClassVar[str] = 'loo'

The name of the method to evaluate the measure by leave-one-out.

SMALLER_IS_BETTER: ClassVar[bool] = True

Whether a smaller value of the measure is better. If True, the measure is to be minimized; otherwise, it is to be maximized.

TEST: ClassVar[str] = 'test'

The name of the method to evaluate the measure on a test dataset.

algo: MLAlgo

The machine learning algorithm, usually already trained.