
mse_measure module

The mean squared error to measure the quality of a regression algorithm.

The mse_measure module implements the concept of mean squared error measures for machine learning algorithms.

This concept is implemented through the MSEMeasure class, which overrides the MLErrorMeasure._compute_measure() method.

The mean squared error (MSE) is defined by

\[\operatorname{MSE}(\hat{y})=\frac{1}{n}\sum_{i=1}^n(\hat{y}_i-y_i)^2,\]

where \(\hat{y}\) are the predictions, \(y\) are the data points and \(n\) is the number of samples.
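The definition can be checked by hand on a few values. The following is a minimal pure-Python sketch of the formula, not the GEMSEO implementation; the function name `mean_squared_error` and the sample values are chosen for illustration only.

```python
def mean_squared_error(predictions, observations):
    """Return the mean of the squared differences between two sequences."""
    if len(predictions) != len(observations):
        raise ValueError("predictions and observations must have the same length")
    n = len(observations)
    return sum((p - o) ** 2 for p, o in zip(predictions, observations)) / n


y_hat = [2.5, 0.0, 2.0, 8.0]  # predictions (illustrative values)
y = [3.0, -0.5, 2.0, 7.0]  # data points (illustrative values)
print(mean_squared_error(y_hat, y))  # 0.375
```

The squared differences are 0.25, 0.25, 0 and 1, so the sum is 1.5 and the mean over the four samples is 0.375.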

Classes:

MSEMeasure(algo)

The Mean Squared Error measure for machine learning.

class gemseo.mlearning.qual_measure.mse_measure.MSEMeasure(algo)

Bases: gemseo.mlearning.qual_measure.error_measure.MLErrorMeasure

The Mean Squared Error measure for machine learning.

Attributes
  • algo (MLAlgo) – The machine learning algorithm.


Parameters

algo (MLRegressionAlgo) – A machine learning algorithm for regression.

Return type

None

Attributes:

BOOTSTRAP

KFOLDS

LEARN

LOO

SMALLER_IS_BETTER

TEST

Methods:

evaluate([method, samples])

Evaluate the quality measure.

evaluate_bootstrap([n_replicates, samples, …])

Evaluate the quality measure using the bootstrap technique.

evaluate_kfolds([n_folds, samples, multioutput])

Evaluate the quality measure using the k-folds technique.

evaluate_learn([samples, multioutput])

Evaluate the quality measure using the learning dataset.

evaluate_loo([samples, multioutput])

Evaluate the quality measure using the leave-one-out technique.

evaluate_test(test_data[, samples, multioutput])

Evaluate the quality measure using a test dataset.

is_better(val1, val2)

Compare the quality between two values.

BOOTSTRAP = 'bootstrap'
KFOLDS = 'kfolds'
LEARN = 'learn'
LOO = 'loo'
SMALLER_IS_BETTER = True
TEST = 'test'
evaluate(method='learn', samples=None, **options)

Evaluate the quality measure.

Parameters
  • method (str) – The name of the method used to evaluate the quality measure, e.g. 'learn', 'test', 'loo', 'kfolds' or 'bootstrap'.

  • samples (Optional[List[int]]) – The indices of the learning samples. If None, use the whole learning dataset.

  • **options (Optional[Union[List[int], bool, int, gemseo.core.dataset.Dataset]]) – The options of the estimation method (e.g. 'test_data' for the 'test' method, 'n_replicates' for the bootstrap one, ...).

Returns

The value of the quality measure.

Raises

ValueError – If the name of the method is unknown.

Return type

Union[float, numpy.ndarray]
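The dispatch described above, from a method name to a dedicated evaluation method with a ValueError for unknown names, could look roughly like the sketch below. `ToyMeasure` is a hypothetical stand-in and its placeholder methods only echo their arguments; the actual GEMSEO implementation may be organized differently.

```python
class ToyMeasure:
    """Hypothetical stand-in used only to illustrate name-based dispatch."""

    LEARN = "learn"
    TEST = "test"

    def evaluate_learn(self, samples=None, multioutput=True):
        # Placeholder: a real measure would compute an error here.
        return ("learn", samples, multioutput)

    def evaluate_test(self, test_data, samples=None, multioutput=True):
        # Placeholder: a real measure would compare predictions to test_data.
        return ("test", test_data, samples, multioutput)

    def evaluate(self, method=LEARN, samples=None, **options):
        # Map each method name to the bound method that handles it.
        evaluators = {
            self.LEARN: self.evaluate_learn,
            self.TEST: self.evaluate_test,
        }
        if method not in evaluators:
            raise ValueError(f"The method '{method}' is not available.")
        # Method-specific options are forwarded as keyword arguments.
        return evaluators[method](samples=samples, **options)


measure = ToyMeasure()
print(measure.evaluate("test", test_data=[0.1, 0.2], multioutput=False))
```

Forwarding `**options` keeps the generic entry point unaware of which keyword arguments each estimation method accepts.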

evaluate_bootstrap(n_replicates=100, samples=None, multioutput=True)

Evaluate the quality measure using the bootstrap technique.

Parameters
  • n_replicates (int) – The number of bootstrap replicates.

  • samples (Optional[List[int]]) – The indices of the learning samples. If None, use the whole learning dataset.

  • multioutput (bool) – If True, return the quality measure for each output component. Otherwise, average these measures.

Returns

The value of the quality measure.

Return type

Union[float, numpy.ndarray]
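As an illustration of the bootstrap idea, here is a minimal sketch of one common variant: each replicate trains on samples drawn with replacement and is validated on the samples that were not drawn (out-of-bag). This is an assumption about the general technique, not a description of GEMSEO's internals; the toy mean predictor, the function name `bootstrap_mse` and the `seed` argument are hypothetical.

```python
import random


def bootstrap_mse(outputs, n_replicates=100, seed=0):
    """Estimate the MSE of a toy mean predictor by out-of-bag bootstrap."""
    rng = random.Random(seed)
    n = len(outputs)
    replicate_errors = []
    for _ in range(n_replicates):
        # Draw n training indices with replacement.
        train = [rng.randrange(n) for _ in range(n)]
        drawn = set(train)
        # Samples never drawn form the validation set of this replicate.
        held_out = [i for i in range(n) if i not in drawn]
        if not held_out:
            continue  # nothing to validate on; skip this replicate
        # Toy model: always predict the mean of the training outputs.
        prediction = sum(outputs[i] for i in train) / n
        squared = [(outputs[i] - prediction) ** 2 for i in held_out]
        replicate_errors.append(sum(squared) / len(squared))
    if not replicate_errors:
        raise ValueError("No replicate left any sample out.")
    # Average the per-replicate errors.
    return sum(replicate_errors) / len(replicate_errors)


print(bootstrap_mse([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]))
```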

evaluate_kfolds(n_folds=5, samples=None, multioutput=True)

Evaluate the quality measure using the k-folds technique.

Parameters
  • n_folds (int) – The number of folds.

  • samples (Optional[List[int]]) – The indices of the learning samples. If None, use the whole learning dataset.

  • multioutput (bool) – If True, return the quality measure for each output component. Otherwise, average these measures.

Returns

The value of the quality measure.

Return type

Union[float, numpy.ndarray]
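The k-folds technique can be sketched as follows: the samples are split into `n_folds` groups, each group is held out once while a model is trained on the others, and the squared errors on the held-out samples are averaged. This is a generic pure-Python illustration with a one-dimensional least-squares line as the model; `fit_line` and `kfold_mse` are hypothetical helpers, and GEMSEO may split or aggregate differently.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def kfold_mse(xs, ys, n_folds=5):
    """Cross-validated MSE of the fitted line, using interleaved folds."""
    n = len(xs)
    total = 0.0
    for k in range(n_folds):
        # Fold k holds out the indices k, k + n_folds, k + 2 * n_folds, ...
        held_out = set(range(k, n, n_folds))
        train = [i for i in range(n) if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        total += sum((slope * xs[i] + intercept - ys[i]) ** 2 for i in held_out)
    # Every sample is held out exactly once, so divide by n.
    return total / n


xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
ys[3] += 1.0  # perturb one point so the error is not zero
print(kfold_mse(xs, ys))
```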

evaluate_learn(samples=None, multioutput=True)

Evaluate the quality measure using the learning dataset.

Parameters
  • samples (Optional[List[int]]) – The indices of the learning samples. If None, use the whole learning dataset.

  • multioutput (bool) – If True, return the quality measure for each output component. Otherwise, average these measures.

Returns

The value of the quality measure.

Return type

Union[float, numpy.ndarray]
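The effect of `multioutput` can be illustrated on a model with two output components. The sketch below computes one MSE per output component and, when `multioutput` is False, averages them with equal weights; the equal weighting and the helper name `mse_per_output` are assumptions made for this illustration.

```python
def mse_per_output(predictions, observations, multioutput=True):
    """Compute the MSE of each output component, or their plain average."""
    n = len(observations)
    n_outputs = len(observations[0])
    per_output = [
        sum((p[j] - o[j]) ** 2 for p, o in zip(predictions, observations)) / n
        for j in range(n_outputs)
    ]
    if multioutput:
        return per_output
    return sum(per_output) / n_outputs


# Two samples (rows), two output components (columns); illustrative values.
predictions = [[1.0, 10.0], [2.0, 20.0]]
observations = [[1.0, 14.0], [4.0, 20.0]]
print(mse_per_output(predictions, observations))  # [2.0, 8.0]
print(mse_per_output(predictions, observations, multioutput=False))  # 5.0
```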

evaluate_loo(samples=None, multioutput=True)

Evaluate the quality measure using the leave-one-out technique.

Parameters
  • samples (Optional[List[int]]) – The indices of the learning samples. If None, use the whole learning dataset.

  • multioutput (bool) – If True, return the quality measure for each output component. Otherwise, average these measures.

Returns

The value of the quality measure.

Return type

Union[float, numpy.ndarray]
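Leave-one-out can be seen as k-folds with as many folds as samples: each sample is predicted by a model trained on all the others. The sketch below uses a toy predictor that returns the mean of its training outputs; it illustrates the technique only, and `loo_mse` is a hypothetical helper, not a GEMSEO function.

```python
def loo_mse(outputs):
    """Leave-one-out MSE of a toy predictor returning the training mean."""
    n = len(outputs)
    total = sum(outputs)
    squared_errors = []
    for y in outputs:
        # Mean of all the outputs except the one being held out.
        prediction = (total - y) / (n - 1)
        squared_errors.append((y - prediction) ** 2)
    return sum(squared_errors) / n


print(loo_mse([1.0, 2.0, 3.0]))  # 1.5
```

Holding out 1, 2 and 3 in turn gives the predictions 2.5, 2 and 1.5, hence the squared errors 2.25, 0 and 2.25, whose mean is 1.5.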

evaluate_test(test_data, samples=None, multioutput=True)

Evaluate the quality measure using a test dataset.

Parameters
  • test_data (gemseo.core.dataset.Dataset) – The test dataset.

  • samples (Optional[List[int]]) – The indices of the learning samples. If None, use the whole learning dataset.

  • multioutput (bool) – If True, return the quality measure for each output component. Otherwise, average these measures.

Returns

The value of the quality measure.

Return type

Union[float, numpy.ndarray]
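The difference between evaluating on the learning data and on a separate test dataset can be sketched with plain lists standing in for datasets. The toy model below predicts the mean of the learning outputs; all names and values are hypothetical and no GEMSEO Dataset is involved.

```python
def mse(predictions, observations):
    """Mean squared error between two equal-length sequences."""
    n = len(observations)
    return sum((p - o) ** 2 for p, o in zip(predictions, observations)) / n


learn_outputs = [1.0, 2.0, 3.0]  # data used to build the toy model
test_outputs = [2.0, 4.0]  # data never seen during learning

# Toy model: always predict the mean of the learning outputs.
prediction = sum(learn_outputs) / len(learn_outputs)

learn_error = mse([prediction] * len(learn_outputs), learn_outputs)
test_error = mse([prediction] * len(test_outputs), test_outputs)
print(learn_error, test_error)
```

Here the test error (2.0) is larger than the learning error, which is the usual reason for checking a model on data that were not used to build it.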

classmethod is_better(val1, val2)

Compare the quality between two values.

This method returns True if the first value is better than the second one.

For most measures, such as the MSE, a smaller value is better than a larger one. For others, such as an R2 measure, a larger value is better. This comparison method handles both cases, regardless of the type of measure.

Parameters
  • val1 (float) – The value of the first quality measure.

  • val2 (float) – The value of the second quality measure.

Returns

Whether val1 is of better quality than val2.

Return type

bool
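One way to obtain this behavior is to let a class-level flag such as SMALLER_IS_BETTER decide the direction of the comparison. The sketch below uses two hypothetical classes and assumes a strict inequality; it is not the GEMSEO source.

```python
class ErrorLikeMeasure:
    """Hypothetical measure for which smaller values are better (e.g. MSE)."""

    SMALLER_IS_BETTER = True

    @classmethod
    def is_better(cls, val1, val2):
        # The class-level flag selects the direction of the comparison.
        if cls.SMALLER_IS_BETTER:
            return val1 < val2
        return val1 > val2


class ScoreLikeMeasure(ErrorLikeMeasure):
    """Hypothetical measure for which larger values are better (e.g. R2)."""

    SMALLER_IS_BETTER = False


print(ErrorLikeMeasure.is_better(0.1, 0.5))  # True
print(ScoreLikeMeasure.is_better(0.1, 0.5))  # False
```

Because is_better is a class method, subclasses only need to override the flag to reverse the comparison.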