gemseo_mlearning.adaptive.distribution module

This module defines the notion of distribution of a machine learning algorithm.

Once an MLAlgo has been trained, it is important to assess its quality before using it.

One can measure not only its global quality (e.g. with an MLQualityMeasure) but also its local one.

The MLRegressorDistribution class addresses the latter case by quantifying the robustness of a machine learning algorithm to a learning point: the more robust the algorithm, the less variability it shows around this point.

Note

For now, only instances of MLRegressionAlgo are considered, rather than any MLAlgo.

The MLRegressorDistribution can be particularly useful to define data acquisition criteria (see MLDataAcquisitionCriterion and MLDataAcquisitionCriterionFactory) driving an adaptive learning process (see MLDataAcquisition).

The abstract MLRegressorDistribution class is derived into two classes:
  • KrigingDistribution
  • RegressorDistribution

See also

KrigingDistribution, RegressorDistribution, MLDataAcquisition, MLDataAcquisitionCriterion, MLDataAcquisitionCriterionFactory

class gemseo_mlearning.adaptive.distribution.MLRegressorDistribution(algo)[source]

Bases: object

Distribution related to a regression model.

Parameters:

algo (MLRegressionAlgo) – A regression model.
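A minimal usage sketch is given below; the import path of KrigingDistribution (a concrete subclass listed in the "See also" section) and the existence of an already trained regression model algo are assumptions, not taken from this page.

    from numpy import array

    # Assumed import path for a concrete subclass; the exact module location
    # is an assumption, not documented on this page.
    from gemseo_mlearning.adaptive.distributions.kriging_distribution import (
        KrigingDistribution,
    )

    # `algo` is assumed to be an already trained MLRegressionAlgo,
    # e.g. a Gaussian process regressor fitted elsewhere on a Dataset.
    distribution = KrigingDistribution(algo)
    distribution.learn()

    # Array input -> array output.
    mean_from_array = distribution.compute_mean(array([0.5, 0.5]))

    # Dictionary input -> dictionary output, keyed by variable names.
    mean_from_dict = distribution.compute_mean({"x": array([0.5, 0.5])})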

change_learning_set(learning_set)[source]

Re-train the machine learning algorithm from a new learning set.

Parameters:

learning_set (Dataset) – The new learning set.

Return type:

None
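A short sketch: distribution and new_learning_set are assumed to exist already, the latter being a gemseo Dataset holding both the input and output samples of the regression problem.

    # Replace the learning data and re-train the wrapped regression model.
    distribution.change_learning_set(new_learning_set)

    # Subsequent calls to compute_mean, compute_variance, etc. use the new fit.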

abstract compute_confidence_interval(input_data, level=0.95)[source]

Predict the lower bounds and upper bounds from input data.

The user can specify the input data either as a NumPy array, e.g. array([1., 2., 3.]) or as a dictionary, e.g. {'a': array([1.]), 'b': array([2., 3.])}.

The output data type will be consistent with the input data type.

Parameters:
  • input_data (DataType) – The input data.

  • level (float) –

    A quantile level.

    By default it is set to 0.95.

Returns:

The lower and upper bound values.

Return type:

tuple[dict[str, ndarray], dict[str, ndarray]] | tuple[ndarray, ndarray] | None
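A hedged usage sketch, assuming distribution is a trained concrete subclass such as KrigingDistribution:

    from numpy import array

    point = array([0.5, 0.5])

    # 95% interval by default.
    lower, upper = distribution.compute_confidence_interval(point)

    # A different quantile level can be requested explicitly.
    lower_90, upper_90 = distribution.compute_confidence_interval(point, level=0.90)

    # With dictionary input, the bounds are dictionaries keyed by output names.
    lower_d, upper_d = distribution.compute_confidence_interval({"x": point})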

abstract compute_expected_improvement(input_data, fopt, maximize=False)[source]

Compute the expected improvement from input data.

The user can specify the input data either as a NumPy array, e.g. array([1., 2., 3.]) or as a dictionary, e.g. {'a': array([1.]), 'b': array([2., 3.])}.

The output data type will be consistent with the input data type.

Parameters:
  • input_data (Union[ndarray, Mapping[str, ndarray]]) – The input data.

  • fopt (float) – The current optimum value.

  • maximize (bool) –

Whether to seek a maximum (True) rather than a minimum (False).

    By default it is set to False.

Returns:

The expected improvement value.

Return type:

Union[ndarray, Mapping[str, ndarray]]
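The sketch below illustrates how this criterion might drive the selection of a new learning point in an adaptive loop; the candidate grid, the current optimum value and the greedy argmax rule are illustrative assumptions, not part of this API.

    from numpy import argmax, array, linspace

    # Candidate points where the model could be refined (1D input assumed).
    candidates = linspace(0.0, 1.0, 101).reshape(-1, 1)

    # Current best (minimum) observed output value, assumed known from the data.
    f_opt = 0.3

    # Expected improvement of each candidate with respect to the current minimum.
    ei = array(
        [
            distribution.compute_expected_improvement(x, f_opt, maximize=False)
            for x in candidates
        ]
    )

    # A greedy acquisition rule would add the candidate maximizing the criterion.
    new_point = candidates[argmax(ei)]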

abstract compute_mean(input_data)[source]

Compute the mean from input data.

The user can specify the input data either as a NumPy array, e.g. array([1., 2., 3.]) or as a dictionary, e.g. {'a': array([1.]), 'b': array([2., 3.])}.

The output data type will be consistent with the input data type.

Parameters:

input_data (Union[ndarray, Mapping[str, ndarray]]) – The input data.

Returns:

The mean value.

Return type:

Union[ndarray, Mapping[str, ndarray]]

compute_standard_deviation(input_data, *args, **kwargs)

Compute the standard deviation from input data.

The user can specify the input data either as a NumPy array, e.g. array([1., 2., 3.]) or as a dictionary, e.g. {'a': array([1.]), 'b': array([2., 3.])}.

The output data type will be consistent with the input data type.

Parameters:
  • input_data (Union[ndarray, Mapping[str, ndarray]]) – The input data.

  • *args – The positional arguments passed to the underlying computation.

  • **kwargs – The keyword arguments passed to the underlying computation.

Returns:

The standard deviation value.

Return type:

Union[ndarray, Mapping[str, ndarray]]

abstract compute_variance(input_data)[source]

Compute the variance from input data.

The user can specify the input data either as a NumPy array, e.g. array([1., 2., 3.]) or as a dictionary, e.g. {'a': array([1.]), 'b': array([2., 3.])}.

The output data type will be consistent with the input data type.

Parameters:

input_data (Union[ndarray, Mapping[str, ndarray]]) – The input data.

Returns:

The variance value.

Return type:

Union[ndarray, Mapping[str, ndarray]]
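The three point-wise statistics can be queried together; one would expect the standard deviation to be the square root of the variance (a sketch, with distribution assumed to be a trained concrete subclass):

    from numpy import allclose, array, sqrt

    point = array([0.5, 0.5])

    mean = distribution.compute_mean(point)
    variance = distribution.compute_variance(point)
    standard_deviation = distribution.compute_standard_deviation(point)

    # Expected relationship between the two dispersion measures.
    assert allclose(standard_deviation, sqrt(variance))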

learn(samples=None)[source]

Train the machine learning algorithm from the learning dataset.

Parameters:

samples (list[int] | None) – The indices of the learning samples. If None, use the whole learning dataset.

Return type:

None
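For example (a sketch; the sample indices are illustrative):

    # Train on the whole learning dataset.
    distribution.learn()

    # Re-train on a subset of the learning samples, e.g. to assess how much
    # the local predictions depend on a few learning points.
    distribution.learn(samples=[0, 1, 2, 5, 8])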

predict(input_data)[source]

Predict the output of the original machine learning algorithm.

The user can specify the input data either as a NumPy array, e.g. array([1., 2., 3.]) or as a dictionary, e.g. {'a': array([1.]), 'b': array([2., 3.])}.

The output data type will be consistent with the input data type.

Parameters:

input_data (Union[ndarray, Mapping[str, ndarray]]) – The input data.

Returns:

The predicted output data.

Return type:

Union[ndarray, Mapping[str, ndarray]]
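Since prediction is delegated to the original machine learning algorithm, both objects are expected to return the same values (a sketch):

    from numpy import allclose, array

    point = array([0.5, 0.5])

    # The distribution forwards the prediction to the wrapped regression model.
    assert allclose(distribution.predict(point), distribution.algo.predict(point))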

algo: MLRegressionAlgo

The regression model.

property input_names: list[str]

The names of the original machine learning algorithm inputs.

property learning_set: Dataset

The learning dataset used by the original machine learning algorithm.

property output_dimension: int

The dimension of the machine learning output space.

property output_names: list[str]

The names of the original machine learning algorithm outputs.
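Finally, a sketch of inspecting the read-only attributes of a trained distribution:

    # Names of the input and output variables of the wrapped regression model.
    print(distribution.input_names)
    print(distribution.output_names)

    # Dimension of the output space.
    print(distribution.output_dimension)

    # The learning dataset used by the wrapped regression model (a gemseo Dataset).
    learning_data = distribution.learning_set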