distribution module

This module defines the notion of distribution of a machine learning algorithm.

Once an MLAlgo has been trained, its quality should be assessed before it is used.

One can measure not only its global quality (e.g. with an MLQualityMeasure) but also its local quality.

The MLRegressorDistribution class addresses the latter case, by quantifying the robustness of a machine learning algorithm to a learning point. The more robust it is, the less variability it has around this point.

Note

For now, only instances of MLRegressionAlgo are supported, not arbitrary MLAlgo instances.

The MLRegressorDistribution can be particularly useful to:

study the robustness of an MLAlgo w.r.t. learning dataset elements,
evaluate acquisition criteria for adaptive learning purposes (see MLDataAcquisition and MLDataAcquisitionCriterion),
etc.

The abstract MLRegressorDistribution class has two subclasses:

KrigingDistribution:
the MLRegressionAlgo is a Kriging model and this assessor takes advantage of the underlying Gaussian stochastic process,
RegressorDistribution:
this class is based on sampling methods, such as bootstrap, cross-validation or leave-one-out.
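
The sampling-based idea can be sketched without GEMSEO. The example below is a minimal illustration, assuming a one-dimensional least-squares line as a stand-in for an MLRegressionAlgo; the helpers fit_line and bootstrap_predictions are hypothetical names and are not part of the gemseo_mlearning API. Each bootstrap replicate retrains the model on a resampled learning set, and the spread of the predictions at a point measures the local variability.

```python
import random
import statistics


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bootstrap_predictions(xs, ys, x_new, n_replicates=200, seed=0):
    """Predict at x_new with models trained on resampled learning sets."""
    rng = random.Random(seed)
    indices = range(len(xs))
    predictions = []
    while len(predictions) < n_replicates:
        # Draw learning indices with replacement (bootstrap resampling).
        draw = rng.choices(indices, k=len(xs))
        sub_xs = [xs[i] for i in draw]
        if len(set(sub_xs)) < 2:
            continue  # a degenerate resample cannot define a line
        slope, intercept = fit_line(sub_xs, [ys[i] for i in draw])
        predictions.append(slope * x_new + intercept)
    return predictions


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]
predictions = bootstrap_predictions(xs, ys, x_new=2.5)
mean = statistics.fmean(predictions)    # average prediction at x_new
std = statistics.pstdev(predictions)    # local variability at x_new
```

A small standard deviation at x_new indicates that the model is robust there: the prediction changes little when the learning set is perturbed.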

See also

KrigingDistribution, RegressorDistribution, MLDataAcquisition, MLDataAcquisitionCriterion, MLDataAcquisitionCriterionFactory

class gemseo_mlearning.adaptive.distribution.MLRegressorDistribution(algo)

Bases: object

Distribution related to a regression model.

Parameters: algo (gemseo.mlearning.regression.regression.MLRegressionAlgo) – A regression model.
Return type: None

change_learning_set(learning_set)

Re-train the machine learning algorithm from a new learning set, which replaces the initial one.

Parameters: learning_set (gemseo.core.dataset.Dataset) – The new learning set.
Return type: None

compute_confidence_interval(input_data, level=0.95)

Predict the lower bounds and upper bounds from input data.

The user can specify the input data either as a NumPy array, e.g. array([1., 2., 3.]) or as a dictionary, e.g. {'a': array([1.]), 'b': array([2., 3.])}.

The output data type will be consistent with the input data type.

Parameters

input_data (DataType) – The input data.
level (float) –
A quantile level.

By default it is set to 0.95.

Returns

The lower and upper bound values.

Return type

tuple[dict[str, ndarray], dict[str, ndarray], tuple[ndarray, ndarray]] | None
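
As an illustration of how a quantile level maps to bounds, here is a minimal sketch assuming a Gaussian predictive distribution (as in the Kriging case) and a two-sided interpretation of level; both are assumptions, and the function gaussian_confidence_interval is a hypothetical helper, not the library's implementation.

```python
from statistics import NormalDist


def gaussian_confidence_interval(mean, std, level=0.95):
    """Return the bounds of a centered interval holding `level` of the mass."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    # Two-sided interval: split the excluded mass evenly between both tails.
    z = NormalDist().inv_cdf((1.0 + level) / 2.0)
    return mean - z * std, mean + z * std


# Predictive mean 2.0 and standard deviation 0.5 at some input point.
lower, upper = gaussian_confidence_interval(mean=2.0, std=0.5)
```

With level=0.95, the half-width is about 1.96 standard deviations. A sampling-based distribution would instead take empirical quantiles of the replicate predictions.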

compute_expected_improvement(input_data, *args, **kwargs)

Evaluate ‘predict’ with either array or dictionary-based input data.

Firstly, the pre-processing stage converts the input data to a NumPy data array, if these data are expressed as a dictionary of NumPy data arrays.

Then, the processing evaluates the function ‘predict’ from this NumPy input data array.

Lastly, the post-processing transforms the output data to a dictionary of output NumPy data arrays if the input data were passed as a dictionary of NumPy data arrays.

Parameters

input_data (Union[numpy.ndarray, Mapping[str, numpy.ndarray]]) – The input data.
*args – The positional arguments of the function ‘predict’.
**kwargs – The keyword arguments of the function ‘predict’.

Returns

The output data with the same type as the input one.

Return type

Union[numpy.ndarray, Mapping[str, numpy.ndarray]]
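
The three stages described above (pre-processing, evaluation, post-processing) can be sketched as a decorator. This is a simplified illustration, not the library's code: plain Python lists stand in for NumPy arrays, and accept_array_or_dict, input_names and output_sizes are hypothetical names introduced for the example.

```python
from functools import wraps


def accept_array_or_dict(input_names, output_sizes):
    """Build a decorator for functions mapping a flat list to a flat list.

    input_names: order in which dictionary entries are concatenated.
    output_sizes: mapping from each output name to its number of components.
    """

    def decorator(func):
        @wraps(func)
        def wrapper(input_data, *args, **kwargs):
            as_dict = isinstance(input_data, dict)
            if as_dict:
                # Pre-processing: concatenate the entries into one flat vector.
                flat = [v for name in input_names for v in input_data[name]]
            else:
                flat = list(input_data)
            # Processing: evaluate the wrapped function on the flat vector.
            output = func(flat, *args, **kwargs)
            if not as_dict:
                return output
            # Post-processing: split the flat output by output name.
            result, start = {}, 0
            for name, size in output_sizes.items():
                result[name] = output[start:start + size]
                start += size
            return result

        return wrapper

    return decorator


@accept_array_or_dict(input_names=["a", "b"], output_sizes={"y": 1})
def predict(flat_input):
    """Toy model: the single output is the sum of the inputs."""
    return [sum(flat_input)]


from_array = predict([1.0, 2.0, 3.0])               # flat in, flat out
from_dict = predict({"a": [1.0], "b": [2.0, 3.0]})  # dict in, dict out
```

The output type follows the input type: a flat vector yields a flat vector, and a dictionary yields a dictionary keyed by output name.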

compute_mean(input_data, *args, **kwargs)

Evaluate ‘predict’ with either array or dictionary-based input data.

Firstly, the pre-processing stage converts the input data to a NumPy data array, if these data are expressed as a dictionary of NumPy data arrays.

Then, the processing evaluates the function ‘predict’ from this NumPy input data array.

Lastly, the post-processing transforms the output data to a dictionary of output NumPy data arrays if the input data were passed as a dictionary of NumPy data arrays.

Parameters

input_data (Union[numpy.ndarray, Mapping[str, numpy.ndarray]]) – The input data.
*args – The positional arguments of the function ‘predict’.
**kwargs – The keyword arguments of the function ‘predict’.

Returns

The output data with the same type as the input one.

Return type

Union[numpy.ndarray, Mapping[str, numpy.ndarray]]

compute_standard_deviation(input_data, *args, **kwargs)

Evaluate ‘predict’ with either array or dictionary-based input data.

Firstly, the pre-processing stage converts the input data to a NumPy data array, if these data are expressed as a dictionary of NumPy data arrays.

Then, the processing evaluates the function ‘predict’ from this NumPy input data array.

Lastly, the post-processing transforms the output data to a dictionary of output NumPy data arrays if the input data were passed as a dictionary of NumPy data arrays.

Parameters

input_data (Union[numpy.ndarray, Mapping[str, numpy.ndarray]]) – The input data.
*args – The positional arguments of the function ‘predict’.
**kwargs – The keyword arguments of the function ‘predict’.

Returns

The output data with the same type as the input one.

Return type

Union[numpy.ndarray, Mapping[str, numpy.ndarray]]

compute_variance(input_data, *args, **kwargs)

Evaluate ‘predict’ with either array or dictionary-based input data.

Firstly, the pre-processing stage converts the input data to a NumPy data array, if these data are expressed as a dictionary of NumPy data arrays.

Then, the processing evaluates the function ‘predict’ from this NumPy input data array.

Lastly, the post-processing transforms the output data to a dictionary of output NumPy data arrays if the input data were passed as a dictionary of NumPy data arrays.

Parameters

input_data (Union[numpy.ndarray, Mapping[str, numpy.ndarray]]) – The input data.
*args – The positional arguments of the function ‘predict’.
**kwargs – The keyword arguments of the function ‘predict’.

Returns

The output data with the same type as the input one.

Return type

Union[numpy.ndarray, Mapping[str, numpy.ndarray]]

learn(samples=None)

Train the machine learning algorithm from the learning dataset.

Parameters

samples (list[int] | None) –

The indices of the learning samples. If None, use the whole learning dataset.

By default it is set to None.

Return type

None
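
To show how a samples argument of this kind can drive a resampling scheme, here is a minimal leave-one-out sketch. The MeanRegressor class is a hypothetical toy model that only mirrors the learn(samples=None) convention; it is not a GEMSEO class.

```python
import statistics


class MeanRegressor:
    """Toy model predicting the mean of the learning outputs (hypothetical)."""

    def __init__(self, outputs):
        self.outputs = list(outputs)
        self.prediction = None

    def learn(self, samples=None):
        """Train from the selected indices, or from all of them if None."""
        if samples is None:
            samples = range(len(self.outputs))
        self.prediction = statistics.fmean(self.outputs[i] for i in samples)


model = MeanRegressor([1.0, 2.0, 3.0, 6.0])
model.learn()  # whole learning dataset
full_prediction = model.prediction

# Leave-one-out: retrain once per sample, excluding that sample's index.
loo_predictions = []
for left_out in range(len(model.outputs)):
    kept = [i for i in range(len(model.outputs)) if i != left_out]
    model.learn(samples=kept)
    loo_predictions.append(model.prediction)
```

The gap between full_prediction and each leave-one-out prediction shows how strongly the model depends on the corresponding learning sample.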

predict(input_data)

Predict the output of the original machine learning algorithm.

The user can specify the input data either as a NumPy array, e.g. array([1., 2., 3.]) or as a dictionary, e.g. {'a': array([1.]), 'b': array([2., 3.])}.

The output data type will be consistent with the input data type.

Parameters: input_data (Union[numpy.ndarray, Mapping[str, numpy.ndarray]]) – The input data.
Returns: The predicted output data.
Return type: Union[numpy.ndarray, Mapping[str, numpy.ndarray]]

algo: gemseo.mlearning.regression.regression.MLRegressionAlgo – The regression model.

property input_names: list[str] – The names of the original machine learning algorithm inputs.

property learning_set: gemseo.core.dataset.Dataset – The learning dataset used by the original machine learning algorithm.

property output_dimension: int – The dimension of the machine learning output space.

property output_names: list[str] – The names of the original machine learning algorithm outputs.