regressor_distribution module

Universal distribution for regression models.

A RegressorDistribution samples an MLSupervisedAlgo by learning new versions of it from subsets of the original learning dataset.

These new MLAlgo instances are based on sampling methods, such as bootstrap, cross-validation or leave-one-out.

Sampling an MLAlgo can be particularly useful to:

study the robustness of an MLAlgo w.r.t. learning dataset elements,
estimate infill criteria for adaptive learning purposes,
etc.
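The idea of learning new versions of a model from subsets of the learning dataset can be sketched with NumPy alone. The example below is not the gemseo_mlearning implementation: the toy data, the `fit_line` helper and the number of members are assumptions made for illustration. It refits a straight line on bootstrap subsets and uses the spread of the fitted slopes as a robustness indicator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy learning dataset (illustrative only): y = 2x + 1 plus small noise.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0 + 0.05 * rng.standard_normal(x.size)


def fit_line(x_sub, y_sub):
    """Fit y = a*x + b by least squares and return the coefficients (a, b)."""
    return np.polyfit(x_sub, y_sub, deg=1)


# Learn one new model per bootstrap subset (sampling with replacement).
n_members = 50  # hypothetical ensemble size
members = np.array(
    [
        fit_line(x[idx], y[idx])
        for idx in (rng.integers(0, x.size, size=x.size) for _ in range(n_members))
    ]
)  # shape (n_members, 2)

# The spread of the fitted slopes reflects sensitivity to the learning samples.
slope_std = members[:, 0].std()
```

A small `slope_std` suggests that the fitted model depends little on which learning samples were drawn.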

class gemseo_mlearning.adaptive.distributions.regressor_distribution.RegressorDistribution(algo, bootstrap=True, loo=False, size=None)

Bases: MLRegressorDistribution

Distribution related to a regression algorithm.

Parameters:

algo (MLRegressionAlgo) – A regression model.
bootstrap (bool) –
The resampling method. If True, use bootstrap resampling. Otherwise, use cross-validation resampling.

By default it is set to True.
loo (bool) –
The leave-one-out sub-method, used when bootstrap is False. If False, use parameterized cross-validation. Otherwise, use leave-one-out.

By default it is set to False.
size (int | None) – The size of the resampling set, i.e. the number of times the machine learning algorithm is rebuilt. If None, use the default size for bootstrap (MLAlgoSampler.N_BOOTSTRAP) and cross-validation (MLAlgoSampler.N_FOLDS).
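How the three flags select a resampling scheme can be illustrated by generating the learning indices of each rebuilt model. This is a sketch under stated assumptions, not the library code: the function `resampling_indices` and its `seed` argument are hypothetical, and ignoring `size` for leave-one-out is a choice made here. Only the default sizes (100 bootstrap replicates, 5 folds) come from the class attributes documented on this page.

```python
import numpy as np

N_BOOTSTRAP = 100  # default number of bootstrap replicates (from this page)
N_FOLDS = 5  # default number of cross-validation folds (from this page)


def resampling_indices(n_samples, bootstrap=True, loo=False, size=None, seed=0):
    """Return one array of learning indices per rebuilt model (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    all_indices = np.arange(n_samples)
    if bootstrap:
        size = N_BOOTSTRAP if size is None else size
        # Draw with replacement, then keep the distinct indices.
        return [
            np.unique(rng.integers(0, n_samples, size=n_samples))
            for _ in range(size)
        ]
    if loo:
        # Leave-one-out: one model per sample; size is ignored in this sketch.
        return [np.delete(all_indices, i) for i in range(n_samples)]
    size = N_FOLDS if size is None else size
    folds = np.array_split(rng.permutation(n_samples), size)
    # Each model learns from every fold except one.
    return [np.setdiff1d(all_indices, fold) for fold in folds]
```

For instance, `resampling_indices(10, bootstrap=False)` yields 5 index sets of 8 samples each, while `resampling_indices(10, bootstrap=False, loo=True)` yields 10 sets of 9 samples.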

change_learning_set(learning_set)

Re-train the machine learning algorithm with a new learning set in place of the initial one.

Parameters: learning_set (Dataset) – The new learning set.
Return type: None

compute_confidence_interval(input_data, level=0.95)

Predict the lower bounds and upper bounds from input data.

The user can specify the input data either as a NumPy array, e.g. array([1., 2., 3.]) or as a dictionary, e.g. {'a': array([1.]), 'b': array([2., 3.])}.

The output data type will be consistent with the input data type.

Parameters:

input_data (DataType) – The input data.
level (float) –
A quantile level.

By default it is set to 0.95.

Returns:

The lower and upper bound values.

Return type:

tuple[dict[str, ndarray], dict[str, ndarray], tuple[ndarray, ndarray]] | None
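One plausible way to obtain such bounds from an ensemble is to take symmetric empirical quantiles of the member predictions. The sketch below assumes this reading of `level`. The function `confidence_interval` and the synthetic predictions are hypothetical, and the library may compute its bounds differently.

```python
import numpy as np


def confidence_interval(member_predictions, level=0.95):
    """Return (lower, upper) bounds from stacked member predictions.

    member_predictions has shape (N, ...), with one slice per ensemble member.
    """
    tail = (1.0 - level) / 2.0
    lower = np.quantile(member_predictions, tail, axis=0)
    upper = np.quantile(member_predictions, 1.0 - tail, axis=0)
    return lower, upper


# Synthetic predictions of 200 members at 4 input points, 1 output component.
predictions = np.random.default_rng(1).normal(3.0, 0.1, size=(200, 4, 1))
lower, upper = confidence_interval(predictions, level=0.95)
```

With `level=0.95`, the bounds are the 2.5% and 97.5% empirical quantiles taken across members at each input point.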

compute_expected_improvement(input_data, *args, **kwargs)

Evaluate func with either array or dictionary-based input data.

Firstly, the pre-processing stage converts the input data to a NumPy data array, if these data are expressed as a dictionary of NumPy data arrays.

Then, the processing evaluates the function func from this NumPy input data array.

Lastly, the post-processing transforms the output data to a dictionary of output NumPy data arrays if the input data were passed as a dictionary of NumPy data arrays.

Parameters:

algo (MLSupervisedAlgo) – The supervised learning algorithm.
input_data (DataType) – The input data.
*args (Any) – The positional arguments of the function func.
**kwargs (Any) – The keyword arguments of the function func.

Returns:

The output data with the same type as the input one.

Return type:

DataType
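The three stages described above (pre-processing, processing, post-processing) can be sketched as a decorator. This is a minimal illustration, not the library's wrapper: `array_or_dict`, its arguments and `toy_model` are hypothetical names, and the input and output layouts are assumed to be known in advance.

```python
from functools import wraps

import numpy as np


def array_or_dict(input_names, output_sizes):
    """Build a decorator that accepts array or dictionary input data.

    input_names: order in which the dictionary entries are concatenated.
    output_sizes: mapping from each output name to its number of components.
    """

    def decorator(func):
        @wraps(func)
        def wrapper(input_data, *args, **kwargs):
            as_dict = isinstance(input_data, dict)
            if as_dict:
                # Pre-processing: dictionary of arrays -> single array.
                input_data = np.concatenate(
                    [input_data[name] for name in input_names], axis=-1
                )
            # Processing: evaluate func on the NumPy array.
            output = func(input_data, *args, **kwargs)
            if not as_dict:
                return output
            # Post-processing: single array -> dictionary of arrays.
            bounds = np.cumsum(list(output_sizes.values()))[:-1]
            pieces = np.split(output, bounds, axis=-1)
            return dict(zip(output_sizes, pieces))

        return wrapper

    return decorator


@array_or_dict(input_names=["a", "b"], output_sizes={"y": 1, "z": 2})
def toy_model(x):
    # Hypothetical function: three output components from three inputs.
    return np.stack([x.sum(axis=-1), x[..., 0], x[..., 1]], axis=-1)
```

Calling `toy_model(np.array([1., 2., 3.]))` returns an array, whereas `toy_model({'a': np.array([1.]), 'b': np.array([2., 3.])})` returns a dictionary with the keys `'y'` and `'z'`.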

compute_mean(input_data, *args, **kwargs)

Evaluate func with either array or dictionary-based input data.

Firstly, the pre-processing stage converts the input data to a NumPy data array, if these data are expressed as a dictionary of NumPy data arrays.

Then, the processing evaluates the function func from this NumPy input data array.

Lastly, the post-processing transforms the output data to a dictionary of output NumPy data arrays if the input data were passed as a dictionary of NumPy data arrays.

Parameters:

algo (MLSupervisedAlgo) – The supervised learning algorithm.
input_data (DataType) – The input data.
*args (Any) – The positional arguments of the function func.
**kwargs (Any) – The keyword arguments of the function func.

Returns:

The output data with the same type as the input one.

Return type:

DataType

compute_standard_deviation(input_data, *args, **kwargs)

Evaluate func with either array or dictionary-based input data.

Firstly, the pre-processing stage converts the input data to a NumPy data array, if these data are expressed as a dictionary of NumPy data arrays.

Then, the processing evaluates the function func from this NumPy input data array.

Lastly, the post-processing transforms the output data to a dictionary of output NumPy data arrays if the input data were passed as a dictionary of NumPy data arrays.

Parameters:

algo (MLSupervisedAlgo) – The supervised learning algorithm.
input_data (DataType) – The input data.
*args (Any) – The positional arguments of the function func.
**kwargs (Any) – The keyword arguments of the function func.

Returns:

The output data with the same type as the input one.

Return type:

DataType

compute_variance(input_data, *args, **kwargs)

Evaluate func with either array or dictionary-based input data.

Firstly, the pre-processing stage converts the input data to a NumPy data array, if these data are expressed as a dictionary of NumPy data arrays.

Then, the processing evaluates the function func from this NumPy input data array.

Lastly, the post-processing transforms the output data to a dictionary of output NumPy data arrays if the input data were passed as a dictionary of NumPy data arrays.

Parameters:

algo (MLSupervisedAlgo) – The supervised learning algorithm.
input_data (DataType) – The input data.
*args (Any) – The positional arguments of the function func.
**kwargs (Any) – The keyword arguments of the function func.

Returns:

The output data with the same type as the input one.

Return type:

DataType

learn(samples=None)

Train the machine learning algorithm from the learning dataset.

Parameters: samples (list[int] | None) – The indices of the learning samples. If None, use the whole learning dataset.
Return type: None

predict(input_data)

Predict the output of the original machine learning algorithm.

The user can specify the input data either as a NumPy array, e.g. array([1., 2., 3.]) or as a dictionary, e.g. {'a': array([1.]), 'b': array([2., 3.])}.

The output data type will be consistent with the input data type.

Parameters: input_data (DataType) – The input data.
Returns: The predicted output data.
Return type: DataType

predict_members(input_data)

Predict the output value with the different machine learning algorithms.

After prediction, the method stacks the results.

Parameters:

input_data (DataType) – The input data, specified either as a numpy.array or as a dictionary of numpy.array indexed by input names. The numpy.array can be either a (d,) array representing a sample in dimension d, or an (M, d) array representing M samples in dimension d.

Returns:

The output data (dimension p) of N machine learning algorithms. If input_data.shape == (d,), then output_data.shape == (N, p). If input_data.shape == (M, d), then output_data.shape == (N, M, p).

Return type:

DataType
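The stacking convention can be checked with a small NumPy sketch. The standalone function `predict_members` and the three toy members below are hypothetical stand-ins (N = 3, d = 2, p = 1) and only illustrate the shapes described above.

```python
import numpy as np


def predict_members(members, input_data):
    """Stack the predictions of N callables along a new first axis."""
    return np.stack([member(input_data) for member in members])


# Three hypothetical members mapping d=2 inputs to p=1 output.
members = [
    lambda x, c=c: c * x.sum(axis=-1, keepdims=True) for c in (0.9, 1.0, 1.1)
]

one_sample = predict_members(members, np.array([1.0, 2.0]))  # shape (3, 1)
many_samples = predict_members(members, np.ones((4, 2)))  # shape (3, 4, 1)
```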

BOOTSTRAP: Final[str] = 'b'

CROSS_VALIDATION: Final[str] = 'cv'

LOO: Final[str] = 'loo'

N_BOOTSTRAP: ClassVar[int] = 100

N_FOLDS: ClassVar[int] = 5

algo: MLRegressionAlgo

The regression model.

property input_names: list[str]

The names of the original machine learning algorithm inputs.

property learning_set: IODataset

The learning dataset used by the original machine learning algorithm.

method: str

The resampling method.

property output_dimension: int

The dimension of the machine learning output space.

property output_names: list[str]

The names of the original machine learning algorithm outputs.

size: int

The size of the resampling set.

weights: list[Callable[[ndarray], float]]

The weight functions related to the sub-algorithms.

A weight function computes a weight from an input data array.
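As an illustration of how per-member weight functions could enter the statistics, the sketch below normalizes the weights across members and forms a weighted mean and variance of the stacked predictions. This is an assumption about usage, not the documented behaviour. The function `weighted_mean_and_variance` is hypothetical, and each weight function is assumed here to return one weight per input sample.

```python
import numpy as np


def weighted_mean_and_variance(predictions, weight_functions, input_data):
    """Combine member predictions with input-dependent weights.

    predictions: array of shape (N, M, p).
    weight_functions: N callables, each mapping (M, d) inputs to (M,) weights.
    input_data: array of shape (M, d).
    """
    weights = np.array([w(input_data) for w in weight_functions], dtype=float)
    weights = weights / weights.sum(axis=0)  # normalize across the N members
    mean = np.einsum("nm,nmp->mp", weights, predictions)
    variance = np.einsum("nm,nmp->mp", weights, (predictions - mean) ** 2)
    return mean, variance


rng = np.random.default_rng(2)
input_data = rng.random((4, 2))
predictions = rng.normal(size=(3, 4, 1))
uniform = [lambda x: np.ones(len(x))] * 3  # equal weights for every member
mean, variance = weighted_mean_and_variance(predictions, uniform, input_data)
```

With uniform weights, the result reduces to the plain mean and (population) variance over the members.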