regressor_distribution module
Universal distribution for regression models.
A RegressorDistribution samples an MLSupervisedAlgo by learning new versions of the latter from subsets of the original learning dataset.
These new MLAlgo instances are based on sampling methods, such as bootstrap, cross-validation or leave-one-out.
Sampling an MLAlgo can be particularly useful to:
- study the robustness of an MLAlgo w.r.t. learning dataset elements,
- estimate infill criteria for adaptive learning purposes,
- etc.
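The resampling idea described above can be illustrated without the library. The following is a minimal sketch using only NumPy, with a hypothetical dataset and a least-squares line standing in for the regression model; it is not the RegressorDistribution implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learning dataset: noisy samples of y = 2x + 1.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0 + 0.05 * rng.standard_normal(x.size)


def fit_linear(x_train, y_train):
    """Fit y = a*x + b by least squares and return the array [a, b]."""
    return np.polyfit(x_train, y_train, deg=1)


# Original model, learned from the full dataset.
original = fit_linear(x, y)

# Bootstrap members: each one is re-learned from a resample drawn
# with replacement from the original dataset.
n_members = 50
members = np.array(
    [
        fit_linear(x[indices], y[indices])
        for indices in (
            rng.choice(x.size, size=x.size, replace=True) for _ in range(n_members)
        )
    ]
)  # shape (n_members, 2)

# The spread of the slopes indicates how sensitive the model is
# to the elements of the learning dataset.
slope_std = members[:, 0].std()
```

The spread of the member coefficients is what makes robustness studies and infill criteria possible: a large spread at some input suggests that the model depends strongly on which samples it was trained on.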
- class gemseo_mlearning.adaptive.distributions.regressor_distribution.RegressorDistribution(algo, bootstrap=True, loo=False, size=None)
Bases: gemseo_mlearning.adaptive.distribution.MLRegressorDistribution
Distribution related to a regression algorithm.
- Parameters
algo (MLRegressionAlgo) – A regression model.
bootstrap (bool) – The resampling method. If True, use bootstrap resampling. Otherwise, use cross-validation resampling. By default it is set to True.
loo (bool) – The leave-one-out sub-method, when bootstrap is False. If False, use parameterized cross-validation. Otherwise, use leave-one-out. By default it is set to False.
size (int | None) – The size of the resampling set, i.e. the number of times the machine learning algorithm is rebuilt. If None, use the default size for bootstrap (MLAlgoSampler.N_BOOTSTRAP) and cross-validation (MLAlgoSampler.N_FOLDS). By default it is set to None.
- Return type
None
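To make the interplay of bootstrap, loo and size concrete, here is a hedged sketch of how training index sets could be generated for each scheme. The function resampling_indices and its default sizes (100 replicates, 5 folds) are hypothetical placeholders; the actual defaults are MLAlgoSampler.N_BOOTSTRAP and MLAlgoSampler.N_FOLDS, whose values are not stated here.

```python
import numpy as np


def resampling_indices(n_samples, bootstrap=True, loo=False, size=None, seed=0):
    """Return a list of training index arrays, one per rebuilt model.

    Hypothetical helper: the default sizes below are placeholders,
    not the library's values.
    """
    rng = np.random.default_rng(seed)
    if bootstrap:
        n_replicates = 100 if size is None else size  # placeholder default
        # Each replicate draws n_samples indices with replacement.
        return [
            rng.choice(n_samples, size=n_samples, replace=True)
            for _ in range(n_replicates)
        ]
    if loo:
        # Leave-one-out is cross-validation with one sample per fold.
        n_folds = n_samples
    else:
        n_folds = 5 if size is None else size  # placeholder default
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    all_indices = np.arange(n_samples)
    # Each model is trained on everything except its own fold.
    return [np.setdiff1d(all_indices, fold) for fold in folds]


bootstrap_sets = resampling_indices(10, bootstrap=True, size=7)
k_fold_sets = resampling_indices(10, bootstrap=False, size=5)
loo_sets = resampling_indices(10, bootstrap=False, loo=True)
```

In this sketch, size only matters for bootstrap and parameterized cross-validation; with leave-one-out the number of rebuilt models is fixed by the number of samples.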
- change_learning_set(learning_set)
Re-train the machine learning algorithm relying on the initial learning set.
- Parameters
learning_set (gemseo.core.dataset.Dataset) – The new learning set.
- Return type
None
- compute_confidence_interval(input_data, level=0.95)
Predict the lower bounds and upper bounds from input data.
The user can specify the input data either as a NumPy array, e.g. array([1., 2., 3.]), or as a dictionary, e.g. {'a': array([1.]), 'b': array([2., 3.])}.
The output data type will be consistent with the input data type.
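One plausible way to obtain lower and upper bounds from an ensemble is to take empirical quantiles of the stacked member predictions. The sketch below assumes this quantile-based approach and a hypothetical helper name; the source does not confirm that the library computes its interval this way.

```python
import numpy as np


def empirical_confidence_interval(member_predictions, level=0.95):
    """Return (lower, upper) bounds from predictions stacked along axis 0.

    Assumption: the interval is the central empirical quantile range
    of the member predictions.
    """
    alpha = (1.0 - level) / 2.0
    lower = np.quantile(member_predictions, alpha, axis=0)
    upper = np.quantile(member_predictions, 1.0 - alpha, axis=0)
    return lower, upper


# Hypothetical stacked predictions: N=101 members, output dimension p=1.
predictions = np.arange(101, dtype=float).reshape(101, 1)
lower, upper = empirical_confidence_interval(predictions, level=0.95)
```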
- compute_expected_improvement(input_data, *args, **kwargs)
Evaluate ‘predict’ with either array or dictionary-based input data.
Firstly, the pre-processing stage converts the input data to a NumPy data array, if these data are expressed as a dictionary of NumPy data arrays.
Then, the processing evaluates the function ‘predict’ from this NumPy input data array.
Lastly, the post-processing transforms the output data to a dictionary of output NumPy data arrays if the input data were passed as a dictionary of NumPy data arrays.
- Parameters
input_data (Union[numpy.ndarray, Mapping[str, numpy.ndarray]]) – The input data.
*args – The positional arguments of the function ‘predict’.
**kwargs – The keyword arguments of the function ‘predict’.
- Returns
The output data with the same type as the input one.
- Return type
Union[numpy.ndarray, Mapping[str, numpy.ndarray]]
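The three stages described above (pre-processing, processing, post-processing) can be sketched as a decorator. This is an illustrative reconstruction under stated assumptions, not the library's code: the decorator array_or_dict, its arguments input_names and output_sizes, and the toy function are all hypothetical.

```python
from collections.abc import Mapping

import numpy as np


def array_or_dict(input_names, output_sizes):
    """Let an array-based function also accept dictionaries (hypothetical).

    input_names: ordered names used to concatenate dictionary inputs.
    output_sizes: mapping from output name to size, used to split the result.
    """

    def decorator(func):
        def wrapper(input_data, *args, **kwargs):
            as_dict = isinstance(input_data, Mapping)
            if as_dict:
                # Pre-processing: dictionary of arrays -> single array.
                input_data = np.concatenate(
                    [input_data[name] for name in input_names], axis=-1
                )
            # Processing: evaluate the array-based function.
            output = func(input_data, *args, **kwargs)
            if not as_dict:
                return output
            # Post-processing: single array -> dictionary of arrays.
            result, start = {}, 0
            for name, size in output_sizes.items():
                result[name] = output[..., start : start + size]
                start += size
            return result

        return wrapper

    return decorator


@array_or_dict(input_names=["a", "b"], output_sizes={"y": 1})
def toy_predict(input_array):
    """Toy array-based function: sum of the inputs."""
    return input_array.sum(axis=-1, keepdims=True)


from_array = toy_predict(np.array([1.0, 2.0, 3.0]))
from_dict = toy_predict({"a": np.array([1.0]), "b": np.array([2.0, 3.0])})
```

The output type follows the input type: an array in gives an array out, and a dictionary in gives a dictionary out.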
- compute_mean(input_data, *args, **kwargs)
Evaluate ‘predict’ with either array or dictionary-based input data.
Firstly, the pre-processing stage converts the input data to a NumPy data array, if these data are expressed as a dictionary of NumPy data arrays.
Then, the processing evaluates the function ‘predict’ from this NumPy input data array.
Lastly, the post-processing transforms the output data to a dictionary of output NumPy data arrays if the input data were passed as a dictionary of NumPy data arrays.
- Parameters
input_data (Union[numpy.ndarray, Mapping[str, numpy.ndarray]]) – The input data.
*args – The positional arguments of the function ‘predict’.
**kwargs – The keyword arguments of the function ‘predict’.
- Returns
The output data with the same type as the input one.
- Return type
Union[numpy.ndarray, Mapping[str, numpy.ndarray]]
- compute_standard_deviation(input_data, *args, **kwargs)
Evaluate ‘predict’ with either array or dictionary-based input data.
Firstly, the pre-processing stage converts the input data to a NumPy data array, if these data are expressed as a dictionary of NumPy data arrays.
Then, the processing evaluates the function ‘predict’ from this NumPy input data array.
Lastly, the post-processing transforms the output data to a dictionary of output NumPy data arrays if the input data were passed as a dictionary of NumPy data arrays.
- Parameters
input_data (Union[numpy.ndarray, Mapping[str, numpy.ndarray]]) – The input data.
*args – The positional arguments of the function ‘predict’.
**kwargs – The keyword arguments of the function ‘predict’.
- Returns
The output data with the same type as the input one.
- Return type
Union[numpy.ndarray, Mapping[str, numpy.ndarray]]
- compute_variance(input_data, *args, **kwargs)
Evaluate ‘predict’ with either array or dictionary-based input data.
Firstly, the pre-processing stage converts the input data to a NumPy data array, if these data are expressed as a dictionary of NumPy data arrays.
Then, the processing evaluates the function ‘predict’ from this NumPy input data array.
Lastly, the post-processing transforms the output data to a dictionary of output NumPy data arrays if the input data were passed as a dictionary of NumPy data arrays.
- Parameters
input_data (Union[numpy.ndarray, Mapping[str, numpy.ndarray]]) – The input data.
*args – The positional arguments of the function ‘predict’.
**kwargs – The keyword arguments of the function ‘predict’.
- Returns
The output data with the same type as the input one.
- Return type
Union[numpy.ndarray, Mapping[str, numpy.ndarray]]
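The statistics named by compute_mean, compute_variance, compute_standard_deviation and compute_expected_improvement can be illustrated on stacked member predictions. The sketch below assumes equal weights for all members, a population variance, and an expected improvement defined for minimization; the class may instead weight members through its weights attribute, and the helper name and the best_value argument are hypothetical.

```python
import numpy as np

# Hypothetical stacked predictions: N=4 members, output dimension p=1.
member_predictions = np.array([[1.0], [2.0], [3.0], [6.0]])

# Moments taken across the members (axis 0), with equal weights.
mean = member_predictions.mean(axis=0)
variance = member_predictions.var(axis=0)  # population variance (ddof=0)
standard_deviation = np.sqrt(variance)


def empirical_expected_improvement(member_predictions, best_value):
    """Average improvement over best_value, assuming minimization."""
    improvement = np.maximum(best_value - member_predictions, 0.0)
    return improvement.mean(axis=0)


expected_improvement = empirical_expected_improvement(
    member_predictions, best_value=2.5
)
```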
- predict(input_data)
Predict the output of the original machine learning algorithm.
The user can specify the input data either as a NumPy array, e.g. array([1., 2., 3.]), or as a dictionary, e.g. {'a': array([1.]), 'b': array([2., 3.])}.
The output data type will be consistent with the input data type.
- Parameters
input_data (Union[numpy.ndarray, Mapping[str, numpy.ndarray]]) – The input data.
- Returns
The predicted output data.
- Return type
Union[numpy.ndarray, Mapping[str, numpy.ndarray]]
- predict_members(input_data)
Predict the output value with the different machine learning algorithms.
After prediction, the method stacks the results.
- Parameters
input_data (Union[numpy.ndarray, Mapping[str, numpy.ndarray]]) – The input data, specified either as a numpy.array or as a dictionary of numpy.array indexed by input names. The numpy.array can be either a (d,) array representing a sample in dimension d, or a (M, d) array representing M samples in dimension d.
- Returns
The output data (dimension p) of N machine learning algorithms. If input_data.shape == (d,), then output_data.shape == (N, p). If input_data.shape == (M, d), then output_data.shape == (N, M, p).
- Return type
Union[numpy.ndarray, Mapping[str, numpy.ndarray]]
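The shape convention for stacked member predictions can be checked with a small NumPy sketch. The members here are hypothetical linear maps, chosen only to show how (d,) and (M, d) inputs lead to (N, p) and (N, M, p) outputs; this is not the library's implementation.

```python
import numpy as np

# Hypothetical members: N=3 linear models from dimension d=2 to p=1.
coefficients = [
    np.array([[1.0], [0.0]]),
    np.array([[0.0], [1.0]]),
    np.array([[1.0], [1.0]]),
]


def stack_member_predictions(input_data):
    """Evaluate every member and stack the results along a new first axis."""
    return np.stack([input_data @ c for c in coefficients])


single = stack_member_predictions(np.array([1.0, 2.0]))  # (d,) -> (N, p)
batch = stack_member_predictions(np.ones((4, 2)))  # (M, d) -> (N, M, p)
```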
- algo: gemseo.mlearning.regression.regression.MLRegressionAlgo
The regression model.
- property learning_set: gemseo.core.dataset.Dataset
The learning dataset used by the original machine learning algorithm.
- weights: list[Callable[[numpy.ndarray], float]]
The weight functions related to the sub-algorithms.
A weight function computes a weight from an input data array.
- gemseo_mlearning.adaptive.distributions.regressor_distribution.choice(a, size=None, replace=True, p=None)
Generates a random sample from a given 1-D array.
New in version 1.7.0.
Note
New code should use the choice method of a default_rng() instance instead; please see the Quick Start.
- Parameters
a (1-D array-like or int) – If an ndarray, a random sample is generated from its elements. If an int, the random sample is generated as if it were np.arange(a).
size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
replace (boolean, optional) – Whether the sample is with or without replacement. Default is True, meaning that a value of a can be selected multiple times.
p (1-D array-like, optional) – The probabilities associated with each entry in a. If not given, the sample assumes a uniform distribution over all entries in a.
- Returns
samples – The generated random samples
- Return type
single item or ndarray
- Raises
ValueError – If a is an int and less than zero, if a or p are not 1-dimensional, if a is an array-like of size 0, if p is not a vector of probabilities, if a and p have different lengths, or if replace=False and the sample size is greater than the population size
See also
randint, shuffle, permutation
Generator.choice, which should be used in new code
Notes
Setting user-specified probabilities through p uses a more general but less efficient sampler than the default. The general sampler produces a different sample than the optimized sampler even if each element of p is 1 / len(a).
Sampling random rows from a 2-D array is not possible with this function, but is possible with Generator.choice through its axis keyword.
Examples
Generate a uniform random sample from np.arange(5) of size 3:
>>> np.random.choice(5, 3)
array([0, 3, 4]) # random
>>> #This is equivalent to np.random.randint(0,5,3)
Generate a non-uniform random sample from np.arange(5) of size 3:
>>> np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])
array([3, 3, 0]) # random
Generate a uniform random sample from np.arange(5) of size 3 without replacement:
>>> np.random.choice(5, 3, replace=False)
array([3,1,0]) # random
>>> #This is equivalent to np.random.permutation(np.arange(5))[:3]
Generate a non-uniform random sample from np.arange(5) of size 3 without replacement:
>>> np.random.choice(5, 3, replace=False, p=[0.1, 0, 0.3, 0.6, 0])
array([2, 3, 0]) # random
Any of the above can be repeated with an arbitrary array-like instead of just integers. For instance:
>>> aa_milne_arr = ['pooh', 'rabbit', 'piglet', 'Christopher']
>>> np.random.choice(aa_milne_arr, 5, p=[0.5, 0.1, 0.1, 0.3])
array(['pooh', 'pooh', 'pooh', 'Christopher', 'piglet'], # random
      dtype='<U11')
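Since the note recommends Generator.choice for new code, here is a short sketch of the equivalent calls with a default_rng() instance, including row sampling from a 2-D array through the axis keyword. The seed and the array contents are arbitrary illustration values.

```python
import numpy as np

# A seeded generator makes the draws reproducible.
rng = np.random.default_rng(seed=42)

# Three distinct values drawn from np.arange(5), without replacement.
sample = rng.choice(5, size=3, replace=False)

# Two distinct rows drawn from a 2-D array, which the legacy
# np.random.choice function does not support.
matrix = np.arange(12).reshape(4, 3)
rows = rng.choice(matrix, size=2, replace=False, axis=0)
```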