gemseo.mlearning.regression.algos.gpr module
Gaussian process regression model.
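Overview
The Gaussian process regression (GPR) model expresses the model output as a weighted sum of kernel functions centered on the learning input data:
\[y = \mu + w_1\kappa(\|x-x_1\|;\epsilon) + w_2\kappa(\|x-x_2\|;\epsilon) + \dots + w_N\kappa(\|x-x_N\|;\epsilon)\]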
Details
The GPR model relies on the assumption that the original model \(f\) to replace is an instance of a Gaussian process (GP) with mean \(\mu\) and covariance \(\sigma^2\kappa(\|x-x'\|;\epsilon)\).
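Then, the GP conditioned by the learning set \((x_i,y_i)_{1\leq i \leq N}\) is entirely defined by its expectation:
\[\hat{f}(x) = \hat{\mu} + \hat{w}^T k(x)\]
and its covariance:
\[\hat{c}(x,x') = \hat{\sigma}^2 - k(x)^T K^{-1} k(x')\]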
where \([\hat{\mu};\hat{w}]=([1_N~K]^T[1_N~K])^{-1}[1_N~K]^TY\) with \(K_{ij}=\kappa(\|x_i-x_j\|;\hat{\epsilon})\), \(k_i(x)=\kappa(\|x-x_i\|;\hat{\epsilon})\) and \(Y_i=y_i\).
The correlation length vector \(\epsilon\) is estimated by numerical non-linear optimization.
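The least-squares estimate of \([\hat{\mu};\hat{w}]\) and the resulting expectation can be sketched with NumPy. This is an illustration only, not GEMSEO's or scikit-learn's implementation: the squared-exponential kernel, the fixed correlation length `length=0.3` and the names `kernel` and `f_hat` are assumptions. Because \([1_N~K]\) has more columns than rows, the sketch calls `np.linalg.lstsq`, which returns the minimum-norm least-squares solution instead of forming the inverse explicitly.

```python
import numpy as np


def kernel(a, b, length=0.3):
    """Squared-exponential kernel between the rows of a and b (an assumption)."""
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.exp(-0.5 * (dist / length) ** 2)


# Learning set: N = 5 samples of a 1D function.
x_learn = np.linspace(0.0, 1.0, 5)[:, None]
y_learn = np.sin(2 * np.pi * x_learn[:, 0])

# Design matrix [1_N K] and least-squares estimate of [mu; w].
K = kernel(x_learn, x_learn)
design = np.hstack([np.ones((len(x_learn), 1)), K])
coefficients, *_ = np.linalg.lstsq(design, y_learn, rcond=None)
mu_hat, w_hat = coefficients[0], coefficients[1:]


def f_hat(x):
    """Expectation mu_hat + w_hat^T k(x) evaluated at the rows of x."""
    return mu_hat + kernel(x, x_learn) @ w_hat


print(f_hat(np.array([[0.1], [0.6]])))
```

In this sketch the correlation length is fixed by hand; the model described here estimates it by numerical optimization.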
Surrogate model
The expectation \(\hat{f}\) is the surrogate model of \(f\).
Error measure
The standard deviation \(\hat{s}\) is a local error measure of \(\hat{f}\):
\[\hat{s}(x) := \sqrt{\hat{c}(x,x)}\]
where \(\hat{c}\) is the covariance of the conditioned GP.
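A minimal NumPy sketch of this error measure, assuming the conditioned variance takes the form \(\hat{\sigma}^2 - k(x)^T K^{-1} k(x)\) with a unit variance \(\hat{\sigma}^2 = 1\). The squared-exponential kernel, its correlation length and the name `s_hat` are assumptions made for illustration.

```python
import numpy as np


def kernel(a, b, length=0.3):
    """Squared-exponential kernel between the rows of a and b (an assumption)."""
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.exp(-0.5 * (dist / length) ** 2)


x_learn = np.linspace(0.0, 1.0, 5)[:, None]
K = kernel(x_learn, x_learn)


def s_hat(x, sigma2=1.0):
    """Standard deviation sqrt(c(x, x)) with c(x, x) = sigma2 - k(x)^T K^-1 k(x)."""
    k = kernel(x, x_learn)  # shape (n_query, N)
    solved = np.linalg.solve(K, k.T)  # K^-1 k(x), shape (N, n_query)
    variance = sigma2 - np.einsum("ij,ji->i", k, solved)
    # Clip the small negative values caused by round-off before the square root.
    return np.sqrt(np.clip(variance, 0.0, None))


print(s_hat(np.array([[0.25], [0.4], [10.0]])))
```

The standard deviation vanishes at the learning points and grows toward \(\hat{\sigma}\) far from them, which is why it can serve as a local error measure.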
Interpolation or regression
The GPR model can be regressive or interpolative according to the value of the nugget effect \(\alpha\geq 0\) which is a regularization term applied to the correlation matrix \(K\). When \(\alpha = 0\), the surrogate model interpolates the learning data.
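The effect of the nugget can be illustrated with a small NumPy sketch. It assumes a zero-mean GP predictor and that \(\alpha\) is added to the diagonal of \(K\), as scikit-learn does with its `alpha` parameter; the kernel, its correlation length and the name `predict` are illustrative assumptions.

```python
import numpy as np


def kernel(a, b, length=0.3):
    """Squared-exponential kernel between the rows of a and b (an assumption)."""
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.exp(-0.5 * (dist / length) ** 2)


x_learn = np.linspace(0.0, 1.0, 5)[:, None]
y_learn = np.sin(2 * np.pi * x_learn[:, 0])
K = kernel(x_learn, x_learn)


def predict(x, alpha):
    """Zero-mean GP predictor with the nugget alpha added to the diagonal of K."""
    weights = np.linalg.solve(K + alpha * np.eye(len(K)), y_learn)
    return kernel(x, x_learn) @ weights


# Largest absolute residual on the learning data for each setting.
residual_interp = np.abs(predict(x_learn, 0.0) - y_learn).max()
residual_regr = np.abs(predict(x_learn, 0.1) - y_learn).max()
print(residual_interp, residual_regr)
```

With \(\alpha = 0\) the residual on the learning data is zero up to round-off, while a positive nugget leaves a visible residual, that is, a regressive model.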
Dependence
The GPR model relies on the GaussianProcessRegressor class of the scikit-learn library.
- class GaussianProcessRegressor(data, settings_model=None, **settings)
Bases: BaseRandomProcessRegressor
Gaussian process regression model.
- Parameters:
data (Dataset) -- The learning dataset.
settings_model (BaseMLAlgoSettings | None) -- The machine learning algorithm settings as a Pydantic model. If None, use **settings.
**settings (Any) -- The machine learning algorithm settings. These arguments are ignored when settings_model is not None.
- Raises:
ValueError -- When both the variable and the group it belongs to have a transformer.
- Settings
alias of GaussianProcessRegressor_Settings
- compute_samples(input_data, n_samples, seed=None)
Sample a random vector from the conditioned Gaussian process.
- Parameters:
- Returns:
The output samples shaped as (M, N, p) where p is the output dimension.
- Return type:
RealArray
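The shape of the returned samples can be illustrated by drawing from a conditioned GP with NumPy. This sketch does not call GEMSEO: it assumes that M is the number of samples and N the number of input points, and it uses a zero-mean GP with a squared-exponential kernel and a scalar output (p = 1), all of which are assumptions.

```python
import numpy as np


def kernel(a, b, length=0.3):
    """Squared-exponential kernel between the rows of a and b (an assumption)."""
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.exp(-0.5 * (dist / length) ** 2)


rng = np.random.default_rng(0)  # seeded for reproducibility
x_learn = np.linspace(0.0, 1.0, 5)[:, None]
y_learn = np.sin(2 * np.pi * x_learn[:, 0])
x_query = np.linspace(0.0, 1.0, 7)[:, None]  # N = 7 input points (assumed meaning)
n_samples = 3  # M = 3 samples (assumed meaning)

# Mean and covariance of the zero-mean GP conditioned by the learning set.
K = kernel(x_learn, x_learn)
k = kernel(x_query, x_learn)
mean = k @ np.linalg.solve(K, y_learn)
cov = kernel(x_query, x_query) - k @ np.linalg.solve(K, k.T)
cov = 0.5 * (cov + cov.T)  # remove round-off asymmetry

# Draw M trajectories at the N points, then add the output axis (p = 1).
samples = rng.multivariate_normal(mean, cov, size=n_samples, check_valid="ignore")
samples = samples[..., None]
print(samples.shape)
```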
- predict_std(input_data)
Predict the standard deviation from input data.
The user can specify these input data either as a NumPy array, e.g. array([1., 2., 3.]), or as a dictionary of NumPy arrays, e.g. {'a': array([1.]), 'b': array([2., 3.])}.
If the NumPy arrays are of dimension 2, their i-th rows represent the input data of the i-th sample; if the NumPy arrays are of dimension 1, there is a single sample.
- Parameters:
input_data (DataType) -- The input data.
- Returns:
The standard deviation at the query points.
Warning
This statistic is expressed in relation to the transformed output space. You can sample the predict() method to estimate it in relation to the original output space if it is different from the transformed output space.
- Return type:
RealArray
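The input data conventions accepted by predict_std() can be illustrated with a small NumPy helper. The function `as_samples` is hypothetical and is not part of GEMSEO; it only shows how a 1D array, a 2D array and a dictionary of arrays map to a table with one row per sample, assuming the dictionary entries are concatenated in insertion order.

```python
import numpy as np


def as_samples(data):
    """Return a 2D array with one row per sample (hypothetical helper)."""
    if isinstance(data, dict):
        # Concatenate the variables along the feature axis, in insertion order.
        return np.hstack([np.atleast_2d(value) for value in data.values()])
    # A 1D array is a single sample; a 2D array already has one row per sample.
    return np.atleast_2d(data)


single = as_samples(np.array([1.0, 2.0, 3.0]))
from_dict = as_samples({"a": np.array([1.0]), "b": np.array([2.0, 3.0])})
several = as_samples(np.arange(12.0).reshape(4, 3))
print(single.shape, from_dict.shape, several.shape)
```

Both the 1D array and the dictionary describe the same single sample with three input components, while the 2D array describes four samples.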
- LIBRARY: ClassVar[str] = 'scikit-learn'
The name of the library of the wrapped machine learning algorithm.
- SHORT_ALGO_NAME: ClassVar[str] = 'GPR'
The short name of the machine learning algorithm, often an acronym.
Typically used for composite names, e.g. f"{algo.SHORT_ALGO_NAME}_{dataset.name}" or f"{algo.SHORT_ALGO_NAME}_{discipline.name}".
- property kernel
The kernel used for prediction.