Gaussian process regression
Overview
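The Gaussian process regression (GPR) surrogate discipline expresses the model output as a weighted sum of kernel functions centered on the learning input data:
\[y = \mu + w_1\kappa(\|x-x_1\|;\epsilon) + w_2\kappa(\|x-x_2\|;\epsilon) + \ldots + w_N\kappa(\|x-x_N\|;\epsilon)\]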
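To make the weighted sum concrete, here is a minimal NumPy sketch that evaluates it for given \(\mu\), weights \(w_i\) and a single scalar correlation length \(\epsilon\). The Matérn 5/2 correlation and the isotropic (scalar) \(\epsilon\) are assumptions made for illustration, and the helper names `matern52` and `gpr_predict` are hypothetical, not part of GEMSEO.

```python
import numpy as np


def matern52(r, epsilon):
    """Matérn 5/2 correlation kappa(r; epsilon) for distances r (assumed kernel)."""
    d = np.sqrt(5.0) * r / epsilon
    return (1.0 + d + d**2 / 3.0) * np.exp(-d)


def gpr_predict(x, x_learn, mu, w, epsilon):
    """Evaluate mu + sum_i w_i * kappa(||x - x_i||; epsilon).

    x: (M, d) prediction points, x_learn: (N, d) learning points, w: (N,) weights.
    """
    # Euclidean distances between every prediction point and every learning point.
    r = np.linalg.norm(x[:, None, :] - x_learn[None, :, :], axis=-1)  # shape (M, N)
    return mu + matern52(r, epsilon) @ w


x_learn = np.array([[0.0], [0.5], [1.0]])
w = np.array([1.0, -2.0, 0.5])
print(gpr_predict(np.array([[0.5]]), x_learn, mu=0.1, w=w, epsilon=0.3))
```

Each learning point contributes through its own kernel function, whose influence decays with the distance to \(x\) at a rate set by \(\epsilon\).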
Details
The GPR model relies on the assumption that the original model \(f\) to be replaced is an instance of a Gaussian process (GP) with mean \(\mu\) and covariance \(\sigma^2\kappa(\|x-x'\|;\epsilon)\).
Then, the GP conditioned by the learning set \((x_i,y_i)_{1\leq i \leq N}\) is entirely defined by its expectation:
\[\hat{f}(x)=\hat{\mu}+\hat{w}^Tk(x)\]
and its covariance:
\[\hat{c}(x,x')=\hat{\sigma}^2\left(\kappa(\|x-x'\|;\hat{\epsilon})-k(x)^TK^{-1}k(x')\right)\]
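where \(\hat{\mu}=\frac{1_N^TK^{-1}Y}{1_N^TK^{-1}1_N}\) and \(\hat{w}=K^{-1}(Y-\hat{\mu}1_N)\) are the generalized least-squares estimates of the mean and of the weights, \(\hat{\sigma}^2\) is the estimated variance, \(K_{ij}=\kappa(\|x_i-x_j\|;\hat{\epsilon})\), \(k_i(x)=\kappa(\|x-x_i\|;\hat{\epsilon})\), \(Y_i=y_i\) and \(1_N\) is the vector of \(N\) ones.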
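The estimates can be sketched in NumPy as follows. This is an illustration under our own assumptions (one-dimensional inputs, a scalar correlation length, a Matérn 5/2 correlation, no nugget), using the generalized least-squares formulas of ordinary kriging, \(\hat{\mu}=1_N^TK^{-1}Y/(1_N^TK^{-1}1_N)\) and \(\hat{w}=K^{-1}(Y-\hat{\mu}1_N)\), and the maximum-likelihood variance \(\hat{\sigma}^2=(Y-\hat{\mu}1_N)^TK^{-1}(Y-\hat{\mu}1_N)/N\). The helpers `fit` and `predict` are hypothetical names.

```python
import numpy as np


def matern52(r, epsilon):
    """Matérn 5/2 correlation kappa(r; epsilon) for distances r (assumed kernel)."""
    d = np.sqrt(5.0) * r / epsilon
    return (1.0 + d + d**2 / 3.0) * np.exp(-d)


def fit(x_learn, y, epsilon):
    """Estimate mu_hat, w_hat and sigma2_hat for 1D learning points."""
    K = matern52(np.abs(x_learn[:, None] - x_learn[None, :]), epsilon)
    ones = np.ones(len(y))
    # Linear solves instead of an explicit inverse of K.
    mu_hat = (ones @ np.linalg.solve(K, y)) / (ones @ np.linalg.solve(K, ones))
    residual = y - mu_hat * ones
    w_hat = np.linalg.solve(K, residual)
    # Maximum-likelihood estimate of the process variance.
    sigma2_hat = residual @ w_hat / len(y)
    return K, mu_hat, w_hat, sigma2_hat


def predict(x, x_learn, K, mu_hat, w_hat, sigma2_hat, epsilon):
    """Return the expectation f_hat(x) and the variance c_hat(x, x)."""
    k = matern52(np.abs(x[:, None] - x_learn[None, :]), epsilon)  # shape (M, N)
    f_hat = mu_hat + k @ w_hat
    # kappa(0) = 1, so c_hat(x, x) = sigma2_hat * (1 - k(x)^T K^{-1} k(x)).
    c_hat = sigma2_hat * (1.0 - np.sum(k * np.linalg.solve(K, k.T).T, axis=1))
    return f_hat, c_hat


x_learn = np.array([0.0, 0.3, 0.6, 1.0])
y = np.sin(2.0 * np.pi * x_learn)
K, mu_hat, w_hat, sigma2_hat = fit(x_learn, y, epsilon=0.3)
f_hat, c_hat = predict(x_learn, x_learn, K, mu_hat, w_hat, sigma2_hat, 0.3)
```

At the learning points, \(k(x_i)\) is the \(i\)-th column of \(K\), so \(\hat{f}(x_i)=y_i\) and \(\hat{c}(x_i,x_i)=0\) up to rounding errors.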
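The correlation length vector \(\epsilon\) is estimated by numerical non-linear optimization, in practice by maximizing the log-marginal likelihood of the learning data; the result is denoted \(\hat{\epsilon}\).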
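The optimized correlation lengths can be inspected directly with scikit-learn. The sketch below is not the GEMSEO code path; it assumes synthetic two-input data, a Matérn 5/2 kernel with one length scale per input (an anisotropic correlation length vector) and unit process variance. After `fit`, the kernel with the optimized hyperparameters is stored in `kernel_`.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
x_learn = rng.uniform(0.0, 1.0, size=(20, 2))
y = np.sin(4.0 * x_learn[:, 0]) + np.cos(3.0 * x_learn[:, 1])

# One length scale per input dimension, searched within the given bounds.
kernel = Matern(length_scale=np.ones(2), length_scale_bounds=(1e-2, 1e2), nu=2.5)
gpr = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10, random_state=0)
gpr.fit(x_learn, y)

# Optimized correlation lengths and the maximized log-marginal likelihood.
print(gpr.kernel_.length_scale)
print(gpr.log_marginal_likelihood_value_)
```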
Surrogate model
The expectation \(\hat{f}\) is the GPR surrogate model of \(f\).
Error measure
The standard deviation \(\hat{s}\) is a local error measure of \(\hat{f}\):
\[\hat{s}(x)=\sqrt{\hat{c}(x,x)}\]
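With scikit-learn, \(\hat{f}(x)\) and \(\hat{s}(x)\) can be obtained together from `GaussianProcessRegressor.predict` with `return_std=True`. In this sketch (synthetic one-dimensional data and a Matérn 5/2 kernel, both our choices), \(\hat{s}\) is close to zero at a learning point and larger between learning points.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Six learning points: 0.0, 0.2, ..., 1.0.
x_learn = np.linspace(0.0, 1.0, 6).reshape(-1, 1)
y = np.sin(2.0 * np.pi * x_learn).ravel()

gpr = GaussianProcessRegressor(kernel=Matern(nu=2.5), random_state=0).fit(x_learn, y)

# 0.0 is a learning point; 0.1 and 0.5 lie between learning points.
x_test = np.array([[0.0], [0.1], [0.5]])
f_hat, s_hat = gpr.predict(x_test, return_std=True)
print(f_hat)
print(s_hat)
```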
Interpolation or regression
The GPR surrogate discipline can be regressive or interpolative according to the value of the nugget effect \(\alpha\geq 0\), a regularization term added to the diagonal of the correlation matrix \(K\). When \(\alpha = 0\), the surrogate model interpolates the learning data; when \(\alpha > 0\), it no longer passes exactly through them.
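The effect of the nugget can be checked with scikit-learn, whose `alpha` argument is added to the diagonal of the kernel matrix. In this sketch (noisy synthetic data and a fixed length scale of 0.2, both assumptions), `optimizer=None` keeps the kernel hyperparameters fixed so that only \(\alpha\) changes between the two fits.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
x_learn = np.linspace(0.0, 1.0, 10).reshape(-1, 1)
y = np.sin(2.0 * np.pi * x_learn).ravel() + 0.1 * rng.standard_normal(10)

kernel = Matern(length_scale=0.2, nu=2.5)
residuals = {}
for alpha in (1e-10, 1e-1):
    # optimizer=None: no hyperparameter optimization, the kernel is used as given.
    gpr = GaussianProcessRegressor(kernel=kernel, alpha=alpha, optimizer=None)
    gpr.fit(x_learn, y)
    residuals[alpha] = np.max(np.abs(gpr.predict(x_learn) - y))
    print(f"alpha={alpha:g}: max training residual = {residuals[alpha]:.2e}")
```

The tiny nugget reproduces the learning outputs almost exactly, while the larger one smooths the noise and leaves visible residuals.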
Dependence
The GPR model relies on the GaussianProcessRegressor class of the scikit-learn library.
class gemseo.mlearning.regression.gpr.GaussianProcessRegression(data, transformer=None, input_names=None, output_names=None, kernel=None, alpha=1e-10, optimizer='fmin_l_bfgs_b', n_restarts_optimizer=10, random_state=None)
Gaussian process regression.
Constructor.
Parameters
data (Dataset) – learning dataset
transformer (dict(str)) – transformation strategy for data groups. If None, do not transform data. Default: None.
input_names (list(str)) – names of the input variables. Default: None.
output_names (list(str)) – names of the output variables. Default: None.
kernel (sklearn.gaussian_process.kernels.Kernel) – kernel function. If None, use a Matérn kernel with nu=2.5. Default: None.
alpha (float or array) – nugget effect. Default: 1e-10.
optimizer (str or callable) – optimization algorithm. Default: ‘fmin_l_bfgs_b’.
n_restarts_optimizer (int) – number of restarts of the optimizer. Default: 10.
random_state (int) – seed of the random number generator used to draw the starting points of the optimizer restarts. If None, the random number generator is the RandomState instance used by np.random. Default: None.
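For comparison, a sketch that instantiates scikit-learn's `GaussianProcessRegressor` with the default values listed above. It assumes that GEMSEO forwards these arguments unchanged to scikit-learn; the `Matern(nu=2.5)` kernel is written here with scikit-learn's default length scale, which may differ from the kernel GEMSEO builds internally when `kernel` is None.

```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Hypothetical mirror of the GEMSEO defaults (assumption: arguments passed through as-is).
regressor = GaussianProcessRegressor(
    kernel=Matern(nu=2.5),
    alpha=1e-10,
    optimizer="fmin_l_bfgs_b",
    n_restarts_optimizer=10,
    random_state=None,
)
print(regressor.get_params()["n_restarts_optimizer"])  # 10
```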