Gaussian process regression

Overview

The Gaussian process regression (GPR) surrogate discipline expresses the model output as a weighted sum of kernel functions centered on the learning input data:

\[y = \mu + w_1\kappa(\|x-x_1\|;\epsilon) + w_2\kappa(\|x-x_2\|;\epsilon) + \ldots + w_N\kappa(\|x-x_N\|;\epsilon)\]
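The weighted sum above can be sketched in a few lines of NumPy. This is only an illustration: the squared-exponential form of \(\kappa\), the scalar \(\epsilon\) and the names `squared_exponential` and `kernel_expansion` are assumptions made for the example, not part of the library.

```python
import numpy as np


def squared_exponential(r, eps):
    """Assumed kernel for this sketch: kappa(r; eps) = exp(-(r / eps)^2 / 2)."""
    return np.exp(-0.5 * (r / eps) ** 2)


def kernel_expansion(x, centers, weights, mu, eps):
    """Evaluate y = mu + sum_i w_i * kappa(||x - x_i||; eps) at each row of x."""
    x = np.atleast_2d(x)              # shape (n_points, d)
    centers = np.atleast_2d(centers)  # shape (N, d)
    # Pairwise Euclidean distances between evaluation points and centers.
    r = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=-1)
    return mu + squared_exponential(r, eps) @ weights


# Two learning inputs x_1 = 0 and x_2 = 1 with hypothetical weights.
centers = np.array([[0.0], [1.0]])
weights = np.array([2.0, -1.0])
y_at_center = kernel_expansion([[0.0]], centers, weights, mu=0.5, eps=1.0)
```

At \(x = x_1\) the first kernel equals 1 and the second equals \(\exp(-1/2)\), so the result is \(0.5 + 2 - \exp(-1/2)\).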

Details

The GPR model relies on the assumption that the original model \(f\) to be replaced is a realization of a Gaussian process (GP) with mean \(\mu\) and covariance \(\sigma^2\kappa(\|x-x'\|;\epsilon)\).

Then, the GP conditioned on the learning set \((x_i,y_i)_{1\leq i \leq N}\) is entirely defined by its expectation:

\[\hat{f}(x) = \hat{\mu} + \hat{w}^T k(x)\]

and its covariance:

\[\hat{c}(x,x') = \hat{\sigma}^2 - k(x)^T K^{-1} k(x')\]

where \([\hat{\mu};\hat{w}]=([1_N~K]^T[1_N~K])^{-1}[1_N~K]^TY\) with \(K_{ij}=\kappa(\|x_i-x_j\|;\hat{\epsilon})\), \(k_i(x)=\kappa(\|x-x_i\|;\hat{\epsilon})\) and \(Y_i=y_i\).

The correlation length vector \(\epsilon\) is estimated by numerical non-linear optimization.
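The estimation of the correlation lengths can be observed directly with scikit-learn, on which the GPR model relies. The sketch below assumes a Matérn kernel with one length scale per input and synthetic data chosen for illustration; after fitting, the optimized values are available from the `kernel_` attribute.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Synthetic learning set (an assumption for this sketch): 20 points, 2 inputs.
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=(20, 2))
y_train = np.sin(6.0 * x_train[:, 0]) + 0.1 * x_train[:, 1]

# One initial correlation length per input variable.
kernel = Matern(length_scale=[1.0, 1.0], nu=2.5)
model = GaussianProcessRegressor(
    kernel=kernel, n_restarts_optimizer=5, random_state=0
)
model.fit(x_train, y_train)

# The fitted kernel holds the correlation lengths found by the optimizer.
fitted_lengths = np.asarray(model.kernel_.length_scale)
print("initial:", kernel.length_scale, "fitted:", fitted_lengths)
```

The optimizer maximizes the log-marginal likelihood of the learning data, so the fitted lengths generally differ from the initial guess.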

Surrogate model

The expectation \(\hat{f}\) is the GPR surrogate model of \(f\).

Error measure

The standard deviation \(\hat{s}\) is a local error measure of \(\hat{f}\):

\[\hat{s}(x):=\sqrt{\hat{c}(x,x)}\]
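As an illustration of this local error measure, scikit-learn returns the standard deviation alongside the expectation when `predict` is called with `return_std=True`. The data below are an assumption made for the sketch; the standard deviation is expected to be close to zero at the learning points and larger far from them.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Six learning points on [0, 1] (illustrative data).
x_train = np.linspace(0.0, 1.0, 6).reshape(-1, 1)
y_train = np.sin(2.0 * np.pi * x_train).ravel()

model = GaussianProcessRegressor(kernel=Matern(nu=2.5), random_state=0)
model.fit(x_train, y_train)

# Standard deviation at the learning points and at a point far from them.
_, std_train = model.predict(x_train, return_std=True)
_, std_far = model.predict(np.array([[3.0]]), return_std=True)
print("max std at learning points:", std_train.max())
print("std far from the data:", std_far[0])
```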

Interpolation or regression

The GPR surrogate discipline can be regressive or interpolative according to the value of the nugget effect \(\alpha\geq 0\), which is a regularization term applied to the correlation matrix \(K\). When \(\alpha = 0\), the surrogate model interpolates the learning data.
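The effect of the nugget can be checked numerically with scikit-learn, where `alpha` is added to the diagonal of the kernel matrix. In this sketch the kernel length scale is fixed and the optimizer is disabled (`optimizer=None`) so that only `alpha` changes between the two fits; the noisy data and the helper `max_training_residual` are assumptions made for the example.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Noisy observations of a sine function (illustrative data).
rng = np.random.default_rng(1)
x_train = np.linspace(0.0, 1.0, 15).reshape(-1, 1)
y_train = np.sin(2.0 * np.pi * x_train).ravel() + 0.2 * rng.standard_normal(15)


def max_training_residual(alpha):
    """Largest gap between the surrogate and the learning outputs."""
    model = GaussianProcessRegressor(
        kernel=Matern(length_scale=0.2, nu=2.5),
        alpha=alpha,
        optimizer=None,  # keep the kernel fixed to isolate the nugget effect
    )
    model.fit(x_train, y_train)
    return np.max(np.abs(model.predict(x_train) - y_train))


residual_interp = max_training_residual(1e-10)  # near-zero nugget
residual_regress = max_training_residual(1e-1)  # larger nugget
print(residual_interp, residual_regress)
```

With the near-zero nugget the surrogate passes through the learning data up to numerical precision, whereas the larger nugget lets it smooth the noise.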

Dependence

The GPR model relies on the GaussianProcessRegressor class of the scikit-learn library.

class gemseo.mlearning.regression.gpr.GaussianProcessRegression(data, transformer=None, input_names=None, output_names=None, kernel=None, alpha=1e-10, optimizer='fmin_l_bfgs_b', n_restarts_optimizer=10, random_state=None)

Gaussian process regression

Constructor.

Parameters
  • data (Dataset) – learning dataset

  • transformer (dict(str)) – transformation strategy for data groups. If None, do not transform data. Default: None.

  • input_names (list(str)) – names of the input variables.

  • output_names (list(str)) – names of the output variables.

  • kernel (sklearn.gaussian_process.kernels.Kernel) – kernel function. If None, use a Matérn kernel with smoothness parameter 2.5. Default: None.

  • alpha (float or array) – nugget effect. Default: 1e-10.

  • optimizer (str or callable) – optimization algorithm. Default: 'fmin_l_bfgs_b'.

  • n_restarts_optimizer (int) – number of restarts of the optimizer. Default: 10.

  • random_state (int) – the seed used to initialize the centers. If None, the random number generator is the RandomState instance used by np.random. Default: None.
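For reference, the kernel, alpha, optimizer, n_restarts_optimizer and random_state parameters share their names with arguments of scikit-learn's GaussianProcessRegressor. The sketch below calls scikit-learn directly with the defaults listed above; it does not use the GEMSEO wrapper or its Dataset, and it assumes illustrative data, a fixed seed for reproducibility, and an array-valued `alpha` to show the per-sample form of the nugget.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Illustrative learning set: 8 points of a smooth 1D function.
x_train = np.linspace(0.0, 1.0, 8).reshape(-1, 1)
y_train = np.cos(3.0 * x_train).ravel()

# alpha may be a float or an array with one nugget value per learning sample.
per_sample_alpha = np.full(len(x_train), 1e-10)

model = GaussianProcessRegressor(
    kernel=Matern(nu=2.5),        # kernel used when the parameter is None
    alpha=per_sample_alpha,
    optimizer="fmin_l_bfgs_b",
    n_restarts_optimizer=10,
    random_state=0,               # fixed here; the documented default is None
)
model.fit(x_train, y_train)
prediction = model.predict(np.array([[0.25], [0.75]]))
print(prediction)
```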

Example