
# gpr module

## Gaussian process regression

### Overview

The Gaussian process regression (GPR) surrogate discipline expresses the model output as a weighted sum of kernel functions centered on the learning input data:

$y = \mu + w_1\kappa(\|x-x_1\|;\epsilon) + w_2\kappa(\|x-x_2\|;\epsilon) + ... + w_N\kappa(\|x-x_N\|;\epsilon)$

### Details

The GPR model relies on the assumption that the original model $$f$$ to replace is an instance of a Gaussian process (GP) with mean $$\mu$$ and covariance $$\sigma^2\kappa(\|x-x'\|;\epsilon)$$.

Then, the GP conditioned by the learning set $$(x_i,y_i)_{1\leq i \leq N}$$ is entirely defined by its expectation:

$\hat{f}(x) = \hat{\mu} + \hat{w}^T k(x)$

and its covariance:

$\hat{c}(x,x') = \hat{\sigma}^2 - k(x)^T K^{-1} k(x')$

where $$[\hat{\mu};\hat{w}]=([1_N~K]^T[1_N~K])^{-1}[1_N~K]^TY$$ with $$K_{ij}=\kappa(\|x_i-x_j\|;\hat{\epsilon})$$, $$k_i(x)=\kappa(\|x-x_i\|;\hat{\epsilon})$$ and $$Y_i=y_i$$.
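As a minimal sketch of the expressions above, the following NumPy code estimates $$[\hat{\mu};\hat{w}]$$ and evaluates $$\hat{f}$$ for scalar inputs. It assumes a Matérn 5/2 kernel with a fixed correlation length (no optimization of $$\epsilon$$), and it uses `numpy.linalg.lstsq` rather than the explicit inverse in the formula, because $$[1_N~K]$$ has more columns than rows. The names `matern52`, `fit_weights` and `predict` are illustrative and are not part of gemseo.

```python
import numpy as np


def matern52(r, eps):
    """Matern 5/2 correlation for distances r and correlation length eps."""
    s = np.sqrt(5.0) * r / eps
    return (1.0 + s + s**2 / 3.0) * np.exp(-s)


def fit_weights(x, y, eps):
    """Least-squares estimate of [mu; w] from the system [1_N K][mu; w] = Y."""
    K = matern52(np.abs(x[:, None] - x[None, :]), eps)
    A = np.hstack([np.ones((len(x), 1)), K])
    # lstsq returns the minimum-norm solution of the underdetermined system.
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]


def predict(x_new, x, mu, w, eps):
    """Evaluate f_hat(x) = mu + w^T k(x) at each point of x_new."""
    k = matern52(np.abs(x_new[:, None] - x[None, :]), eps)
    return mu + k @ w


# Illustrative learning set: six samples of a sine function.
x_train = np.linspace(0.0, 1.0, 6)
y_train = np.sin(2.0 * np.pi * x_train)

mu_hat, w_hat = fit_weights(x_train, y_train, eps=0.3)
y_at_train = predict(x_train, x_train, mu_hat, w_hat, eps=0.3)
```

Without a nugget effect, the sketch reproduces the learning outputs at the learning inputs up to round-off.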

The correlation length vector $$\epsilon$$ is estimated by numerical non-linear optimization.

### Surrogate model

The expectation $$\hat{f}$$ is the GPR surrogate model of $$f$$.

### Error measure

The standard deviation $$\hat{s}$$ is a local error measure of $$\hat{f}$$:

$\hat{s}(x):=\sqrt{\hat{c}(x,x)}$
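The local error measure can be sketched in NumPy from the covariance expression given in the details. This is an illustration under stated assumptions, not the library's implementation: scalar inputs, a Matérn 5/2 kernel with a fixed correlation length, and a unit variance $$\hat{\sigma}^2=1$$. The function names are hypothetical.

```python
import numpy as np


def matern52(r, eps):
    """Matern 5/2 correlation for distances r and correlation length eps."""
    s = np.sqrt(5.0) * r / eps
    return (1.0 + s + s**2 / 3.0) * np.exp(-s)


def posterior_std(x_new, x_train, eps, sigma2=1.0):
    """Evaluate s(x) = sqrt(sigma2 - k(x)^T K^{-1} k(x)) at each point of x_new."""
    K = matern52(np.abs(x_train[:, None] - x_train[None, :]), eps)
    k = matern52(np.abs(x_train[:, None] - x_new[None, :]), eps)  # shape (N, M)
    var = sigma2 - np.sum(k * np.linalg.solve(K, k), axis=0)
    # Clip tiny negative values caused by round-off before taking the root.
    return np.sqrt(np.clip(var, 0.0, None))


x_train = np.linspace(0.0, 1.0, 6)
s_train = posterior_std(x_train, x_train, eps=0.3)          # at the learning inputs
s_mid = posterior_std(np.array([0.1]), x_train, eps=0.3)    # between two inputs
s_far = posterior_std(np.array([100.0]), x_train, eps=0.3)  # far from the data
```

Under these assumptions the standard deviation vanishes at the learning inputs, grows between them, and tends to $$\hat{\sigma}$$ far from the data.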

### Interpolation or regression

The GPR surrogate discipline can be regressive or interpolative according to the value of the nugget effect $$\alpha\geq 0$$, which is a regularization term applied to the correlation matrix $$K$$. When $$\alpha = 0$$, the surrogate model interpolates the learning data; otherwise, it is regressive.
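The effect of the nugget can be illustrated directly with scikit-learn's `GaussianProcessRegressor`, the class on which this module relies. The sketch below assumes a Matérn 5/2 kernel whose length scale is kept fixed (`optimizer=None`) so that only `alpha` varies; the data and the helper `max_training_residual` are illustrative.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Illustrative learning set: six samples of a sine function.
x = np.linspace(0.0, 1.0, 6).reshape(-1, 1)
y = np.sin(2.0 * np.pi * x).ravel()
kernel = Matern(length_scale=0.2, nu=2.5)


def max_training_residual(alpha):
    """Largest absolute gap between predictions and learning outputs."""
    gpr = GaussianProcessRegressor(kernel=kernel, alpha=alpha, optimizer=None)
    gpr.fit(x, y)
    return float(np.max(np.abs(gpr.predict(x) - y)))


residual_interp = max_training_residual(1e-10)  # nearly interpolating
residual_regr = max_training_residual(0.5)      # regressive
```

With a negligible nugget the predictions pass through the learning data, while a larger nugget leaves a visible residual at the learning inputs.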

### Dependence

The GPR model relies on the GaussianProcessRegressor class of the scikit-learn library.
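For orientation, here is a sketch of a direct scikit-learn call whose arguments mirror the defaults documented for the class below (Matérn 2.5 kernel, `alpha=1e-10`, `optimizer='fmin_l_bfgs_b'`, `n_restarts_optimizer=10`). It does not show how gemseo wraps the estimator internally; the data and `random_state=0` are assumptions made for reproducibility.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Illustrative learning set: ten samples of a sine function.
x_train = np.linspace(0.0, 1.0, 10).reshape(-1, 1)
y_train = np.sin(2.0 * np.pi * x_train).ravel()

gpr = GaussianProcessRegressor(
    kernel=Matern(nu=2.5),
    alpha=1e-10,
    optimizer="fmin_l_bfgs_b",
    n_restarts_optimizer=10,
    random_state=0,
)
gpr.fit(x_train, y_train)

x_new = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
y_new = gpr.predict(x_new)

# The fitted kernel holds the correlation length found by the optimizer.
fitted_length_scale = gpr.kernel_.length_scale
```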

class gemseo.mlearning.regression.gpr.GaussianProcessRegression(data, transformer=None, input_names=None, output_names=None, kernel=None, alpha=1e-10, optimizer='fmin_l_bfgs_b', n_restarts_optimizer=10, random_state=None)

Gaussian process regression

Constructor.

Parameters
• data (Dataset) – learning dataset

• transformer (dict(str)) – transformation strategy for data groups. If None, do not transform data. Default: None.

• input_names (list(str)) – names of the input variables.

• output_names (list(str)) – names of the output variables.

• kernel (sklearn.gaussian_process.kernels.Kernel) – kernel function. If None, use a Matern(2.5). Default: None.

• alpha (float or array) – nugget effect. Default: 1e-10.

• optimizer (str or callable) – optimization algorithm. Default: ‘fmin_l_bfgs_b’.

• n_restarts_optimizer (int) – number of restarts of the optimizer. Default: 10.

• random_state (int) – the seed used to initialize the centers. If None, the random number generator is the RandomState instance used by np.random. Default: None.

ABBR = 'GPR'
LIBRARY = 'scikit-learn'
predict_std(input_data)

Predict standard deviation value for given input data.

Parameters

input_data (dict(ndarray)) – input data (1D or 2D).

Returns

output data (1D or 2D, same as input_data).

Return type

dict(ndarray)
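The input and output contract described above (a dictionary of arrays, 1D or 2D, with the output matching the input dimensionality) can be sketched around scikit-learn's `predict(..., return_std=True)`. The helper `predict_std_sketch` and the output name `"y"` are hypothetical; this is not the gemseo implementation of `predict_std`.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def predict_std_sketch(gpr, input_data, output_name="y"):
    """Hypothetical helper: standard deviation from a dict of 1D or 2D arrays."""
    arrays = list(input_data.values())
    is_1d = arrays[0].ndim == 1
    # Stack the variables column-wise; a 1D array is treated as one sample.
    x = np.hstack([np.atleast_2d(a) for a in arrays])
    _, std = gpr.predict(x, return_std=True)
    std = std.reshape(-1, 1)
    return {output_name: std[0] if is_1d else std}


# Illustrative model with a fixed Matern 5/2 kernel (no hyperparameter search).
x_train = np.linspace(0.0, 1.0, 6).reshape(-1, 1)
y_train = np.sin(2.0 * np.pi * x_train).ravel()
gpr = GaussianProcessRegressor(
    kernel=Matern(length_scale=0.2, nu=2.5), alpha=1e-10, optimizer=None
)
gpr.fit(x_train, y_train)

out_1d = predict_std_sketch(gpr, {"x": np.array([0.5])})
out_2d = predict_std_sketch(gpr, {"x": np.array([[0.0], [5.0]])})
```

In this sketch a 1D input yields a 1D output and a 2D input yields a 2D output, with one row per sample.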