The Gaussian process algorithm for regression.

Overview

The Gaussian process regression (GPR) surrogate model expresses the model output as a weighted sum of kernel functions centered on the learning input data:

\[y = \mu + w_1\kappa(\|x-x_1\|;\epsilon) + w_2\kappa(\|x-x_2\|;\epsilon) + \ldots + w_N\kappa(\|x-x_N\|;\epsilon)\]
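The weighted sum above can be sketched in a few lines of NumPy. This is only an illustration: the squared-exponential kernel, the scalar correlation length and the values of `centers`, `weights` and `mu` are assumptions chosen for the example, not values produced by the library.

```python
import numpy as np


def gaussian_kernel(r, epsilon):
    """Squared-exponential kernel of the distance r (assumed kernel choice)."""
    return np.exp(-0.5 * (r / epsilon) ** 2)


def kernel_expansion(x, centers, weights, mu, epsilon):
    """Evaluate y = mu + sum_i w_i * kappa(||x - x_i||; epsilon) at one point x."""
    distances = np.linalg.norm(centers - x, axis=1)
    return mu + weights @ gaussian_kernel(distances, epsilon)


# Hypothetical learning inputs (N = 3 points in dimension 2) and weights.
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
weights = np.array([0.5, -1.0, 2.0])
mu = 0.1

# Evaluate the expansion at the first learning input.
y_at_first_center = kernel_expansion(centers[0], centers, weights, mu, epsilon=1.0)
```

At the first learning input, the first kernel equals 1 and the two others equal exp(-0.5), since both remaining centers are at distance 1.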

Details

The GPR model relies on the assumption that the original model \(f\) to be replaced is an instance of a Gaussian process (GP) with mean \(\mu\) and covariance \(\sigma^2\kappa(\|x-x'\|;\epsilon)\).

Then, the GP conditioned on the learning set \((x_i,y_i)_{1\leq i \leq N}\) is entirely defined by its expectation:

\[\hat{f}(x) = \hat{\mu} + \hat{w}^T k(x)\]

and its covariance:

\[\hat{c}(x,x') = \hat{\sigma}^2 - k(x)^T K^{-1} k(x')\]

where \([\hat{\mu};\hat{w}]=([1_N~K]^T[1_N~K])^{-1}[1_N~K]^TY\) with \(K_{ij}=\kappa(\|x_i-x_j\|;\hat{\epsilon})\), \(k_i(x)=\kappa(\|x-x_i\|;\hat{\epsilon})\) and \(Y_i=y_i\).
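As a rough illustration of these formulas, the sketch below builds \(K\), \(k(x)\) and the stacked matrix \([1_N~K]\) with NumPy for a small one-dimensional learning set. Several points are assumptions made for the example: the squared-exponential kernel, the fixed correlation length, and the value `sigma2=1.0`. Because \([1_N~K]\) has \(N\) rows and \(N+1\) columns, the sketch uses `numpy.linalg.lstsq`, which returns a minimum-norm least-squares solution; whether this matches the estimator actually used by the library is not confirmed here.

```python
import numpy as np


def kappa(r, epsilon=1.0):
    """Squared-exponential kernel (an assumption; other kernels are possible)."""
    return np.exp(-0.5 * (r / epsilon) ** 2)


# Hypothetical 1D learning set (x_i, y_i) with N = 5.
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.sin(x_train)
n = x_train.size

# K_ij = kappa(|x_i - x_j|) and the stacked matrix [1_N K] of shape (N, N + 1).
K = kappa(np.abs(x_train[:, None] - x_train[None, :]))
A = np.hstack([np.ones((n, 1)), K])

# Least-squares estimate of [mu; w]; lstsq returns the minimum-norm solution.
coefficients, *_ = np.linalg.lstsq(A, y_train, rcond=None)
mu_hat, w_hat = coefficients[0], coefficients[1:]


def k_vector(x):
    """k_i(x) = kappa(|x - x_i|)."""
    return kappa(np.abs(x - x_train))


def f_hat(x):
    """Expectation: mu_hat + w_hat^T k(x)."""
    return mu_hat + w_hat @ k_vector(x)


def c_hat(x, x_prime, sigma2=1.0):
    """Covariance as written above: sigma^2 - k(x)^T K^{-1} k(x')."""
    return sigma2 - k_vector(x) @ np.linalg.solve(K, k_vector(x_prime))


predictions = np.array([f_hat(x) for x in x_train])
variance_at_learning_point = c_hat(x_train[2], x_train[2])
```

With these choices the expectation reproduces the learning outputs and the covariance vanishes at a learning input, up to rounding errors.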

The correlation length vector \(\epsilon\) is estimated by numerical non-linear optimization.
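Since the GPR model relies on scikit-learn's GaussianProcessRegressor, the estimation of the correlation length can be illustrated directly with that class, which fits the kernel hyperparameters by maximizing the log-marginal likelihood. The sketch below passes the same values as the defaults listed in the class signature on this page; the learning data are hypothetical, and calling scikit-learn directly rather than through the wrapper is a simplification.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Hypothetical learning set: 10 evenly spaced samples of sin(x).
x_train = np.linspace(0.0, 5.0, 10).reshape(-1, 1)
y_train = np.sin(x_train).ravel()

# Initial kernel with a correlation length of 1; the optimizer will tune it.
kernel = Matern(length_scale=1.0, nu=2.5)
gpr = GaussianProcessRegressor(
    kernel=kernel,
    alpha=1e-10,
    optimizer="fmin_l_bfgs_b",
    n_restarts_optimizer=10,
    random_state=0,
)
gpr.fit(x_train, y_train)

# The fitted kernel is stored in kernel_; the initial one is left untouched.
initial_length = kernel.length_scale
fitted_length = gpr.kernel_.length_scale
log_likelihood = gpr.log_marginal_likelihood_value_
```

The restarts draw new starting points for the correlation length within the kernel bounds, which reduces the risk of stopping at a poor local optimum.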

Surrogate model

The expectation \(\hat{f}\) is the GPR surrogate model of \(f\).

Error measure

The standard deviation \(\hat{s}\) is a local error measure of \(\hat{f}\):

\[\hat{s}(x):=\sqrt{\hat{c}(x,x)}\]
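The behaviour of this local error measure can be observed with scikit-learn's GaussianProcessRegressor, whose `predict` method returns the standard deviation when `return_std=True`. This is a sketch under assumptions: the data are hypothetical and the kernel hyperparameters are frozen with `optimizer=None` to keep the example reproducible.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Hypothetical learning set: 7 evenly spaced samples of sin(x) on [0, 6].
x_train = np.linspace(0.0, 6.0, 7).reshape(-1, 1)
y_train = np.sin(x_train).ravel()

# optimizer=None keeps the kernel hyperparameters fixed (assumption).
gpr = GaussianProcessRegressor(
    kernel=Matern(length_scale=1.0, nu=2.5), alpha=1e-10, optimizer=None
)
gpr.fit(x_train, y_train)

# A learning point, a point between two learning points, a far-away point.
x_new = np.array([[0.0], [0.5], [20.0]])
mean, std = gpr.predict(x_new, return_std=True)
```

The standard deviation is close to zero at a learning point, grows between learning points, and approaches the prior standard deviation far from the learning data.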

Interpolation or regression

The GPR surrogate model can be regressive or interpolative depending on the value of the nugget effect \(\alpha\geq 0\), a regularization term applied to the correlation matrix \(K\). When \(\alpha = 0\), the surrogate model interpolates the learning data.
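The effect of the nugget can be illustrated with scikit-learn's GaussianProcessRegressor, where `alpha` is added to the diagonal of the kernel matrix. The sketch compares a near-zero nugget with a larger one on hypothetical noisy data; the kernel hyperparameters are frozen with `optimizer=None`, and `1e-10` stands in for zero to keep the linear system numerically solvable.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Hypothetical noisy learning set.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 6.0, 13).reshape(-1, 1)
y_train = np.sin(x_train).ravel() + 0.1 * rng.standard_normal(13)


def max_training_residual(alpha):
    """Fit a GPR with the given nugget and return the largest learning error."""
    gpr = GaussianProcessRegressor(
        kernel=Matern(length_scale=1.0, nu=2.5), alpha=alpha, optimizer=None
    )
    gpr.fit(x_train, y_train)
    return np.max(np.abs(gpr.predict(x_train) - y_train))


residual_interpolating = max_training_residual(1e-10)
residual_regressive = max_training_residual(1e-1)
```

With the near-zero nugget the model passes through the learning outputs, while the larger nugget smooths the noise and leaves a visible residual at the learning points.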

Dependence

The GPR model relies on the GaussianProcessRegressor class of the scikit-learn library.

Classes:

GaussianProcessRegression(data[, …])

Gaussian process regression.

class gemseo.mlearning.regression.gpr.GaussianProcessRegression(data, transformer=None, input_names=None, output_names=None, kernel=None, alpha=1e-10, optimizer='fmin_l_bfgs_b', n_restarts_optimizer=10, random_state=None)

Gaussian process regression.

Parameters
  • kernel (Optional[sklearn.gaussian_process.kernels.Kernel]) – The kernel function. If None, use a Matern(2.5) kernel.

  • alpha (Union[float,ndarray]) – The nugget effect to regularize the model.

  • optimizer (Union[str,Callable]) – The optimization algorithm to find the hyperparameters.

  • n_restarts_optimizer (int) – The number of restarts of the optimizer.

  • random_state (Optional[int]) – The seed used to initialize the centers. If None, the random number generator is the RandomState instance used by numpy.random.

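  • data (Dataset) – The learning dataset.

  • transformer (Optional[TransformerType]) – The transformers applied to the input and output variables.

  • input_names (Optional[Iterable[str]]) – The names of the input variables.

  • output_names (Optional[Iterable[str]]) – The names of the output variables.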

Return type

None

Classes:

DataFormatters()

Machine learning regression model decorators.

Attributes:

input_data

The input data matrix.

input_shape

The dimension of the input variables before applying the transformers.

is_trained

Return whether the algorithm is trained.

output_data

The output data matrix.

output_shape

The dimension of the output variables before applying the transformers.

Methods:

learn([samples])

Train the machine learning algorithm from the learning dataset.

load_algo(directory)

Load a machine learning algorithm from a directory.

predict(input_data, *args, **kwargs)

Evaluate ‘predict’ with either array or dictionary-based input data.

predict_jacobian(input_data, *args, **kwargs)

Evaluate ‘predict_jac’ with either array or dictionary-based data.

predict_raw(input_data)

Predict output data from input data.

predict_std(input_data)

Predict the standard deviation from input data.

save([directory, path, save_learning_set])

Save the machine learning algorithm.

class DataFormatters

Machine learning regression model decorators.

Methods:

format_dict(predict)

Make an array-based function be called with a dictionary of NumPy arrays.

format_dict_jacobian(predict_jac)

Wrap an array-based function to make it callable with a dictionary of NumPy arrays.

format_input_output(predict)

Make a function robust to type, array shape and data transformation.

format_samples(predict)

Make a 2D NumPy array-based function work with 1D NumPy array.

format_transform([transform_inputs, …])

Force a function to transform its input and/or output variables.

transform_jacobian(predict_jac)

Apply transformation to inputs and inverse transformation to outputs.

classmethod format_dict(predict)

Make an array-based function be called with a dictionary of NumPy arrays.

Parameters

predict (Callable[[numpy.ndarray], numpy.ndarray]) – The function to be called; it takes a NumPy array as input and returns a NumPy array.

Returns

A function making the function ‘predict’ work with either a NumPy data array or a dictionary of NumPy data arrays indexed by variable names. The evaluation will have the same type as the input data.

Return type

Callable[[Union[numpy.ndarray, Dict[str, numpy.ndarray]]], Union[numpy.ndarray, Dict[str, numpy.ndarray]]]
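The idea behind this decorator can be sketched with a standalone function wrapper. The sketch is not the library's implementation: `format_dict_sketch` and its arguments `input_names`, `output_names` and `output_sizes` are hypothetical names introduced for the example, whereas the real decorator is applied to methods of a regression model that already knows its variables.

```python
from functools import wraps

import numpy as np


def format_dict_sketch(input_names, output_names, output_sizes):
    """Build a decorator; the three arguments are hypothetical bookkeeping."""

    def decorator(predict):
        @wraps(predict)
        def wrapper(input_data):
            # Array input: call the array-based function directly.
            if not isinstance(input_data, dict):
                return predict(input_data)
            # Pre-processing: concatenate the arrays in a fixed variable order.
            array = np.hstack([input_data[name] for name in input_names])
            output = predict(array)
            # Post-processing: split the output array back into named arrays.
            split_points = np.cumsum(output_sizes)[:-1]
            pieces = np.split(output, split_points, axis=-1)
            return dict(zip(output_names, pieces))

        return wrapper

    return decorator


@format_dict_sketch(["x1", "x2"], ["y"], [1])
def predict(array):
    """Array-based function: sum of squares of the inputs."""
    return np.sum(array**2, axis=-1, keepdims=True)


as_array = predict(np.array([1.0, 2.0]))
as_dict = predict({"x1": np.array([1.0]), "x2": np.array([2.0])})
```

Both calls evaluate the same array-based function; only the container of the result changes with the type of the input.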

classmethod format_dict_jacobian(predict_jac)

Wrap an array-based function to make it callable with a dictionary of NumPy arrays.

Parameters

predict_jac (Callable[[numpy.ndarray], numpy.ndarray]) – The function to be called; it takes a NumPy array as input and returns a NumPy array.

Returns

The wrapped ‘predict_jac’ function, callable with either a NumPy data array or a dictionary of NumPy data arrays indexed by variable names. The return value will have the same type as the input data.

Return type

Callable[[Union[numpy.ndarray, Dict[str, numpy.ndarray]]], Union[numpy.ndarray, Dict[str, numpy.ndarray]]]

classmethod format_input_output(predict)

Make a function robust to type, array shape and data transformation.

Parameters

predict (Callable[[numpy.ndarray], numpy.ndarray]) – The function of interest to be called.

Returns

A function calling the function of interest ‘predict’, while guaranteeing consistency in terms of data type and array shape, and applying input and/or output data transformation if required.

Return type

Callable[[Union[numpy.ndarray, Dict[str, numpy.ndarray]]], Union[numpy.ndarray, Dict[str, numpy.ndarray]]]

classmethod format_samples(predict)

Make a 2D NumPy array-based function work with 1D NumPy array.

Parameters

predict (Callable[[numpy.ndarray], numpy.ndarray]) – The function to be called; it takes a 2D NumPy array as input and returns a 2D NumPy array. The first dimension represents the samples while the second one represents the components of the variables.

Returns

A function making the function ‘predict’ work with either a 1D NumPy array or a 2D NumPy array. The evaluation will have the same dimension as the input data.

Return type

Callable[[numpy.ndarray], numpy.ndarray]
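A minimal sketch of such a wrapper is given below. It is an illustration of the idea and not the library's code: `format_samples_sketch` is a hypothetical name, and the decorated function is a toy 2D-only function.

```python
from functools import wraps

import numpy as np


def format_samples_sketch(predict):
    """Make a 2D array-based function accept a single 1D sample as well."""

    @wraps(predict)
    def wrapper(input_data):
        single_sample = input_data.ndim == 1
        # Promote a 1D sample to a 2D array with one row.
        data_2d = input_data[np.newaxis, :] if single_sample else input_data
        output_2d = predict(data_2d)
        # Drop the sample axis again when the caller passed a single sample.
        return output_2d[0] if single_sample else output_2d

    return wrapper


@format_samples_sketch
def predict(input_data):
    """2D-only function: rows are samples, columns are components."""
    return np.column_stack([input_data.sum(axis=1), input_data.prod(axis=1)])


one_sample = predict(np.array([1.0, 2.0, 3.0]))
many_samples = predict(np.ones((4, 3)))
```

The 1D call returns a 1D array with one entry per output component, while the 2D call keeps one row per sample.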

classmethod format_transform(transform_inputs=True, transform_outputs=True)

Force a function to transform its input and/or output variables.

Parameters
  • transform_inputs (bool) – If True, apply the transformers to the input variables.

  • transform_outputs (bool) – If True, apply the transformers to the output variables.

Returns

A function evaluating a function of interest, after transforming its input data and/or before transforming its output data.

Return type

Callable[[numpy.ndarray], numpy.ndarray]
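The following sketch shows one way such a decorator could behave. It rests on assumptions that this page does not confirm: the wrapped function is taken to operate in the transformed space, inputs are mapped with `transform` and outputs are mapped back with `inverse_transform`. `AffineScaler` and `format_transform_sketch` are hypothetical names.

```python
from functools import wraps

import numpy as np


class AffineScaler:
    """Hypothetical transformer: t = (x - offset) / scale."""

    def __init__(self, offset, scale):
        self.offset = offset
        self.scale = scale

    def transform(self, data):
        return (data - self.offset) / self.scale

    def inverse_transform(self, data):
        return data * self.scale + self.offset


def format_transform_sketch(
    input_scaler, output_scaler, transform_inputs=True, transform_outputs=True
):
    """Build a decorator applying the scalers around an array-based function."""

    def decorator(predict):
        @wraps(predict)
        def wrapper(input_data):
            if transform_inputs:
                input_data = input_scaler.transform(input_data)
            output_data = predict(input_data)
            if transform_outputs:
                output_data = output_scaler.inverse_transform(output_data)
            return output_data

        return wrapper

    return decorator


input_scaler = AffineScaler(offset=1.0, scale=2.0)
output_scaler = AffineScaler(offset=10.0, scale=5.0)


@format_transform_sketch(input_scaler, output_scaler)
def predict(transformed_input):
    """Toy model defined in the transformed space."""
    return 3.0 * transformed_input


@format_transform_sketch(input_scaler, output_scaler, transform_outputs=False)
def predict_without_output_transform(transformed_input):
    return 3.0 * transformed_input


original_space = predict(np.array([[5.0]]))
transformed_space = predict_without_output_transform(np.array([[5.0]]))
```

For the input 5, the scaled input is 2 and the toy model returns 6; mapping back to the original output space gives 40, whereas disabling the output transformation leaves 6.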

classmethod transform_jacobian(predict_jac)

Apply transformation to inputs and inverse transformation to outputs.

Parameters

predict_jac (Callable[[numpy.ndarray], numpy.ndarray]) – The function of interest to be called.

Returns

A function evaluating the function ‘predict_jac’, after transforming its input data and/or before transforming its output data.

Return type

Callable[[numpy.ndarray], numpy.ndarray]
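For affine transformers, the effect on the Jacobian follows from the chain rule. The sketch below assumes component-wise affine scalings of the inputs and outputs and a toy model defined in the transformed space; all names and values are hypothetical, and the library may handle other transformers differently.

```python
import numpy as np

# Hypothetical affine scalings: t = (x - offset) / scale for inputs,
# y = u * scale + offset to map transformed outputs u back.
input_offset, input_scale = np.array([1.0, 0.0]), np.array([2.0, 4.0])
output_offset, output_scale = np.array([10.0]), np.array([5.0])


def predict_transformed(t):
    """Toy model in the transformed space: g(t) = t_1^2 + 3 t_2."""
    return np.array([t[0] ** 2 + 3.0 * t[1]])


def jacobian_transformed(t):
    """Jacobian of g with respect to t, of shape (n_outputs, n_inputs)."""
    return np.array([[2.0 * t[0], 3.0]])


def predict(x):
    """Model in the original space."""
    t = (x - input_offset) / input_scale
    return predict_transformed(t) * output_scale + output_offset


def predict_jacobian(x):
    """Chain rule: diag(output_scale) @ J_g(t) @ diag(1 / input_scale)."""
    t = (x - input_offset) / input_scale
    return output_scale[:, None] * jacobian_transformed(t) / input_scale[None, :]


x = np.array([3.0, 8.0])
jacobian = predict_jacobian(x)
```

At this point the transformed input is (1, 2), so the transformed Jacobian is (2, 3) and the Jacobian in the original space is (5, 3.75).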

property input_data

The input data matrix.

property input_shape

The dimension of the input variables before applying the transformers.

property is_trained

Return whether the algorithm is trained.

learn(samples=None)

Train the machine learning algorithm from the learning dataset.

Parameters

samples (Optional[List[int]]) – The indices of the learning samples. If None, use the whole learning dataset.

Raises

NotImplementedError – If an output transformer modifies both the input and the output variables, e.g. PLS.

Return type

None

load_algo(directory)

Load a machine learning algorithm from a directory.

Parameters

directory (str) – The path to the directory where the machine learning algorithm is saved.

Return type

None

property output_data

The output data matrix.

property output_shape

The dimension of the output variables before applying the transformers.

predict(input_data, *args, **kwargs)

Evaluate ‘predict’ with either array or dictionary-based input data.

Firstly, the pre-processing stage converts the input data to a NumPy data array, if these data are expressed as a dictionary of NumPy data arrays.

Then, the processing evaluates the function ‘predict’ from this NumPy input data array.

Lastly, the post-processing transforms the output data to a dictionary of output NumPy data arrays if the input data were passed as a dictionary of NumPy data arrays.

Parameters
  • input_data (Union[numpy.ndarray, Dict[str, numpy.ndarray]]) – The input data.

  • *args – The positional arguments of the function ‘predict’.

  • **kwargs – The keyword arguments of the function ‘predict’.

Returns

The output data with the same type as the input one.

Return type

Union[numpy.ndarray, Dict[str, numpy.ndarray]]

predict_jacobian(input_data, *args, **kwargs)

Evaluate ‘predict_jac’ with either array or dictionary-based data.

Firstly, the pre-processing stage converts the input data to a NumPy data array, if these data are expressed as a dictionary of NumPy data arrays.

Then, the processing evaluates the function ‘predict_jac’ from this NumPy input data array.

Lastly, the post-processing transforms the output data to a dictionary of output NumPy data arrays if the input data were passed as a dictionary of NumPy data arrays.

Parameters
  • input_data – The input data.

  • *args – The positional arguments of the function ‘predict_jac’.

  • **kwargs – The keyword arguments of the function ‘predict_jac’.

Returns

The output data with the same type as the input one.

predict_raw(input_data)

Predict output data from input data.

Parameters

input_data (numpy.ndarray) – The input data with shape (n_samples, n_inputs).

Returns

The predicted output data with shape (n_samples, n_outputs).

Return type

numpy.ndarray

predict_std(input_data)

Predict the standard deviation from input data.

Parameters

input_data (Union[numpy.ndarray, Dict[str, numpy.ndarray]]) – The input data with shape (n_samples, n_inputs).

Returns

The standard deviation of the output data with shape (n_samples, n_outputs).

Return type

numpy.ndarray

save(directory=None, path='.', save_learning_set=False)

Save the machine learning algorithm.

Parameters
  • directory (Optional[str]) – The name of the directory to save the algorithm.

  • path (str) – The path to the parent directory in which to create the directory.

  • save_learning_set (bool) – If False, do not save the learning set to lighten the saved files.

Returns

The path to the directory where the algorithm is saved.

Return type

str

Example