Plug a surrogate discipline in a Scenario¶

In this section, we describe the usage of surrogate models in GEMSEO, which are implemented in the SurrogateDiscipline class.

A SurrogateDiscipline can be used to substitute for an MDODiscipline within a Scenario. It is an approximation of the MDODiscipline that is faster to evaluate than the original discipline, and it relies on a MLRegressionAlgo. This comes at the price of computing a DOE on the original MDODiscipline and validating the approximation. The computations from which the approximation is built may already be available, or they can be generated using GEMSEO’s DOE capabilities. See Tutorial: How to carry out a trade-off study and Tutorial: How to solve a MDO problem.
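
To make this workflow concrete, here is a minimal pure-Python sketch of its three steps: sample the original model (the DOE), fit an approximation, and validate it on new points. It does not use GEMSEO; `expensive_model`, `fit_line` and `surrogate` are hypothetical names chosen for illustration, and a one-dimensional least-squares line stands in for the regression algorithm.

```python
import random


def expensive_model(x):
    """Hypothetical stand-in for a costly discipline."""
    return 3.0 * x + 2.0


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


rng = random.Random(0)

# 1. DOE: evaluate the original model on sampled inputs.
train_x = [rng.uniform(0.0, 10.0) for _ in range(20)]
train_y = [expensive_model(x) for x in train_x]

# 2. Learning: fit the approximation on the DOE results.
a, b = fit_line(train_x, train_y)


def surrogate(x):
    """Cheap approximation of expensive_model."""
    return a * x + b


# 3. Validation: compare both models on points not used for fitting.
test_x = [rng.uniform(0.0, 10.0) for _ in range(5)]
max_error = max(abs(surrogate(x) - expensive_model(x)) for x in test_x)
print("slope = {:.6f}, intercept = {:.6f}".format(a, b))
print("maximum validation error = {:.2e}".format(max_error))
```

Because the stand-in model is itself linear, the fit is exact up to rounding. With a real discipline, the validation error would be nonzero and should be checked against a tolerance before the surrogate is trusted.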

In GEMSEO, the data used to build the surrogate model are taken from a Dataset containing both the inputs and the outputs of the DOE. This Dataset may have been generated by GEMSEO from a cache, using the AbstractFullCache.export_to_dataset() method, or from a NumPy array or a text file, using the Dataset.set_from_array() and Dataset.set_from_file() methods.

Then, the surrogate discipline can be used as any other discipline in a MDOScenario, a DOEScenario, or a MDA.

from __future__ import absolute_import, division, print_function, unicode_literals

from future import standard_library
from numpy import array, hstack, vstack

from gemseo.api import (
    configure_logger,
    create_discipline,
    create_scenario,
    create_surrogate,
)
from gemseo.core.dataset import Dataset
from gemseo.problems.sobieski.core import SobieskiProblem

configure_logger()

standard_library.install_aliases()

Create a surrogate discipline¶

Create the learning dataset¶

If you already have data from a DOE produced externally, you can create a Dataset directly from them, and this step ends here. For example, let us consider a synthetic dataset, with \(x\) as input and \(y\) as output, described as a NumPy array. We store these data in a Dataset:

variables = ["x", "y"]
sizes = {"x": 1, "y": 1}
groups = {"x": "inputs", "y": "outputs"}
data = vstack(
    (
        hstack((array([1.0]), array([1.0]))),
        hstack((array([2.0]), array([2.0]))),
    )
)
synthetic_dataset = Dataset()
synthetic_dataset.set_from_array(data, variables, sizes, groups)
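
The array layout assumed by this call can be sketched without GEMSEO: each row is one sample, and the columns are assigned to the variables in the listed order, according to their sizes. The helper `split_columns` below is hypothetical and only mirrors that assumed convention; it is not part of the Dataset API.

```python
def split_columns(rows, variables, sizes):
    """Map each variable name to its columns in a row-major sample table."""
    columns = {}
    start = 0
    for name in variables:
        stop = start + sizes[name]
        # Keep, for every sample, the slice of the row owned by this variable.
        columns[name] = [row[start:stop] for row in rows]
        start = stop
    return columns


# Same content as the synthetic array above: two samples, x then y.
rows = [[1.0, 1.0], [2.0, 2.0]]
variables = ["x", "y"]
sizes = {"x": 1, "y": 1}

columns = split_columns(rows, variables, sizes)
print(columns["x"])  # values of x, one entry per sample
print(columns["y"])  # values of y, one entry per sample
```

With a variable of size 2, the same helper would return two values per sample for that name, which is why the sizes are needed in addition to the variable names.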

If you do not have data available, the following paragraphs of this step concern you.

Here, we illustrate the generation of the training data using a DOEScenario, similarly to Tutorial: How to carry out a trade-off study, where more details are given.

In this basic example, a MDODiscipline computing the mission performance (range) in the SSBJ test case is sampled with a DOEScenario. Then, the generated database is used to build a SurrogateDiscipline.

But more complex scenarios can be used in the same way: complete optimization processes or MDAs can be replaced by their surrogate counterparts. The right HDF cache shall then be used to build the SurrogateDiscipline, but the main logic won’t differ from this example.

Firstly, we create the MDODiscipline by means of the API function create_discipline() and cache the evaluations in memory, using the MDODiscipline.set_cache_policy() method:

discipline = create_discipline("SobieskiMission")
discipline.set_cache_policy(cache_type=discipline.MEMORY_FULL_CACHE)

Then, we read the DesignSpace of the Sobieski problem and keep only the inputs of the Sobieski mission discipline, “x_shared”, “y_24” and “y_34”, as inputs of the DOE:

design_space = SobieskiProblem().read_design_space()
design_space = design_space.filter(["x_shared", "y_24", "y_34"])

From this MDODiscipline and this DesignSpace, we build a DOEScenario by means of the API function create_scenario():

scenario = create_scenario(
    [discipline],
    "DisciplinaryOpt",
    objective_name="y_4",
    design_space=design_space,
    scenario_type="DOE",
)

Lastly, we execute the process with the LHS algorithm and 30 samples.

scenario.execute({"n_samples": 30, "algo": "lhs"})
mission_dataset = discipline.cache.export_to_dataset()
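
The idea behind Latin hypercube sampling can be sketched in a few lines of pure Python: the range of each variable is cut into as many equal strata as there are samples, one point is drawn in each stratum, and the strata are shuffled independently for each variable. This is only an illustration with a hypothetical `latin_hypercube` helper and arbitrary bounds; the "lhs" algorithm called by the scenario may differ in its details.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Return n_samples points, with one point per stratum in every dimension."""
    rng = random.Random(seed)
    columns = []
    for lower, upper in bounds:
        width = (upper - lower) / n_samples
        # One random point inside each of the n_samples equal-width strata.
        points = [lower + width * (k + rng.random()) for k in range(n_samples)]
        # Shuffle so that strata are paired randomly across dimensions.
        rng.shuffle(points)
        columns.append(points)
    # Transpose: one list of coordinates per sample.
    return [list(sample) for sample in zip(*columns)]


# Arbitrary bounds for two variables, chosen only for illustration.
bounds = [(0.0, 1.0), (10.0, 20.0)]
samples = latin_hypercube(30, bounds)
print(len(samples), "samples of dimension", len(samples[0]))
```

Compared with purely random sampling, this guarantees that every variable is covered evenly over its whole range, which is useful when only a few dozen evaluations are affordable.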

Create the SurrogateDiscipline¶

From this Dataset, we can build a SurrogateDiscipline of the MDODiscipline.

By means of the API function create_surrogate(), we create the SurrogateDiscipline from the Dataset. It inherits from MDODiscipline and can be executed as any other discipline.

For example, we create a SurrogateDiscipline relying on a LinearRegression from the synthetic dataset:

synthetic_surrogate = create_surrogate("LinearRegression", synthetic_dataset)

Likewise, we create the surrogate of the mission discipline from mission_dataset, here assuming an RBFRegression model:

range_surrogate = create_surrogate("RBFRegression", mission_dataset)

Use the SurrogateDiscipline in MDO¶

The obtained SurrogateDiscipline can be used in any Scenario, such as a DOEScenario or MDOScenario. We see here that the MDODiscipline.execute() method can be used as in any other discipline to compute the outputs for given inputs:

for i in range(5):
    lod = i * 2.0
    y_4_pred = range_surrogate.execute({"y_24": array([lod])})["y_4"]
    print("Surrogate range (L/D = {}) = {}".format(lod, y_4_pred))

Out:

Surrogate range (L/D = 0.0) = [1314.29536849]
Surrogate range (L/D = 2.0) = [1314.82798458]
Surrogate range (L/D = 4.0) = [1315.16562826]
Surrogate range (L/D = 6.0) = [1315.27407977]
Surrogate range (L/D = 8.0) = [1315.14203899]

We can also build and execute an optimization scenario from it. The only design variable is “y_24”. By default, the Jacobian matrix of a surrogate is computed by finite differences, except for a SurrogateDiscipline relying on LinearRegression, which has an analytical (and constant) Jacobian.

design_space = design_space.filter(["y_24"])
scenario = create_scenario(
    range_surrogate,
    formulation="DisciplinaryOpt",
    objective_name="y_4",
    design_space=design_space,
    scenario_type="MDO",
    maximize_objective=True,
)
scenario.execute({"max_iter": 30, "algo": "L-BFGS-B"})

Out:

{'max_iter': 30, 'algo': 'L-BFGS-B'}
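
The remark on Jacobians can be illustrated with a small pure-Python sketch. It approximates the derivatives of a linear model by forward finite differences at two different points and recovers the model coefficients both times, which is why an analytical, constant Jacobian is available in the linear case. The coefficients and the `finite_difference_jacobian` helper are hypothetical and unrelated to GEMSEO's implementation.

```python
def finite_difference_jacobian(func, x, step=1e-6):
    """Forward-difference derivatives of a scalar function of a list."""
    base = func(x)
    jacobian = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += step
        jacobian.append((func(shifted) - base) / step)
    return jacobian


# Hypothetical linear model: y = intercept + sum(c_i * x_i).
coefficients = [2.0, -1.0, 0.5]
intercept = 4.0


def linear_model(x):
    return intercept + sum(c * v for c, v in zip(coefficients, x))


# The approximation costs one extra model evaluation per input variable...
jac_at_origin = finite_difference_jacobian(linear_model, [0.0, 0.0, 0.0])
jac_elsewhere = finite_difference_jacobian(linear_model, [1.0, 5.0, -3.0])

# ...whereas the exact Jacobian is simply the list of coefficients, everywhere.
print(jac_at_origin)
print(jac_elsewhere)
```

For a nonlinear surrogate, the two results would differ from one point to another, and the finite-difference cost would be paid at every iteration of a gradient-based algorithm.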

Available surrogate models¶

Currently, the following surrogate models are available:

Linear regression, based on the scikit-learn library: use the LinearRegression class.
Polynomial regression, based on the scikit-learn library: use the PolynomialRegression class.
Gaussian processes (also known as Kriging), based on the scikit-learn library: use the GaussianProcessRegression class.
Mixture of experts: use the MixtureOfExperts class.
Random forest models, based on the scikit-learn library: use the RandomForestRegressor class.
RBF models (Radial Basis Functions), based on the SciPy library: use the RBFRegression class.
PCE models (Polynomial Chaos Expansion), based on the OpenTURNS library: use the PCERegression class.

For the detailed behavior of each model, please refer to the documentation of the underlying packages.

Extending surrogate models¶

All surrogate models are added in the same way: the MLRegressionAlgo base class shall be extended. See Extending GEMSEO to learn how to run GEMSEO with external Python modules. The RegressionModelFactory can then build the new MLRegressionAlgo automatically from its regression algorithm name and options. This factory is called by the constructor of SurrogateDiscipline.
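
The general pattern of building a model from its class name and options can be sketched as follows. This is a generic illustration and not GEMSEO's implementation: `BaseRegressor`, `ConstantRegressor` and `create_regressor` are hypothetical names, and the way GEMSEO actually discovers subclasses in external modules is described in Extending GEMSEO.

```python
class BaseRegressor:
    """Hypothetical base class; subclasses register themselves by name."""

    registry = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        BaseRegressor.registry[cls.__name__] = cls

    def __init__(self, **options):
        self.options = options

    def predict(self, x):
        raise NotImplementedError


class ConstantRegressor(BaseRegressor):
    """Toy extension: always predicts the value given in the options."""

    def predict(self, x):
        return self.options.get("value", 0.0)


def create_regressor(name, **options):
    """Build a registered regressor from its class name and options."""
    try:
        regressor_class = BaseRegressor.registry[name]
    except KeyError:
        raise ValueError("Unknown regressor: {}".format(name))
    return regressor_class(**options)


model = create_regressor("ConstantRegressor", value=1.5)
print(model.predict(0.0))
```

The benefit of this design is that user code refers to a model only by a string and a set of options, so a new model becomes usable without modifying the code that consumes it.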
