Plug a surrogate discipline in a Scenario#
This section describes the use of surrogate models in GEMSEO,
which are implemented in the SurrogateDiscipline class.
A SurrogateDiscipline can be used to substitute a
Discipline within a Scenario. This
SurrogateDiscipline is an approximation of the Discipline
and is faster to compute than the original discipline. It relies on a
BaseRegressor. This comes at the price of computing a DOE
on the original Discipline and validating the approximation. The
computations from which the approximation is built may already be available, or can be
generated using GEMSEO's DOE capabilities.
See MDF-based DOE on the Sobieski SSBJ test case and
A from scratch example on the Sellar problem.
In GEMSEO, the data used to build the surrogate model is taken from a
Dataset containing both inputs and outputs of the DOE. This
Dataset may have been generated by GEMSEO from a cache, using the
BaseFullCache.to_dataset() method,
from a database, using the OptimizationProblem.to_dataset() method,
or from a NumPy array or
a text file, using the Dataset.from_array() and
Dataset.from_txt() methods.
Then, the surrogate discipline can be used as any other discipline in a
MDOScenario, a DOEScenario, or a BaseMDA.
from __future__ import annotations
from numpy import array
from numpy import hstack
from numpy import vstack
from gemseo import create_discipline
from gemseo import create_scenario
from gemseo import create_surrogate
from gemseo import sample_disciplines
from gemseo.datasets.io_dataset import IODataset
from gemseo.problems.mdo.sobieski.core.design_space import SobieskiDesignSpace
Create a surrogate discipline#
Create the training dataset#
If you already have data from a DOE produced externally,
you can create a Dataset directly and Step 1 ends here.
For example, let us consider a synthetic dataset, with \(x\)
as input and \(y\) as output, described as a NumPy
array. Then, we store these data in a Dataset:
variables = ["x", "y"]
sizes = {"x": 1, "y": 1}
groups = {"x": "inputs", "y": "outputs"}
data = vstack((
    hstack((array([1.0]), array([1.0]))),
    hstack((array([2.0]), array([2.0]))),
))
synthetic_dataset = IODataset.from_array(data, variables, sizes, groups)
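To make the layout of this array explicit, here is a minimal NumPy-only sketch. It assumes that each row is one sample and that the columns follow the order of `variables`, with widths given by `sizes`. The `columns`, `inputs` and `outputs` dictionaries are illustrative names, not part of the GEMSEO API.

```python
from numpy import array
from numpy import hstack
from numpy import vstack

variables = ["x", "y"]
sizes = {"x": 1, "y": 1}
groups = {"x": "inputs", "y": "outputs"}

# Each row is one sample: the value of x followed by the value of y.
data = vstack((
    hstack((array([1.0]), array([1.0]))),
    hstack((array([2.0]), array([2.0]))),
))

# Assumption: columns are ordered as in `variables`, with widths from `sizes`.
columns = {}
start = 0
for name in variables:
    stop = start + sizes[name]
    columns[name] = data[:, start:stop]
    start = stop

# Split the variables according to their group.
inputs = {name: col for name, col in columns.items() if groups[name] == "inputs"}
outputs = {name: col for name, col in columns.items() if groups[name] == "outputs"}
```

With this layout, `data` has one row per sample and one column per variable component.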
If you do not have available data, the following paragraphs of Step 1 concern you.
Here, we illustrate the generation of the training data using a DOEScenario,
similarly to sobieski_doe, where more details are given.
In this basic example, a Discipline computing the mission
performance (range) in the SSBJ test case is
sampled with a DOEScenario. Then, the generated database is used to
build a SurrogateDiscipline.
But more complex scenarios can be used in the same way: complete optimization
processes or MDAs can be replaced by their surrogate counterparts. The right
cache or database shall then be used to build the
SurrogateDiscipline, but the main logic won't differ from this
example.
Firstly, we create the Discipline by means of the API function
create_discipline():
discipline = create_discipline("SobieskiMission")
Then, we read the DesignSpace of the Sobieski problem and keep only the inputs of the Sobieski Mission
"x_shared", "y_24", "y_34"
as inputs of the DOE:
design_space = SobieskiDesignSpace()
design_space = design_space.filter(["x_shared", "y_24", "y_34"])
From this Discipline and this DesignSpace,
we can generate 30 samples by means of the sample_disciplines() function
with the LHS algorithm:
mission_dataset = sample_disciplines(
    [discipline], design_space, "y_4", algo_name="PYDOE_LHS", n_samples=30
)
INFO - 16:24:51: *** Start Sampling execution ***
INFO - 16:24:51: Sampling
INFO - 16:24:51: Disciplines: SobieskiMission
INFO - 16:24:51: MDO formulation: MDF
INFO - 16:24:51: Running the algorithm PYDOE_LHS:
INFO - 16:24:51: 3%|▎ | 1/30 [00:00<00:00, 399.38 it/sec]
INFO - 16:24:51: 7%|▋ | 2/30 [00:00<00:00, 690.93 it/sec]
INFO - 16:24:51: 10%|█ | 3/30 [00:00<00:00, 920.81 it/sec]
INFO - 16:24:51: 13%|█▎ | 4/30 [00:00<00:00, 1127.20 it/sec]
INFO - 16:24:51: 17%|█▋ | 5/30 [00:00<00:00, 1305.09 it/sec]
INFO - 16:24:51: 20%|██ | 6/30 [00:00<00:00, 1461.43 it/sec]
INFO - 16:24:51: 23%|██▎ | 7/30 [00:00<00:00, 1590.21 it/sec]
INFO - 16:24:51: 27%|██▋ | 8/30 [00:00<00:00, 1716.25 it/sec]
INFO - 16:24:51: 30%|███ | 9/30 [00:00<00:00, 1829.00 it/sec]
INFO - 16:24:51: 33%|███▎ | 10/30 [00:00<00:00, 1924.17 it/sec]
INFO - 16:24:51: 37%|███▋ | 11/30 [00:00<00:00, 2014.29 it/sec]
INFO - 16:24:51: 40%|████ | 12/30 [00:00<00:00, 2100.30 it/sec]
INFO - 16:24:51: 43%|████▎ | 13/30 [00:00<00:00, 2179.73 it/sec]
INFO - 16:24:51: 47%|████▋ | 14/30 [00:00<00:00, 2244.14 it/sec]
INFO - 16:24:51: 50%|█████ | 15/30 [00:00<00:00, 2308.79 it/sec]
INFO - 16:24:51: 53%|█████▎ | 16/30 [00:00<00:00, 2369.66 it/sec]
INFO - 16:24:51: 57%|█████▋ | 17/30 [00:00<00:00, 2428.17 it/sec]
INFO - 16:24:51: 60%|██████ | 18/30 [00:00<00:00, 2476.14 it/sec]
INFO - 16:24:51: 63%|██████▎ | 19/30 [00:00<00:00, 2521.73 it/sec]
INFO - 16:24:51: 67%|██████▋ | 20/30 [00:00<00:00, 2564.15 it/sec]
INFO - 16:24:51: 70%|███████ | 21/30 [00:00<00:00, 2607.70 it/sec]
INFO - 16:24:51: 73%|███████▎ | 22/30 [00:00<00:00, 2641.25 it/sec]
INFO - 16:24:51: 77%|███████▋ | 23/30 [00:00<00:00, 2642.70 it/sec]
INFO - 16:24:51: 80%|████████ | 24/30 [00:00<00:00, 2672.88 it/sec]
INFO - 16:24:51: 83%|████████▎ | 25/30 [00:00<00:00, 2699.32 it/sec]
INFO - 16:24:51: 87%|████████▋ | 26/30 [00:00<00:00, 2731.15 it/sec]
INFO - 16:24:51: 90%|█████████ | 27/30 [00:00<00:00, 2762.37 it/sec]
INFO - 16:24:51: 93%|█████████▎| 28/30 [00:00<00:00, 2792.15 it/sec]
INFO - 16:24:51: 97%|█████████▋| 29/30 [00:00<00:00, 2816.34 it/sec]
INFO - 16:24:51: 100%|██████████| 30/30 [00:00<00:00, 2807.81 it/sec]
INFO - 16:24:51: *** End Sampling execution ***
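The idea behind the LHS (Latin hypercube sampling) algorithm used above can be illustrated with a short NumPy sketch. This is not pyDOE's implementation: the function `latin_hypercube` and its `seed` argument are hypothetical names introduced for illustration. The bounds 0.44 and 11.13 are those of "y_24" in the Sobieski design space.

```python
from numpy import arange
from numpy.random import default_rng


def latin_hypercube(n_samples, n_dims, seed=0):
    """Return points in [0, 1)^n_dims with one point per stratum in each dimension."""
    rng = default_rng(seed)
    # Split [0, 1) into n_samples equal strata and draw one point in each stratum.
    strata = arange(n_samples)[:, None]
    unit_samples = (strata + rng.random((n_samples, n_dims))) / n_samples
    # Shuffle each column independently to decouple the dimensions.
    for j in range(n_dims):
        unit_samples[:, j] = rng.permutation(unit_samples[:, j])
    return unit_samples


n_samples = 30
unit_samples = latin_hypercube(n_samples, n_dims=1)

# Scale the unit samples to the bounds of y_24.
lower, upper = 0.44, 11.13
samples = lower + unit_samples * (upper - lower)
```

Each of the 30 strata of the interval contains exactly one sample, which is what gives LHS its space-filling property along every axis.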
See also
In this tutorial, the DOE is based on pyDOE. Several other designs are
available, based on this package or on OpenTURNS. Some examples of these designs are plotted
in DOE algorithms. To list the available DOE algorithms in the
current GEMSEO configuration, use
gemseo.get_available_doe_algorithms().
Create the SurrogateDiscipline#
From this Dataset, we can build a SurrogateDiscipline
of the Discipline.
More precisely,
by means of the API function create_surrogate(),
we create a SurrogateDiscipline relying on a LinearRegressor
and inheriting from Discipline,
which can be executed as any other discipline:
synthetic_surrogate = create_surrogate("LinearRegressor", synthetic_dataset)
See also
Note that a subset of the inputs and outputs to be used to build the
SurrogateDiscipline may be specified by the user if needed,
mainly to avoid unnecessary computations.
Then, we execute it as any Discipline:
input_data = {"x": array([2.0])}
out = synthetic_surrogate.execute(input_data)
out["y"]
array([2.])
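The value returned above can be checked by hand: the two training points (1, 1) and (2, 2) lie on the line \(y = x\). The following sketch fits an ordinary least-squares line with NumPy only. It is an independent illustration and does not reproduce the scikit-learn model used by LinearRegressor.

```python
from numpy import array
from numpy import hstack
from numpy import ones
from numpy.linalg import lstsq

# Training data of the synthetic dataset: one input x, one output y.
x = array([[1.0], [2.0]])
y = array([[1.0], [2.0]])

# Design matrix with a column of ones for the intercept.
design = hstack((ones((2, 1)), x))

# Solve the least-squares problem design @ coefficients = y.
coefficients, *_ = lstsq(design, y, rcond=None)
intercept, slope = coefficients[:, 0]

# Prediction at x = 2, the point evaluated with the surrogate discipline.
prediction = intercept + slope * 2.0
```

The fitted slope is 1 and the intercept is 0 up to rounding, so the prediction at \(x = 2\) is 2, consistent with the output of the surrogate discipline.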
In our case study, from the DOE built at Step 1,
we build an RBFRegressor of \(y_4\)
representing the range as a function of L/D:
range_surrogate = create_surrogate("RBFRegressor", mission_dataset)
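As a rough illustration of what a radial basis function model does, the sketch below interpolates a one-dimensional function with Gaussian kernels, using NumPy only. The kernel choice, the shape parameter `epsilon` and the function names are assumptions made for this example. They do not describe the SciPy-based implementation behind RBFRegressor.

```python
from numpy import exp
from numpy import linspace
from numpy.linalg import solve


def gaussian_kernel(x_a, x_b, epsilon):
    """Evaluate the Gaussian kernel between two sets of 1D points."""
    distances = abs(x_a[:, None] - x_b[None, :])
    return exp(-((epsilon * distances) ** 2))


def fit_rbf(x_train, y_train, epsilon=1.0):
    """Compute the weights so that the model interpolates the training data."""
    return solve(gaussian_kernel(x_train, x_train, epsilon), y_train)


def predict_rbf(x_new, x_train, weights, epsilon=1.0):
    """Evaluate the weighted sum of kernels centered at the training points."""
    return gaussian_kernel(x_new, x_train, epsilon) @ weights


# Toy training data: five samples of y = x^2.
x_train = linspace(0.0, 4.0, 5)
y_train = x_train**2
weights = fit_rbf(x_train, y_train)
y_at_training_points = predict_rbf(x_train, x_train, weights)
```

By construction, such a model reproduces the training outputs at the training inputs. Its accuracy elsewhere depends on the sampling and on the kernel, which is why the approximation must be validated.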
Use the SurrogateDiscipline in MDO#
The obtained SurrogateDiscipline can be used in any
Scenario, such as a DOEScenario or MDOScenario.
We see here that the Discipline.execute() method can be used as in
any other discipline to compute the outputs for given inputs:
for i in range(5):
    lod = i * 2.0
    y_4_pred = range_surrogate.execute({"y_24": array([lod])})["y_4"]
    print(f"Surrogate range (L/D = {lod}) = {y_4_pred}")
WARNING - 16:24:51: The surrogate discipline RBF_Sampling is used at an input point outside its domain of validity: {'x_shared': array([5.01344390e-02, 4.50436913e+04, 1.59844800e+00, 5.52176947e+00,
WARNING - 16:24:51: 5.49342194e+01, 1.00014906e+03]), 'y_24': array([0.]), 'y_34': array([1.20977213])}.
Surrogate range (L/D = 0.0) = [-97.86844673]
Surrogate range (L/D = 2.0) = [184.60105962]
Surrogate range (L/D = 4.0) = [505.37518268]
Surrogate range (L/D = 6.0) = [840.33241658]
Surrogate range (L/D = 8.0) = [1161.49215263]
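The warning in the log above signals an evaluation outside the domain of validity of the surrogate. One simple way to picture such a check is to compare the input point with the range of the training inputs, as in the sketch below. This is an assumption about the general idea, not GEMSEO's actual test. The function `is_in_training_domain` and the sample values are hypothetical.

```python
from numpy import array


def is_in_training_domain(point, training_inputs):
    """Check that every component of the point lies within the sampled range."""
    lower = training_inputs.min(axis=0)
    upper = training_inputs.max(axis=0)
    return bool(((point >= lower) & (point <= upper)).all())


# Hypothetical training samples of y_24 (one column, one row per sample).
training_inputs = array([[0.6], [3.2], [7.5], [10.9]])

inside = is_in_training_domain(array([4.0]), training_inputs)
below = is_in_training_domain(array([0.0]), training_inputs)
above = is_in_training_domain(array([11.13]), training_inputs)
```

With these hypothetical samples, L/D = 4.0 is inside the sampled range, whereas 0.0 and 11.13 are outside. Predictions at such points are extrapolations and should be treated with caution, as the negative range predicted at L/D = 0.0 suggests.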
And we can build and execute an optimization scenario from it.
The design variable is "y_24". By default, the Jacobian matrix of a surrogate
is computed by finite differences, except for a
SurrogateDiscipline relying on LinearRegressor, which has
an analytical (and constant) Jacobian.
design_space = design_space.filter(["y_24"])
scenario = create_scenario(
    range_surrogate,
    "y_4",
    design_space,
    formulation_name="DisciplinaryOpt",
    maximize_objective=True,
)
scenario.execute(algo_name="L-BFGS-B", max_iter=30)
INFO - 16:24:51: *** Start MDOScenario execution ***
INFO - 16:24:51: MDOScenario
INFO - 16:24:51: Disciplines: RBF_Sampling
INFO - 16:24:51: MDO formulation: DisciplinaryOpt
INFO - 16:24:51: Optimization problem:
INFO - 16:24:51: minimize -y_4(y_24)
INFO - 16:24:51: with respect to y_24
INFO - 16:24:51: over the design space:
INFO - 16:24:51: +------+-------------+--------------------+-------------+-------+
INFO - 16:24:51: | Name | Lower bound | Value | Upper bound | Type |
INFO - 16:24:51: +------+-------------+--------------------+-------------+-------+
INFO - 16:24:51: | y_24 | 0.44 | 0.8060924457095278 | 11.13 | float |
INFO - 16:24:51: +------+-------------+--------------------+-------------+-------+
INFO - 16:24:51: Solving optimization problem with algorithm L-BFGS-B:
INFO - 16:24:51: 3%|▎ | 1/30 [00:00<00:00, 326.86 it/sec, feas=True, obj=-10.3]
WARNING - 16:24:51: The surrogate discipline RBF_Sampling is used at an input point outside its domain of validity: {'x_shared': array([5.01344390e-02, 4.50436913e+04, 1.59844800e+00, 5.52176947e+00,
WARNING - 16:24:51: 5.49342194e+01, 1.00014906e+03]), 'y_24': array([11.13]), 'y_34': array([1.20977213])}.
WARNING - 16:24:51: The surrogate discipline RBF_Sampling is used at an input point outside its domain of validity: {'x_shared': array([5.01344390e-02, 4.50436913e+04, 1.59844800e+00, 5.52176947e+00,
WARNING - 16:24:51: 5.49342194e+01, 1.00014906e+03]), 'y_24': array([11.13]), 'y_34': array([1.20977213])}.
INFO - 16:24:51: 7%|▋ | 2/30 [00:00<00:00, 289.70 it/sec, feas=True, obj=-1.59e+3]
INFO - 16:24:51: Optimization result:
INFO - 16:24:51: Optimizer info:
INFO - 16:24:51: Status: 0
INFO - 16:24:51: Message: CONVERGENCE: NORM OF PROJECTED GRADIENT <= PGTOL
INFO - 16:24:51: Solution:
INFO - 16:24:51: Objective: -1589.7138353791026
INFO - 16:24:51: Design space:
INFO - 16:24:51: +------+-------------+-------+-------------+-------+
INFO - 16:24:51: | Name | Lower bound | Value | Upper bound | Type |
INFO - 16:24:51: +------+-------------+-------+-------------+-------+
INFO - 16:24:51: | y_24 | 0.44 | 11.13 | 11.13 | float |
INFO - 16:24:51: +------+-------------+-------+-------------+-------+
INFO - 16:24:51: *** End MDOScenario execution ***
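The finite-difference approximation of the Jacobian mentioned above can be sketched as follows with NumPy. The forward-difference scheme, the step size and the function names are assumptions chosen for illustration. They are not GEMSEO's settings. The linear test function shows why a linear model has a constant Jacobian.

```python
from numpy import array
from numpy import zeros


def finite_difference_jacobian(function, x, step=1e-7):
    """Approximate the Jacobian of function at x by forward differences."""
    f0 = function(x)
    jacobian = zeros((f0.size, x.size))
    for j in range(x.size):
        # Perturb one input component at a time.
        perturbed = x.copy()
        perturbed[j] += step
        jacobian[:, j] = (function(perturbed) - f0) / step
    return jacobian


def linear_model(x):
    """A toy linear model whose exact Jacobian is [[2, -3]] everywhere."""
    return array([1.0 + 2.0 * x[0] - 3.0 * x[1]])


jacobian = finite_difference_jacobian(linear_model, array([0.5, 1.5]))
```

This approach costs one extra evaluation per input component, which is affordable for a surrogate because each evaluation is cheap.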
Available surrogate models#
Currently, the following surrogate models are available:
Linear regression, based on the Scikit-learn library: use the LinearRegressor class.
Polynomial regression, based on the Scikit-learn library: use the PolynomialRegressor class.
Gaussian processes (also known as Kriging), based on the Scikit-learn library: use the GaussianProcessRegressor class.
Mixture of experts: use the MOERegressor class.
Random forest models, based on the Scikit-learn library: use the RandomForestRegressor class.
RBF models (Radial Basis Functions), based on the SciPy library: use the RBFRegressor class.
PCE models (Polynomial Chaos Expansion), based on the OpenTURNS library: use the PCERegressor class.
To understand the detailed behavior of the models, please refer to the documentation of the underlying packages.
Extending surrogate models#
All surrogate models work the same way: the BaseRegressor base
class shall be extended. See Extend GEMSEO features to learn how to run
GEMSEO with external Python modules. Then, the RegressorFactory can
build the new BaseRegressor automatically from its regression
algorithm name and options. This factory is called by the constructor of
SurrogateDiscipline.
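The extension mechanism described above follows a common pattern: a base class to subclass and a factory that instantiates a subclass from its name and options. The sketch below illustrates this pattern in plain Python. All names (`ToyBaseRegressor`, `ToyMeanRegressor`, `create_toy_regressor`) are hypothetical and do not reflect the actual interface of BaseRegressor or RegressorFactory.

```python
from abc import ABC
from abc import abstractmethod


class ToyBaseRegressor(ABC):
    """Hypothetical base class that every toy regressor must extend."""

    @abstractmethod
    def fit(self, inputs, outputs):
        """Learn the model from the training data."""

    @abstractmethod
    def predict(self, value):
        """Predict the output at a new input value."""


class ToyMeanRegressor(ToyBaseRegressor):
    """Toy model predicting the mean of the training outputs plus an offset."""

    def __init__(self, offset=0.0):
        self.offset = offset
        self.mean = None

    def fit(self, inputs, outputs):
        self.mean = sum(outputs) / len(outputs)

    def predict(self, value):
        return self.mean + self.offset


def create_toy_regressor(name, **options):
    """Hypothetical factory: look up a subclass by name and instantiate it."""
    classes = {cls.__name__: cls for cls in ToyBaseRegressor.__subclasses__()}
    return classes[name](**options)


regressor = create_toy_regressor("ToyMeanRegressor", offset=1.0)
regressor.fit([1.0, 2.0], [1.0, 3.0])
prediction = regressor.predict(5.0)
```

The benefit of this design is that new models become available by name as soon as their class is defined, without modifying the code that consumes them.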
See also
More generally, GEMSEO provides extension mechanisms to integrate external DOE and optimization algorithms, disciplines, MDAs and surrogate models.