Plug a surrogate discipline in a Scenario
In this section, we describe the usage of surrogate models in GEMSEO, which are implemented by the SurrogateDiscipline class.

A SurrogateDiscipline can be used to substitute an MDODiscipline within a Scenario. This SurrogateDiscipline is an approximation of the MDODiscipline and is faster to compute than the original discipline. It relies on an MLRegressionAlgo. This comes at the price of computing a DOE on the original MDODiscipline and validating the approximation. The computations from which the approximation is built may already be available, or can be generated using GEMSEO's DOE capabilities. See Tutorial: How to carry out a trade-off study and Tutorial: How to solve an MDO problem.
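The trade-off described above (pay for a DOE on the expensive model, then validate the cheap approximation) can be sketched in plain NumPy, independently of GEMSEO. In this sketch, `expensive_model` is a hypothetical stand-in for a costly discipline, and a cubic least-squares polynomial (`numpy.polyfit`) plays the role of the regression algorithm; neither reflects how GEMSEO implements SurrogateDiscipline.

```python
import numpy as np


def expensive_model(x):
    """Hypothetical stand-in for a costly discipline evaluation."""
    return np.exp(x)


# 1. DOE: evaluate the expensive model at a few training points.
x_train = np.linspace(0.0, 1.0, 10)
y_train = expensive_model(x_train)

# 2. Build the surrogate: a cubic polynomial fitted by least squares.
coefficients = np.polyfit(x_train, y_train, deg=3)


def surrogate(x):
    """Cheap approximation of expensive_model."""
    return np.polyval(coefficients, x)


# 3. Validate: compare with the true model on points not used for training.
x_test = np.linspace(0.05, 0.95, 7)
max_error = np.max(np.abs(surrogate(x_test) - expensive_model(x_test)))
print(f"Maximum validation error: {max_error:.2e}")
```

Only step 1 calls the expensive model; once fitted, the surrogate can be evaluated many times at negligible cost, provided the validation error is acceptable.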
In GEMSEO, the data used to build the surrogate model are taken from a Dataset containing both the inputs and the outputs of the DOE. This Dataset may have been generated by GEMSEO from a cache, using the AbstractFullCache.export_to_dataset() method, or built from a NumPy array or a text file, using the Dataset.set_from_array() and Dataset.set_from_file() methods respectively.
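Whatever its origin, the learning data boil down to a 2D array with one row per sample and one column per variable component. As a hedged illustration, the NumPy sketch below parses a small comma-separated table into such an array; the file content is hypothetical, and the exact file format accepted by Dataset.set_from_file() is not reproduced here.

```python
import io

import numpy as np

# Hypothetical content of a text file: a header line, then one sample per row.
text_file = io.StringIO("x,y\n1.0,1.0\n2.0,2.0\n")

# Skip the header line and read the samples as floats.
data = np.loadtxt(text_file, delimiter=",", skiprows=1)
print(data.shape)  # (number of samples, number of columns)
```

An array of this shape is what Dataset.set_from_array() expects, together with the names, sizes and groups of the variables.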
Then, the surrogate discipline can be used as any other discipline in an MDOScenario, a DOEScenario or an MDA.
from __future__ import division, unicode_literals
from numpy import array, hstack, vstack
from gemseo.api import (
    configure_logger,
    create_discipline,
    create_scenario,
    create_surrogate,
)
from gemseo.core.dataset import Dataset
from gemseo.problems.sobieski.core import SobieskiProblem
configure_logger()
Out:
<RootLogger root (INFO)>
Create a surrogate discipline
Create the learning dataset
If data from an externally produced DOE are already available, you can store them in a Dataset, and Step 1 ends here.
For example, let us consider a synthetic dataset with \(x\) as input and \(y\) as output, described as a NumPy array. Then, we store these data in a Dataset:
# Names, sizes and roles of the variables stored column-wise in the array.
variables = ["x", "y"]
sizes = {"x": 1, "y": 1}
groups = {"x": "inputs", "y": "outputs"}
# Two samples, one per row: (x, y) = (1, 1) and (2, 2).
data = vstack(
    (
        hstack((array([1.0]), array([1.0]))),
        hstack((array([2.0]), array([2.0]))),
    )
)
synthetic_dataset = Dataset()
synthetic_dataset.set_from_array(data, variables, sizes, groups)
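To make the roles of `variables`, `sizes` and `groups` concrete, the sketch below slices the columns of the same array into one block per variable, in the order given by `variables`, and then keeps the blocks whose group is "inputs". The helper `split_by_variable` is hypothetical and only mirrors the column layout described above; it is not part of the Dataset API.

```python
import numpy as np


def split_by_variable(data, variables, sizes):
    """Hypothetical helper: slice the columns of ``data`` into one block per variable."""
    blocks = {}
    start = 0
    for name in variables:
        stop = start + sizes[name]
        blocks[name] = data[:, start:stop]
        start = stop
    if start != data.shape[1]:
        raise ValueError("The sizes do not match the number of columns.")
    return blocks


data = np.array([[1.0, 1.0], [2.0, 2.0]])
variables = ["x", "y"]
sizes = {"x": 1, "y": 1}
groups = {"x": "inputs", "y": "outputs"}

blocks = split_by_variable(data, variables, sizes)
# Keep only the blocks whose group is "inputs".
inputs = {name: block for name, block in blocks.items() if groups[name] == "inputs"}
print(inputs["x"].ravel())
```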
If no data are available, the following paragraphs of Step 1 concern you.
Here, we illustrate the generation of the training data using a DOEScenario, similarly to Tutorial: How to carry out a trade-off study, where more details are given.
In this basic example, an MDODiscipline computing the mission performance (range) in the SSBJ test case is sampled with a DOEScenario. Then, the generated database is used to build a SurrogateDiscipline.
More complex processes can be handled in the same way: complete optimization processes or MDAs can be replaced by their surrogate counterparts. The appropriate HDF cache must then be used to build the SurrogateDiscipline, but the main logic does not differ from this example.
First, we create the MDODiscipline by means of the API function create_discipline() and cache its evaluations in memory, using the MDODiscipline.set_cache_policy() method:
discipline = create_discipline("SobieskiMission")
discipline.set_cache_policy(cache_type=discipline.MEMORY_FULL_CACHE)
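Caching every evaluation is what later allows the DOE results to be exported as a Dataset. The idea can be sketched with a hypothetical wrapper class (not GEMSEO's cache API) that stores each distinct input/output pair in memory, skips repeated evaluations, and stacks the stored pairs row by row.

```python
import numpy as np


class FullMemoryCacheSketch:
    """Hypothetical wrapper that memoizes every evaluation of ``func``."""

    def __init__(self, func):
        self._func = func
        self._entries = {}  # input bytes -> (inputs, outputs)
        self.n_calls = 0  # number of actual calls to ``func``

    def __call__(self, x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        key = x.tobytes()
        if key not in self._entries:
            self.n_calls += 1
            outputs = np.atleast_1d(np.asarray(self._func(x), dtype=float))
            self._entries[key] = (x.copy(), outputs)
        return self._entries[key][1]

    def to_array(self):
        """Stack the cached evaluations: one row per sample, inputs then outputs."""
        return np.vstack([np.concatenate(pair) for pair in self._entries.values()])


cached = FullMemoryCacheSketch(lambda x: 2.0 * x)
cached([1.0])
cached([1.0])  # served from the cache, no new call
cached([3.0])
print(cached.to_array())
```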
Then, we read the DesignSpace of the Sobieski problem and keep only the inputs of the Sobieski mission, “x_shared”, “y_24” and “y_34”, as inputs of the DOE:
design_space = SobieskiProblem().read_design_space()
design_space = design_space.filter(["x_shared", "y_24", "y_34"])
From this MDODiscipline and this DesignSpace, we build a DOEScenario by means of the API function create_scenario():
scenario = create_scenario(
    [discipline],
    "DisciplinaryOpt",
    objective_name="y_4",
    design_space=design_space,
    scenario_type="DOE",
)
Lastly, we execute the process with the LHS algorithm and 30 samples, then export the evaluations stored in the cache of the discipline to a Dataset:
scenario.execute({"n_samples": 30, "algo": "lhs"})
mission_dataset = discipline.cache.export_to_dataset(
    inputs_names=["x_shared", "y_24", "y_34"]
)
Out:
INFO - 14:41:34:
INFO - 14:41:34: *** Start DOE Scenario execution ***
INFO - 14:41:34: DOEScenario
INFO - 14:41:34: Disciplines: SobieskiMission
INFO - 14:41:34: MDOFormulation: DisciplinaryOpt
INFO - 14:41:34: Algorithm: lhs
INFO - 14:41:34: Optimization problem:
INFO - 14:41:34: Minimize: y_4(x_shared, y_24, y_34)
INFO - 14:41:34: With respect to: x_shared, y_24, y_34
INFO - 14:41:34: DOE sampling: 0%| | 0/30 [00:00<?, ?it]
INFO - 14:41:34: DOE sampling: 100%|██████████| 30/30 [00:00<00:00, 640.74 it/sec, obj=1.27e+3]
INFO - 14:41:34: Optimization result:
INFO - 14:41:34: Objective value = 71.16601799429675
INFO - 14:41:34: The result is feasible.
INFO - 14:41:34: Status: None
INFO - 14:41:34: Optimizer message: None
INFO - 14:41:34: Number of calls to the objective function by the optimizer: 30
INFO - 14:41:34: Design space:
INFO - 14:41:34: +----------+-------------+---------------------+-------------+-------+
INFO - 14:41:34: | name | lower_bound | value | upper_bound | type |
INFO - 14:41:34: +----------+-------------+---------------------+-------------+-------+
INFO - 14:41:34: | x_shared | 0.01 | 0.04440901205483268 | 0.09 | float |
INFO - 14:41:34: | x_shared | 30000 | 58940.10748233336 | 60000 | float |
INFO - 14:41:34: | x_shared | 1.4 | 1.441133922818264 | 1.8 | float |
INFO - 14:41:34: | x_shared | 2.5 | 5.893919149663935 | 8.5 | float |
INFO - 14:41:34: | x_shared | 40 | 58.55971698205414 | 70 | float |
INFO - 14:41:34: | x_shared | 500 | 598.9420525239799 | 1500 | float |
INFO - 14:41:34: | y_24 | 0.44 | 0.8060924457095278 | 11.13 | float |
INFO - 14:41:34: | y_34 | 0.44 | 1.458803878476488 | 1.98 | float |
INFO - 14:41:34: +----------+-------------+---------------------+-------------+-------+
INFO - 14:41:34: *** DOE Scenario run terminated ***
See also
In this tutorial, the DOE is based on pyDOE; however, several other designs are available, based on pyDOE or OpenTURNS. Some examples of these designs are plotted in DOE algorithms. To list the available DOE algorithms in the current GEMSEO configuration, use gemseo.api.get_available_doe_algorithms().
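The idea behind LHS can be sketched in a few lines of NumPy: each dimension is split into `n_samples` equal-width strata, one point is drawn in each stratum, and the strata are shuffled independently per dimension. This basic variant is for illustration only and is not the pyDOE or OpenTURNS implementation; the bounds reused below are those of y_24 and y_34 in the design space printed above.

```python
import numpy as np


def latin_hypercube(n_samples, lower, upper, seed=None):
    """Basic Latin hypercube sampling in the box [lower, upper] (illustration only)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    n_dims = lower.size
    # Draw one point uniformly inside each of the n_samples strata of [0, 1).
    unit = (np.arange(n_samples)[:, None] + rng.random((n_samples, n_dims))) / n_samples
    # Shuffle the strata independently in each dimension to decouple the columns.
    for j in range(n_dims):
        unit[:, j] = unit[rng.permutation(n_samples), j]
    # Map the unit hypercube onto the design space bounds.
    return lower + unit * (upper - lower)


# Bounds of y_24 and y_34 taken from the design space printed above.
samples = latin_hypercube(30, lower=[0.44, 0.44], upper=[11.13, 1.98], seed=0)
print(samples.shape)
```

Each column of `samples` then contains exactly one point in each of the 30 strata of its interval, which spreads the samples more evenly than independent uniform draws.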
Create the SurrogateDiscipline
From this Dataset, we can build a SurrogateDiscipline of the MDODiscipline, which can be executed as any other discipline.
Precisely, by means of the API function create_surrogate(), we create a SurrogateDiscipline relying on a LinearRegression and inheriting from MDODiscipline:
synthetic_surrogate = create_surrogate("LinearRegression", synthetic_dataset)
Out:
/home/docs/checkouts/readthedocs.org/user_builds/gemseo/conda/3.2.2/lib/python3.8/site-packages/sklearn/linear_model/_base.py:148: FutureWarning: 'normalize' was deprecated in version 1.0 and will be removed in 1.2. Please leave the normalize parameter to its default value to silence this warning. The default behavior of this estimator is to not do any normalization. If normalization is needed please use sklearn.preprocessing.StandardScaler instead.
warnings.warn(
INFO - 14:41:34: Build the surrogate discipline: LinReg_Dataset
INFO - 14:41:34: Dataset name: Dataset
INFO - 14:41:34: Dataset size: 2
INFO - 14:41:34: Surrogate model: LinearRegression
INFO - 14:41:34: Use the surrogate discipline: LinReg_Dataset
INFO - 14:41:34: Inputs: x
INFO - 14:41:34: Outputs: y
INFO - 14:41:34: Jacobian: use surrogate model jacobian
See also
Note that the user may specify a subset of the inputs and outputs used to build the SurrogateDiscipline, mainly to avoid unnecessary computations.
Then, we execute it as any MDODiscipline:
input_data = {"x": array([2.0])}
out = synthetic_surrogate.execute(input_data)
print(out["y"])
Out:
[2.]
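The printed value can be checked by hand: with the two samples (1, 1) and (2, 2), an ordinary least-squares line with intercept passes exactly through both points, so it is y = x. The NumPy sketch below assumes this textbook formulation; GEMSEO's LinearRegression is based on scikit-learn, and its options are not reproduced here.

```python
import numpy as np

# Training data of the synthetic dataset: (x, y) = (1, 1) and (2, 2).
x_train = np.array([1.0, 2.0])
y_train = np.array([1.0, 2.0])

# Ordinary least squares with an intercept: y ~ slope * x + intercept.
design_matrix = np.column_stack((x_train, np.ones_like(x_train)))
(slope, intercept), *_ = np.linalg.lstsq(design_matrix, y_train, rcond=None)

# Prediction at x = 2, as in the execute() call above.
y_pred = slope * 2.0 + intercept
print(y_pred)
```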
In our study case, from the DOE built at Step 1, we build an RBFRegression of \(y_4\), which represents the range, as a function of x_shared, y_24 (the lift-to-drag ratio L/D) and y_34:
range_surrogate = create_surrogate("RBFRegression", mission_dataset)
Out:
INFO - 14:41:34: Build the surrogate discipline: RBF_SobieskiMission
INFO - 14:41:34: Dataset name: SobieskiMission
INFO - 14:41:34: Dataset size: 30
INFO - 14:41:34: Surrogate model: RBFRegression
INFO - 14:41:34: Use the surrogate discipline: RBF_SobieskiMission
INFO - 14:41:34: Inputs: x_shared, y_24, y_34
INFO - 14:41:34: Outputs: y_4
INFO - 14:41:34: Jacobian: use surrogate model jacobian
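A radial basis function model writes the prediction as a weighted sum of kernels centered on the training points, with weights chosen so that the model interpolates the training data. The sketch below assumes a Gaussian kernel with a fixed shape parameter `epsilon`, chosen for simplicity; it is not necessarily the kernel or the solver used by RBFRegression, which relies on SciPy.

```python
import numpy as np


def fit_gaussian_rbf(x_train, y_train, epsilon=1.0):
    """Return an interpolating Gaussian RBF model (x_train has shape (n, d))."""

    def kernel(x_a, x_b):
        # Euclidean distances between every point of x_a and every point of x_b.
        dist = np.linalg.norm(x_a[:, None, :] - x_b[None, :, :], axis=-1)
        return np.exp(-((epsilon * dist) ** 2))

    # Solve Phi w = y so that the model reproduces the training outputs.
    weights = np.linalg.solve(kernel(x_train, x_train), y_train)

    def predict(x):
        x = np.atleast_2d(np.asarray(x, dtype=float))
        return kernel(x, x_train) @ weights

    return predict


x_train = np.linspace(0.0, 4.0, 5).reshape(-1, 1)
y_train = np.sin(x_train[:, 0])
model = fit_gaussian_rbf(x_train, y_train)
print(model([[1.5], [2.5]]))
```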
Use the SurrogateDiscipline in MDO
The obtained SurrogateDiscipline can be used in any Scenario, such as a DOEScenario or an MDOScenario.
As for any other discipline, the MDODiscipline.execute() method computes the outputs for given inputs. Here, only y_24 (L/D) is varied; the inputs that are not passed keep their default values:
for i in range(5):
    lod = i * 2.0
    y_4_pred = range_surrogate.execute({"y_24": array([lod])})["y_4"]
    print("Surrogate range (L/D = {}) = {}".format(lod, y_4_pred))
Out:
Surrogate range (L/D = 0.0) = [-97.86844673]
Surrogate range (L/D = 2.0) = [184.60105962]
Surrogate range (L/D = 4.0) = [505.37518268]
Surrogate range (L/D = 6.0) = [840.33241658]
Surrogate range (L/D = 8.0) = [1161.49215263]
We can also build and execute an optimization scenario from it.
The design variable is “y_24”. By default, the Jacobian matrix of a surrogate is computed by finite differences, except for the SurrogateDiscipline relying on LinearRegression, which has an analytical (and constant) Jacobian.
design_space = design_space.filter(["y_24"])
scenario = create_scenario(
    range_surrogate,
    formulation="DisciplinaryOpt",
    objective_name="y_4",
    design_space=design_space,
    scenario_type="MDO",
    maximize_objective=True,
)
scenario.execute({"max_iter": 30, "algo": "L-BFGS-B"})
Out:
INFO - 14:41:34:
INFO - 14:41:34: *** Start MDO Scenario execution ***
INFO - 14:41:34: MDOScenario
INFO - 14:41:34: Disciplines: Surrogate discipline: RBF_SobieskiMission
INFO - 14:41:34: Dataset name: SobieskiMission
INFO - 14:41:34: Dataset size: 30
INFO - 14:41:34: Surrogate model: RBFRegression
INFO - 14:41:34: Inputs: x_shared, y_24, y_34
INFO - 14:41:34: Outputs: y_4
INFO - 14:41:34: MDOFormulation: DisciplinaryOpt
INFO - 14:41:34: Algorithm: L-BFGS-B
INFO - 14:41:34: Optimization problem:
INFO - 14:41:34: Minimize: -y_4(y_24)
INFO - 14:41:34: With respect to: y_24
INFO - 14:41:34: Design space:
INFO - 14:41:34: +------+-------------+--------------------+-------------+-------+
INFO - 14:41:34: | name | lower_bound | value | upper_bound | type |
INFO - 14:41:34: +------+-------------+--------------------+-------------+-------+
INFO - 14:41:34: | y_24 | 0.44 | 0.8060924457095278 | 11.13 | float |
INFO - 14:41:34: +------+-------------+--------------------+-------------+-------+
INFO - 14:41:34: Optimization: 0%| | 0/30 [00:00<?, ?it]
INFO - 14:41:34: Optimization: 7%|▋ | 2/30 [00:00<00:00, 4182.45 it/sec, obj=1.59e+3]
INFO - 14:41:34: Optimization result:
INFO - 14:41:34: Objective value = 1589.7138353791045
INFO - 14:41:34: The result is feasible.
INFO - 14:41:34: Status: 0
INFO - 14:41:34: Optimizer message: CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL
INFO - 14:41:34: Number of calls to the objective function by the optimizer: 3
INFO - 14:41:34: Design space:
INFO - 14:41:34: +------+-------------+-------+-------------+-------+
INFO - 14:41:34: | name | lower_bound | value | upper_bound | type |
INFO - 14:41:34: +------+-------------+-------+-------------+-------+
INFO - 14:41:34: | y_24 | 0.44 | 11.13 | 11.13 | float |
INFO - 14:41:34: +------+-------------+-------+-------------+-------+
INFO - 14:41:34: *** MDO Scenario run terminated in 0:00:00.018002 ***
{'max_iter': 30, 'algo': 'L-BFGS-B'}
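The remark above on Jacobians can be illustrated with a forward finite-difference approximation, sketched below in NumPy. For a linear model y = A x + b, the exact Jacobian is the constant matrix A at every point, which is why a linear surrogate can provide it analytically; the step size and the test matrix are arbitrary choices for this illustration.

```python
import numpy as np


def finite_difference_jacobian(func, x, step=1e-7):
    """Forward-difference approximation of the Jacobian of func: R^n -> R^m at x."""
    x = np.asarray(x, dtype=float)
    f_0 = np.atleast_1d(func(x))
    jacobian = np.empty((f_0.size, x.size))
    for j in range(x.size):
        x_step = x.copy()
        x_step[j] += step
        # Column j: sensitivity of every output to the j-th input.
        jacobian[:, j] = (np.atleast_1d(func(x_step)) - f_0) / step
    return jacobian


# Arbitrary linear model y = A x + b: its exact Jacobian is A everywhere.
A = np.array([[2.0, -1.0], [0.5, 3.0]])
b = np.array([1.0, 0.0])


def linear_model(x):
    return A @ x + b


approx = finite_difference_jacobian(linear_model, [0.3, -1.2])
print(np.max(np.abs(approx - A)))
```

Each finite-difference Jacobian costs one extra evaluation per input, whereas the analytical Jacobian of a linear model costs nothing to evaluate.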
Available surrogate models
Currently, the following surrogate models are available:
- Linear regression, based on the Scikit-learn library: use the LinearRegression class.
- Polynomial regression, based on the Scikit-learn library: use the PolynomialRegression class.
- Gaussian processes (also known as Kriging), based on the Scikit-learn library: use the GaussianProcessRegression class.
- Mixture of experts: use the MixtureOfExperts class.
- Random forest models, based on the Scikit-learn library: use the RandomForestRegressor class.
- RBF models (Radial Basis Functions), based on the SciPy library: use the RBFRegression class.
- PCE models (Polynomial Chaos Expansion), based on the OpenTURNS library: use the PCERegression class.
To understand the detailed behavior of the models, please refer to the documentation of the underlying packages.
Extending surrogate models
All surrogate models work the same way: the MLRegressionAlgo base class must be extended. See Extend GEMSEO features to learn how to run GEMSEO with external Python modules. Then, the RegressionModelFactory can build the new MLRegressionAlgo automatically from its regression algorithm name and options. This factory is called by the constructor of SurrogateDiscipline.
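The factory mechanism described above, building a regression model from its class name and options, can be sketched with a class registry. Everything below is hypothetical (`RegressionSketch`, `MeanRegressionSketch`, `create_regression`) and only illustrates the pattern; it does not reproduce the actual MLRegressionAlgo or RegressionModelFactory interfaces.

```python
class RegressionSketch:
    """Hypothetical base class: every subclass registers itself by name."""

    registry = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        RegressionSketch.registry[cls.__name__] = cls

    def __init__(self, **options):
        self.options = options

    def fit(self, inputs, outputs):
        raise NotImplementedError

    def predict(self, inputs):
        raise NotImplementedError


class MeanRegressionSketch(RegressionSketch):
    """Hypothetical toy model predicting the (scaled) mean of the training outputs."""

    def fit(self, inputs, outputs):
        self._mean = sum(outputs) / len(outputs)
        return self

    def predict(self, inputs):
        return [self.options.get("scale", 1.0) * self._mean for _ in inputs]


def create_regression(name, **options):
    """Hypothetical factory: build a registered regression model from its name and options."""
    try:
        model_class = RegressionSketch.registry[name]
    except KeyError:
        raise ValueError(f"Unknown regression algorithm: {name}") from None
    return model_class(**options)


model = create_regression("MeanRegressionSketch", scale=2.0)
print(model.fit([[0.0], [1.0]], [1.0, 3.0]).predict([[0.5]]))  # [4.0]
```

With this design, adding a new model only requires defining a subclass; the factory finds it by name without any change to its own code.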
See also
More generally, GEMSEO provides extension mechanisms to integrate external DOE and optimization algorithms, disciplines, MDAs and surrogate models.