Plug a surrogate discipline in a Scenario

In this section, we describe how to use a surrogate model in GEMSEO, which is implemented in the SurrogateDiscipline class.

A SurrogateDiscipline can be used to substitute an MDODiscipline within a Scenario. This SurrogateDiscipline is an approximation of the MDODiscipline and is faster to compute than the original discipline. It relies on an MLRegressionAlgo. This comes at the price of computing a DOE on the original MDODiscipline and validating the approximation. The data from which the approximation is built may already be available, or can be generated using GEMSEO's DOE capabilities. See Tutorial: How to carry out a trade-off study and Tutorial: How to solve an MDO problem.

In GEMSEO, the data used to build the surrogate model are taken from a Dataset containing both the inputs and outputs of the DOE. This Dataset may have been generated by GEMSEO from a cache, using the AbstractFullCache.export_to_dataset() method; from a database, using the OptimizationProblem.export_to_dataset() method; or from a NumPy array or a text file, using the Dataset.set_from_array() and Dataset.set_from_file() methods.

Then, the surrogate discipline can be used like any other discipline in an MDOScenario, a DOEScenario, or an MDA.

from gemseo.api import configure_logger
from gemseo.api import create_discipline
from gemseo.api import create_scenario
from gemseo.api import create_surrogate
from gemseo.core.dataset import Dataset
from gemseo.problems.sobieski.core.problem import SobieskiProblem
from numpy import array
from numpy import hstack
from numpy import vstack

configure_logger()
<RootLogger root (INFO)>

Create a surrogate discipline

Create the learning dataset

If you already have data available from a DOE produced externally, you can create a Dataset directly and Step 1 ends here. For example, let us consider a synthetic dataset, with \(x\) as input and \(y\) as output, stored as a NumPy array. We then store these data in a Dataset:

variables = ["x", "y"]
sizes = {"x": 1, "y": 1}
groups = {"x": "inputs", "y": "outputs"}
data = vstack(
    (
        hstack((array([1.0]), array([1.0]))),
        hstack((array([2.0]), array([2.0]))),
    )
)
synthetic_dataset = Dataset()
synthetic_dataset.set_from_array(data, variables, sizes, groups)

If you do not have data available, the following paragraphs of Step 1 concern you.

Here, we illustrate the generation of the training data using a DOEScenario, similarly to Tutorial: How to carry out a trade-off study, where more details are given.

In this basic example, an MDODiscipline computing the mission performance (range) in the SSBJ test case is sampled with a DOEScenario. Then, the generated database is used to build a SurrogateDiscipline.

But more complex scenarios can be used in the same way: complete optimization processes or MDAs can be replaced by their surrogate counterparts. The appropriate cache or database must then be used to build the SurrogateDiscipline, but the main logic won’t differ from this example.

Firstly, we create the MDODiscipline by means of the API function create_discipline():

discipline = create_discipline("SobieskiMission")

Then, we read the DesignSpace of the Sobieski problem and keep only the inputs of the Sobieski Mission “x_shared”, “y_24”, “y_34” as inputs of the DOE:

design_space = SobieskiProblem().design_space
design_space = design_space.filter(["x_shared", "y_24", "y_34"])

From this MDODiscipline and this DesignSpace, we build a DOEScenario by means of the API function create_scenario():

scenario = create_scenario(
    [discipline],
    "DisciplinaryOpt",
    objective_name="y_4",
    design_space=design_space,
    scenario_type="DOE",
)

Lastly, we execute the process with the LHS algorithm and 30 samples.

scenario.execute({"n_samples": 30, "algo": "lhs"})
mission_dataset = scenario.export_to_dataset(opt_naming=False)
INFO - 11:58:19:
INFO - 11:58:19: *** Start DOEScenario execution ***
INFO - 11:58:19: DOEScenario
INFO - 11:58:19:    Disciplines: SobieskiMission
INFO - 11:58:19:    MDO formulation: DisciplinaryOpt
INFO - 11:58:19: Optimization problem:
INFO - 11:58:19:    minimize y_4(x_shared, y_24, y_34)
INFO - 11:58:19:    with respect to x_shared, y_24, y_34
INFO - 11:58:19:    over the design space:
INFO - 11:58:19:    +----------+-------------+------------+-------------+-------+
INFO - 11:58:19:    | name     | lower_bound |   value    | upper_bound | type  |
INFO - 11:58:19:    +----------+-------------+------------+-------------+-------+
INFO - 11:58:19:    | x_shared |     0.01    |    0.05    |     0.09    | float |
INFO - 11:58:19:    | x_shared |    30000    |   45000    |    60000    | float |
INFO - 11:58:19:    | x_shared |     1.4     |    1.6     |     1.8     | float |
INFO - 11:58:19:    | x_shared |     2.5     |    5.5     |     8.5     | float |
INFO - 11:58:19:    | x_shared |      40     |     55     |      70     | float |
INFO - 11:58:19:    | x_shared |     500     |    1000    |     1500    | float |
INFO - 11:58:19:    | y_24     |     0.44    | 4.15006276 |    11.13    | float |
INFO - 11:58:19:    | y_34     |     0.44    | 1.10754577 |     1.98    | float |
INFO - 11:58:19:    +----------+-------------+------------+-------------+-------+
INFO - 11:58:19: Solving optimization problem with algorithm lhs:
INFO - 11:58:19: ...   0%|          | 0/30 [00:00<?, ?it]
INFO - 11:58:19: ... 100%|██████████| 30/30 [00:00<00:00, 1974.69 it/sec, obj=1.27e+3]
INFO - 11:58:19: Optimization result:
INFO - 11:58:19:    Optimizer info:
INFO - 11:58:19:       Status: None
INFO - 11:58:19:       Message: None
INFO - 11:58:19:       Number of calls to the objective function by the optimizer: 30
INFO - 11:58:19:    Solution:
INFO - 11:58:19:       Objective: 71.16601799429675
INFO - 11:58:19:       Design space:
INFO - 11:58:19:       +----------+-------------+---------------------+-------------+-------+
INFO - 11:58:19:       | name     | lower_bound |        value        | upper_bound | type  |
INFO - 11:58:19:       +----------+-------------+---------------------+-------------+-------+
INFO - 11:58:19:       | x_shared |     0.01    | 0.04440901205483268 |     0.09    | float |
INFO - 11:58:19:       | x_shared |    30000    |  58940.10748233336  |    60000    | float |
INFO - 11:58:19:       | x_shared |     1.4     |  1.441133922818264  |     1.8     | float |
INFO - 11:58:19:       | x_shared |     2.5     |  5.893919149663935  |     8.5     | float |
INFO - 11:58:19:       | x_shared |      40     |  58.55971698205414  |      70     | float |
INFO - 11:58:19:       | x_shared |     500     |  598.9420525239799  |     1500    | float |
INFO - 11:58:19:       | y_24     |     0.44    |  0.8060924457095278 |    11.13    | float |
INFO - 11:58:19:       | y_34     |     0.44    |  1.458803878476488  |     1.98    | float |
INFO - 11:58:19:       +----------+-------------+---------------------+-------------+-------+
INFO - 11:58:19: *** End DOEScenario execution (time: 0:00:00.027026) ***

See also

In this tutorial, the DOE is based on pyDOE; however, several other designs are available, based on other packages such as OpenTURNS. Some examples of these designs are plotted in DOE algorithms. To list the DOE algorithms available in the current GEMSEO configuration, use gemseo.api.get_available_doe_algorithms().
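For intuition about what the `lhs` algorithm does, here is a minimal NumPy-only sketch of Latin hypercube sampling, where each dimension is divided into as many equal-width strata as samples and each stratum receives exactly one point. This is only an illustration, not GEMSEO's implementation:

```python
import numpy as np

def lhs(n_samples, dim, rng):
    """Minimal Latin hypercube sample in the unit hypercube."""
    sample = np.empty((n_samples, dim))
    for j in range(dim):
        # One random point inside each of the n equal-width strata,
        # visited in shuffled order.
        strata = rng.permutation(n_samples)
        sample[:, j] = (strata + rng.random(n_samples)) / n_samples
    return sample

rng = np.random.default_rng(seed=0)
doe = lhs(30, 2, rng)
print(doe.shape)  # (30, 2)
```

Rescaling such a unit-hypercube sample to the design-space bounds yields a space-filling DOE like the one generated above.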

Create the SurrogateDiscipline

From this Dataset, we can build a SurrogateDiscipline of the MDODiscipline.

By means of the API function create_surrogate(), we create a SurrogateDiscipline from the dataset. This discipline relies on a LinearRegressor, inherits from MDODiscipline, and can be executed like any other discipline:

synthetic_surrogate = create_surrogate("LinearRegressor", synthetic_dataset)
INFO - 11:58:19: Build the surrogate discipline: LinReg_Dataset
INFO - 11:58:19:    Dataset name: Dataset
INFO - 11:58:19:    Dataset size: 2
INFO - 11:58:19:    Surrogate model: LinearRegressor
INFO - 11:58:19: Use the surrogate discipline: LinReg_Dataset
INFO - 11:58:19:    Inputs: x
INFO - 11:58:19:    Outputs: y
INFO - 11:58:19:    Jacobian: use surrogate model jacobian

See also

Note that a subset of the inputs and outputs to be used to build the SurrogateDiscipline may be specified by the user, mainly to avoid unnecessary computations.

Then, we execute it like any MDODiscipline:

input_data = {"x": array([2.0])}
out = synthetic_surrogate.execute(input_data)
print(out["y"])
[2.]
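For intuition, the linear fit behind this prediction can be reproduced with plain NumPy: with the two training points \((1, 1)\) and \((2, 2)\), ordinary least squares recovers a slope of 1 and an intercept of 0, hence \(y(2) = 2\). This is a sketch independent of GEMSEO:

```python
import numpy as np

# Training data from the synthetic dataset above: y = x on two points.
x = np.array([1.0, 2.0])
y = np.array([1.0, 2.0])

# Ordinary least squares for y ≈ a*x + b, as a linear surrogate would fit.
A = np.column_stack([x, np.ones_like(x)])
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)

prediction = a * 2.0 + b
print(round(a, 6), round(b, 6), round(prediction, 6))  # 1.0 0.0 2.0
```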

In our case study, from the DOE built at Step 1, we build an RBFRegressor of \(y_4\), representing the range as a function of the lift-to-drag ratio L/D:

range_surrogate = create_surrogate("RBFRegressor", mission_dataset)
INFO - 11:58:19: Build the surrogate discipline: RBF_DOEScenario
INFO - 11:58:19:    Dataset name: DOEScenario
INFO - 11:58:19:    Dataset size: 30
INFO - 11:58:19:    Surrogate model: RBFRegressor
INFO - 11:58:19: Use the surrogate discipline: RBF_DOEScenario
INFO - 11:58:19:    Inputs: x_shared, y_24, y_34
INFO - 11:58:19:    Outputs: y_4
INFO - 11:58:19:    Jacobian: use surrogate model jacobian
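The idea behind RBF regression can be sketched on toy 1-D data with plain NumPy: the model is a weighted sum of radial basis functions centered at the training points, with weights obtained by solving a linear system. This is an illustration of the technique, not GEMSEO's implementation:

```python
import numpy as np

def rbf_interpolate(x_train, y_train, x_new, eps=1.0):
    """Gaussian radial basis function interpolation (1-D sketch)."""
    phi = lambda r: np.exp(-((eps * r) ** 2))
    # Solve Phi @ w = y for the weights of the basis functions.
    gram = phi(np.abs(x_train[:, None] - x_train[None, :]))
    weights = np.linalg.solve(gram, y_train)
    # Evaluate the weighted sum of basis functions at the new points.
    return phi(np.abs(x_new[:, None] - x_train[None, :])) @ weights

x = np.array([0.0, 1.0, 2.0, 3.0])
y = x**2  # samples of a smooth function
print(rbf_interpolate(x, y, x))  # reproduces the training outputs
```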

Use the SurrogateDiscipline in MDO

The obtained SurrogateDiscipline can be used in any Scenario, such as a DOEScenario or an MDOScenario. Here, the MDODiscipline.execute() method can be used, as in any other discipline, to compute the outputs for given inputs:

for i in range(5):
    lod = i * 2.0
    y_4_pred = range_surrogate.execute({"y_24": array([lod])})["y_4"]
    print(f"Surrogate range (L/D = {lod}) = {y_4_pred}")
Surrogate range (L/D = 0.0) = [-97.86844673]
Surrogate range (L/D = 2.0) = [184.60105962]
Surrogate range (L/D = 4.0) = [505.37518268]
Surrogate range (L/D = 6.0) = [840.33241658]
Surrogate range (L/D = 8.0) = [1161.49215263]

We can also build and execute an optimization scenario from it. The design variable is “y_24”. For surrogates, the Jacobian matrix is computed by finite differences by default, except for a SurrogateDiscipline relying on a LinearRegressor, which has an analytical (and constant) Jacobian.
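The finite-difference scheme mentioned above can be sketched in plain NumPy; `fd_gradient` is a hypothetical helper written for illustration, not a GEMSEO function:

```python
import numpy as np

def fd_gradient(func, x, step=1e-6):
    """Forward finite-difference approximation of the gradient of a
    scalar function, one perturbed evaluation per design variable."""
    f0 = func(x)
    grad = np.empty_like(x)
    for i in range(x.size):
        x_step = x.copy()
        x_step[i] += step
        grad[i] = (func(x_step) - f0) / step
    return grad

# Example on a smooth function of one variable.
grad = fd_gradient(lambda v: v[0] ** 2, np.array([3.0]))
print(grad)  # close to the exact derivative [6.]
```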

design_space = design_space.filter(["y_24"])
scenario = create_scenario(
    range_surrogate,
    formulation="DisciplinaryOpt",
    objective_name="y_4",
    design_space=design_space,
    scenario_type="MDO",
    maximize_objective=True,
)
scenario.execute({"max_iter": 30, "algo": "L-BFGS-B"})
    INFO - 11:58:19:
    INFO - 11:58:19: *** Start MDOScenario execution ***
    INFO - 11:58:19: MDOScenario
    INFO - 11:58:19:    Disciplines: Surrogate discipline: RBF_DOEScenario
    INFO - 11:58:19:    Dataset name: DOEScenario
    INFO - 11:58:19:    Dataset size: 30
    INFO - 11:58:19:    Surrogate model: RBFRegressor
    INFO - 11:58:19:    Inputs: x_shared, y_24, y_34
    INFO - 11:58:19:    Outputs: y_4
    INFO - 11:58:19:    MDO formulation: DisciplinaryOpt
    INFO - 11:58:19: Optimization problem:
    INFO - 11:58:19:    minimize -y_4(y_24)
    INFO - 11:58:19:    with respect to y_24
    INFO - 11:58:19:    over the design space:
    INFO - 11:58:19:    +------+-------------+--------------------+-------------+-------+
    INFO - 11:58:19:    | name | lower_bound |       value        | upper_bound | type  |
    INFO - 11:58:19:    +------+-------------+--------------------+-------------+-------+
    INFO - 11:58:19:    | y_24 |     0.44    | 0.8060924457095278 |    11.13    | float |
    INFO - 11:58:19:    +------+-------------+--------------------+-------------+-------+
    INFO - 11:58:19: Solving optimization problem with algorithm L-BFGS-B:
    INFO - 11:58:19: ...   0%|          | 0/30 [00:00<?, ?it]
    INFO - 11:58:20: ...   7%|▋         | 2/30 [00:00<00:00, 7327.15 it/sec, obj=-1.59e+3]
    INFO - 11:58:20: Optimization result:
    INFO - 11:58:20:    Optimizer info:
    INFO - 11:58:20:       Status: 0
    INFO - 11:58:20:       Message: CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL
    INFO - 11:58:20:       Number of calls to the objective function by the optimizer: 3
    INFO - 11:58:20:    Solution:
    INFO - 11:58:20:       Objective: -1589.7138353791051
    INFO - 11:58:20:       Design space:
    INFO - 11:58:20:       +------+-------------+-------+-------------+-------+
    INFO - 11:58:20:       | name | lower_bound | value | upper_bound | type  |
    INFO - 11:58:20:       +------+-------------+-------+-------------+-------+
    INFO - 11:58:20:       | y_24 |     0.44    | 11.13 |    11.13    | float |
    INFO - 11:58:20:       +------+-------------+-------+-------------+-------+
    INFO - 11:58:20: *** End MDOScenario execution (time: 0:00:00.013227) ***

{'max_iter': 30, 'algo': 'L-BFGS-B'}

Available surrogate models

Several surrogate models are currently available; to understand their detailed behavior, please refer to the documentation of the underlying packages.

Extending surrogate models

All surrogate models work the same way: the MLRegressionAlgo base class must be extended. See Extend GEMSEO features to learn how to run GEMSEO with external Python modules. Then, the RegressionModelFactory can build the new MLRegressionAlgo automatically from the name of its regression algorithm and its options. This factory is called by the constructor of SurrogateDiscipline.
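The name-based factory mechanism can be illustrated with a generic Python sketch; all names below (`RegressorBase`, `MyRegressor`, `create_model`) are hypothetical and only mimic how a factory builds a class from its name and options, not GEMSEO's actual code:

```python
class RegressorBase:
    """Base class keeping a registry of its subclasses."""

    registry = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Every new regression algorithm registers itself by name.
        RegressorBase.registry[cls.__name__] = cls

class MyRegressor(RegressorBase):
    """A user-defined algorithm, picked up automatically."""

    def __init__(self, degree=2):
        self.degree = degree

def create_model(name, **options):
    """Build a registered regressor from its name and options."""
    return RegressorBase.registry[name](**options)

model = create_model("MyRegressor", degree=3)
print(type(model).__name__, model.degree)  # MyRegressor 3
```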

See also

More generally, GEMSEO provides extension mechanisms to integrate external DOE and optimization algorithms, disciplines, MDAs and surrogate models.

Total running time of the script: ( 0 minutes 0.078 seconds)
