Dataset from an optimization problem
In this example, we will see how to build a Dataset from the objects of an OptimizationProblem.
For that, we need to import the GEMSEO API functions and the Sellar design space used below:
from __future__ import annotations
from gemseo.api import configure_logger
from gemseo.api import create_discipline
from gemseo.api import create_scenario
from gemseo.problems.sellar.sellar_design_space import SellarDesignSpace
configure_logger()
<RootLogger root (INFO)>
Synthetic data
We can sample the Sellar1 discipline and use the corresponding OptimizationProblem:
discipline = create_discipline("Sellar1")
design_space = SellarDesignSpace().filter(discipline.get_input_data_names())
scenario = create_scenario(
[discipline], "DisciplinaryOpt", "y_1", design_space, scenario_type="DOE"
)
scenario.execute({"algo": "lhs", "n_samples": 5})
opt_problem = scenario.formulation.opt_problem
/home/docs/checkouts/readthedocs.org/user_builds/gemseo/envs/4.2.0/lib/python3.9/site-packages/gemseo/algos/design_space.py:459: ComplexWarning: Casting complex values to real discards the imaginary part
self.__current_value[name] = array_value.astype(
INFO - 17:24:33:
INFO - 17:24:33: *** Start DOEScenario execution ***
INFO - 17:24:33: DOEScenario
INFO - 17:24:33: Disciplines: Sellar1
INFO - 17:24:33: MDO formulation: DisciplinaryOpt
INFO - 17:24:33: Optimization problem:
INFO - 17:24:33: minimize y_1(x_local, x_shared, y_2)
INFO - 17:24:33: with respect to x_local, x_shared, y_2
INFO - 17:24:33: over the design space:
INFO - 17:24:33: +-------------+-------------+-------+-------------+-------+
INFO - 17:24:33: | name        | lower_bound | value | upper_bound | type  |
INFO - 17:24:33: +-------------+-------------+-------+-------------+-------+
INFO - 17:24:33: | x_local     |      0      |   1   |      10     | float |
INFO - 17:24:33: | x_shared[0] |     -10     |   4   |      10     | float |
INFO - 17:24:33: | x_shared[1] |      0      |   3   |      10     | float |
INFO - 17:24:33: | y_2         |     -100    |   1   |     100     | float |
INFO - 17:24:33: +-------------+-------------+-------+-------------+-------+
INFO - 17:24:33: Solving optimization problem with algorithm lhs:
INFO - 17:24:33: ... 0%| | 0/5 [00:00<?, ?it]
INFO - 17:24:33: ... 20%|██ | 1/5 [00:00<00:00, 200.56 it/sec, obj=7.59+j]
INFO - 17:24:33: ... 40%|████ | 2/5 [00:00<00:00, 329.99 it/sec, obj=4.29+j]
INFO - 17:24:33: ... 60%|██████ | 3/5 [00:00<00:00, 424.48 it/sec, obj=7.85+j]
INFO - 17:24:33: ... 80%|████████ | 4/5 [00:00<00:00, 496.46 it/sec, obj=7.94+j]
INFO - 17:24:33: ... 100%|██████████| 5/5 [00:00<00:00, 546.52 it/sec, obj=5.2+j]
INFO - 17:24:33: Optimization result:
INFO - 17:24:33: Optimizer info:
INFO - 17:24:33: Status: None
INFO - 17:24:33: Message: None
INFO - 17:24:33: Number of calls to the objective function by the optimizer: 5
INFO - 17:24:33: Solution:
INFO - 17:24:33: Objective: (4.286541717010372+0j)
INFO - 17:24:33: Design space:
INFO - 17:24:33: +-------------+-------------+--------------------+-------------+-------+
INFO - 17:24:33: | name        | lower_bound |       value        | upper_bound | type  |
INFO - 17:24:33: +-------------+-------------+--------------------+-------------+-------+
INFO - 17:24:33: | x_local     |      0      | 0.8340440094051481 |      10     | float |
INFO - 17:24:33: | x_shared[0] |     -10     | 0.1552669360134296 |      10     | float |
INFO - 17:24:33: | x_shared[1] |      0      | 8.280773877190468  |      10     | float |
INFO - 17:24:33: | y_2         |     -100    | -46.17757091827809 |     100     | float |
INFO - 17:24:33: +-------------+-------------+--------------------+-------------+-------+
INFO - 17:24:33: *** End DOEScenario execution (time: 0:00:00.027070) ***
Create a dataset
We can easily build a dataset from this OptimizationProblem, either by separating the design parameters from the functions (default option):
dataset = opt_problem.export_to_dataset("sellar1_doe")
print(dataset)
sellar1_doe
Number of samples: 5
Number of variables: 4
Variables names and sizes by group:
design_parameters: x_local (1), x_shared (2), y_2 (1)
functions: y_1 (1)
Number of dimensions (total = 5) by group:
design_parameters: 4
functions: 1
or by considering all features as default parameters:
dataset = opt_problem.export_to_dataset("sellar1_doe", categorize=False)
print(dataset)
sellar1_doe
Number of samples: 5
Number of variables: 4
Variables names and sizes by group:
parameters: x_local (1), x_shared (2), y_1 (1), y_2 (1)
Number of dimensions (total = 5) by group:
parameters: 5
or by using an input-output naming rather than an optimization naming:
dataset = opt_problem.export_to_dataset("sellar1_doe", opt_naming=False)
print(dataset)
sellar1_doe
Number of samples: 5
Number of variables: 4
Variables names and sizes by group:
inputs: x_local (1), x_shared (2), y_2 (1)
outputs: y_1 (1)
Number of dimensions (total = 5) by group:
inputs: 4
outputs: 1
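The three printed datasets differ only in how the same variables are grouped. The following is a minimal sketch of that grouping logic, inferred from the outputs above rather than taken from GEMSEO's implementation. The helper `sketch_groups` is hypothetical, and its behavior when `categorize=False` is combined with `opt_naming=False` is an assumption, since that combination is not shown here.

```python
from __future__ import annotations


def sketch_groups(
    design_names: list[str],
    function_names: list[str],
    categorize: bool = True,
    opt_naming: bool = True,
) -> dict[str, list[str]]:
    """Hypothetical helper mimicking the group names seen in the printed datasets."""
    if not categorize:
        # One single group holding every variable, sorted by name.
        # Assumption: this takes precedence over opt_naming.
        return {"parameters": sorted(design_names + function_names)}
    if opt_naming:
        # Optimization naming (default).
        return {
            "design_parameters": list(design_names),
            "functions": list(function_names),
        }
    # Input-output naming.
    return {"inputs": list(design_names), "outputs": list(function_names)}


design_names = ["x_local", "x_shared", "y_2"]
function_names = ["y_1"]

default_groups = sketch_groups(design_names, function_names)
flat_groups = sketch_groups(design_names, function_names, categorize=False)
io_groups = sketch_groups(design_names, function_names, opt_naming=False)

print(default_groups)
print(flat_groups)
print(io_groups)
```

In every case the dataset holds the same 5 samples and the same 4 variables; only the group labels change.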
Note
Only design variables and functions (objective function, constraints) are stored in the database. If you want to store state variables, you must add them as observables with the add_observable() method before the problem is executed.
Access properties
dataset = opt_problem.export_to_dataset("sellar1_doe")
Variable names
We can access the variable names:
print(dataset.variables)
['x_local', 'x_shared', 'y_1', 'y_2']
Variable sizes
We can access the variable sizes:
print(dataset.sizes)
{'x_local': 1, 'x_shared': 2, 'y_2': 1, 'y_1': 1}
Variable groups
We can access the variable groups:
print(dataset.groups)
['design_parameters', 'functions']
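These properties are enough to recover the "Number of dimensions" lines printed with the dataset: the dimension of a group is the sum of the sizes of its variables. A small sketch, using the sizes and the group membership copied from the outputs above (the `groups` dictionary is written by hand here, it is not a GEMSEO attribute):

```python
# Variable sizes, as printed by dataset.sizes.
sizes = {"x_local": 1, "x_shared": 2, "y_2": 1, "y_1": 1}

# Group membership, copied by hand from the printed dataset summary.
groups = {
    "design_parameters": ["x_local", "x_shared", "y_2"],
    "functions": ["y_1"],
}

# Dimension of each group: sum of the sizes of its variables.
dimensions = {
    group: sum(sizes[name] for name in names) for group, names in groups.items()
}
total = sum(dimensions.values())

print(dimensions)  # {'design_parameters': 4, 'functions': 1}
print(total)  # 5
```

This is why the dataset reports 4 variables but 5 dimensions: x_shared counts once as a variable and twice as a dimension.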
Access data
Access by group
We can get the data by group, either as an array (default option):
print(dataset.get_data_by_group("design_parameters"))
[[ 4.79353495e+00 5.51246975e+00 4.83838903e+00 -8.79066971e+01]
[ 8.34044009e-01 1.55266936e-01 8.28077388e+00 -4.61775709e+01]
[ 6.40890450e+00 -7.11870203e+00 6.05477519e+00 7.40878002e+00]
[ 8.83460960e+00 8.23475931e+00 2.28749635e-04 6.79240596e+01]
[ 2.29351178e+00 -5.63064562e+00 2.37252042e+00 4.68187004e+01]]
or as a dictionary indexed by the variable names:
print(dataset.get_data_by_group("design_parameters", True))
{'x_local': array([[4.79353495],
[0.83404401],
[6.4089045 ],
[8.8346096 ],
[2.29351178]]), 'x_shared': array([[ 5.51246975e+00, 4.83838903e+00],
[ 1.55266936e-01, 8.28077388e+00],
[-7.11870203e+00, 6.05477519e+00],
[ 8.23475931e+00, 2.28749635e-04],
[-5.63064562e+00, 2.37252042e+00]]), 'y_2': array([[-87.90669709],
[-46.17757092],
[ 7.40878002],
[ 67.92405956],
[ 46.81870041]])}
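The array and dictionary views hold the same numbers: each variable occupies a block of consecutive columns whose width is its size. The sketch below illustrates this relationship with plain Python lists standing in for the NumPy arrays. The helper `split_columns` is hypothetical and is not how GEMSEO implements the conversion.

```python
from __future__ import annotations


def split_columns(
    rows: list[list[float]], names: list[str], sizes: dict[str, int]
) -> dict[str, list[list[float]]]:
    """Hypothetical helper: cut each row into per-variable column blocks."""
    columns = {}
    start = 0
    for name in names:
        stop = start + sizes[name]
        columns[name] = [row[start:stop] for row in rows]
        start = stop
    return columns


# Rows of the "design_parameters" group, copied from the output above.
design_rows = [
    [4.79353495, 5.51246975, 4.83838903, -87.9066971],
    [0.834044009, 0.155266936, 8.28077388, -46.1775709],
    [6.40890450, -7.11870203, 6.05477519, 7.40878002],
    [8.83460960, 8.23475931, 2.28749635e-04, 67.9240596],
    [2.29351178, -5.63064562, 2.37252042, 46.8187004],
]
names = ["x_local", "x_shared", "y_2"]
sizes = {"x_local": 1, "x_shared": 2, "y_2": 1}

by_name = split_columns(design_rows, names, sizes)
print(by_name["x_shared"][0])  # first sample of x_shared (two components)
```

Each entry of `by_name` keeps one row per sample, so x_shared has 5 rows of 2 values while x_local and y_2 have 5 rows of 1 value.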
Access by variable name
We can get the data by variable names, either as a dictionary indexed by the variable names (default option):
print(dataset.get_data_by_names(["x_shared", "y_2"]))
{'x_shared': array([[ 5.51246975e+00, 4.83838903e+00],
[ 1.55266936e-01, 8.28077388e+00],
[-7.11870203e+00, 6.05477519e+00],
[ 8.23475931e+00, 2.28749635e-04],
[-5.63064562e+00, 2.37252042e+00]]), 'y_2': array([[-87.90669709],
[-46.17757092],
[ 7.40878002],
[ 67.92405956],
[ 46.81870041]])}
or as an array:
print(dataset.get_data_by_names(["x_shared", "y_2"], False))
[[ 5.51246975e+00 4.83838903e+00 -8.79066971e+01]
[ 1.55266936e-01 8.28077388e+00 -4.61775709e+01]
[-7.11870203e+00 6.05477519e+00 7.40878002e+00]
[ 8.23475931e+00 2.28749635e-04 6.79240596e+01]
[-5.63064562e+00 2.37252042e+00 4.68187004e+01]]
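In the array form, the columns follow the order of the requested names: here the two components of x_shared come first, then y_2. A minimal sketch of this column stacking, again with plain lists instead of NumPy arrays and with a hypothetical helper `stack_columns`:

```python
from __future__ import annotations


def stack_columns(
    data: dict[str, list[list[float]]], names: list[str]
) -> list[list[float]]:
    """Hypothetical helper: join per-variable blocks sample by sample."""
    n_samples = len(data[names[0]])
    return [
        [value for name in names for value in data[name][index]]
        for index in range(n_samples)
    ]


# Per-variable data, copied from the dictionary output above.
data = {
    "x_shared": [
        [5.51246975, 4.83838903],
        [0.155266936, 8.28077388],
        [-7.11870203, 6.05477519],
        [8.23475931, 2.28749635e-04],
        [-5.63064562, 2.37252042],
    ],
    "y_2": [
        [-87.90669709],
        [-46.17757092],
        [7.40878002],
        [67.92405956],
        [46.81870041],
    ],
}

stacked = stack_columns(data, ["x_shared", "y_2"])
print(stacked[0])  # [5.51246975, 4.83838903, -87.90669709]
```

Reversing the list of names in this sketch would put the y_2 column first.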
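Access all data
We can get all the data sorted by group, either as a large array for each group, returned together with the variable names by group and the variable sizes: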
print(dataset.get_all_data())
({'design_parameters': array([[ 4.79353495e+00, 5.51246975e+00, 4.83838903e+00,
-8.79066971e+01],
[ 8.34044009e-01, 1.55266936e-01, 8.28077388e+00,
-4.61775709e+01],
[ 6.40890450e+00, -7.11870203e+00, 6.05477519e+00,
7.40878002e+00],
[ 8.83460960e+00, 8.23475931e+00, 2.28749635e-04,
6.79240596e+01],
[ 2.29351178e+00, -5.63064562e+00, 2.37252042e+00,
4.68187004e+01]]), 'functions': array([[7.589505 ],
[4.28654172],
[7.85225077],
[7.94111374],
[5.19677421]])}, {'design_parameters': ['x_local', 'x_shared', 'y_2'], 'functions': ['y_1']}, {'x_local': 1, 'x_shared': 2, 'y_2': 1, 'y_1': 1})
or as a dictionary indexed by group names and then by variable names:
print(dataset.get_all_data(as_dict=True))
{'design_parameters': {'x_local': array([[4.79353495],
[0.83404401],
[6.4089045 ],
[8.8346096 ],
[2.29351178]]), 'x_shared': array([[ 5.51246975e+00, 4.83838903e+00],
[ 1.55266936e-01, 8.28077388e+00],
[-7.11870203e+00, 6.05477519e+00],
[ 8.23475931e+00, 2.28749635e-04],
[-5.63064562e+00, 2.37252042e+00]]), 'y_2': array([[-87.90669709],
[-46.17757092],
[ 7.40878002],
[ 67.92405956],
[ 46.81870041]])}, 'functions': {'y_1': array([[7.589505 ],
[4.28654172],
[7.85225077],
[7.94111374],
[5.19677421]])}}
We can also get these data without grouping, either as a single large array, returned together with the variable names and sizes:
print(dataset.get_all_data(by_group=False))
(array([[ 4.79353495e+00, 5.51246975e+00, 4.83838903e+00,
-8.79066971e+01, 7.58950500e+00],
[ 8.34044009e-01, 1.55266936e-01, 8.28077388e+00,
-4.61775709e+01, 4.28654172e+00],
[ 6.40890450e+00, -7.11870203e+00, 6.05477519e+00,
7.40878002e+00, 7.85225077e+00],
[ 8.83460960e+00, 8.23475931e+00, 2.28749635e-04,
6.79240596e+01, 7.94111374e+00],
[ 2.29351178e+00, -5.63064562e+00, 2.37252042e+00,
4.68187004e+01, 5.19677421e+00]]), ['x_local', 'x_shared', 'y_2', 'y_1'], {'x_local': 1, 'x_shared': 2, 'y_2': 1, 'y_1': 1})
or as a dictionary indexed by variable names:
print(dataset.get_all_data(by_group=False, as_dict=True))
{'x_local': array([[4.79353495],
[0.83404401],
[6.4089045 ],
[8.8346096 ],
[2.29351178]]), 'x_shared': array([[ 5.51246975e+00, 4.83838903e+00],
[ 1.55266936e-01, 8.28077388e+00],
[-7.11870203e+00, 6.05477519e+00],
[ 8.23475931e+00, 2.28749635e-04],
[-5.63064562e+00, 2.37252042e+00]]), 'y_2': array([[-87.90669709],
[-46.17757092],
[ 7.40878002],
[ 67.92405956],
[ 46.81870041]]), 'y_1': array([[7.589505 ],
[4.28654172],
[7.85225077],
[7.94111374],
[5.19677421]])}