Dataset from an optimization problem
In this example, we will see how to build a Dataset from the objects of an OptimizationProblem.
For that, we first import the GEMSEO API functions and the Sellar design space used below:
from __future__ import division, unicode_literals
from gemseo.api import configure_logger, create_discipline, create_scenario
from gemseo.problems.sellar.sellar_design_space import SellarDesignSpace
configure_logger()
Out:
<RootLogger root (INFO)>
Synthetic data
We can sample the Sellar1 discipline and use the corresponding OptimizationProblem:
discipline = create_discipline("Sellar1")
design_space = SellarDesignSpace().filter(discipline.get_input_data_names())
scenario = create_scenario(
    [discipline], "DisciplinaryOpt", "y_1", design_space, scenario_type="DOE"
)
scenario.execute({"algo": "lhs", "n_samples": 5})
opt_problem = scenario.formulation.opt_problem
Out:
INFO - 09:22:30:
INFO - 09:22:30: *** Start DOE Scenario execution ***
INFO - 09:22:30: DOEScenario
INFO - 09:22:30: Disciplines: Sellar1
INFO - 09:22:30: MDOFormulation: DisciplinaryOpt
INFO - 09:22:30: Algorithm: lhs
INFO - 09:22:30: Optimization problem:
INFO - 09:22:30: Minimize: y_1(x_local, x_shared, y_2)
INFO - 09:22:30: With respect to: x_local, x_shared, y_2
INFO - 09:22:30: DOE sampling: 0%| | 0/5 [00:00<?, ?it]
INFO - 09:22:30: DOE sampling: 100%|██████████| 5/5 [00:00<00:00, 826.20 it/sec, obj=5.2]
INFO - 09:22:30: Optimization result:
INFO - 09:22:30: Objective value = 4.286541717010372
INFO - 09:22:30: The result is feasible.
INFO - 09:22:30: Status: None
INFO - 09:22:30: Optimizer message: None
INFO - 09:22:30: Number of calls to the objective function by the optimizer: 5
INFO - 09:22:30: Design Space:
INFO - 09:22:30: +----------+-------------+--------------------+-------------+-------+
INFO - 09:22:30: | name | lower_bound | value | upper_bound | type |
INFO - 09:22:30: +----------+-------------+--------------------+-------------+-------+
INFO - 09:22:30: | x_local | 0 | 0.8340440094051481 | 10 | float |
INFO - 09:22:30: | x_shared | -10 | 0.1552669360134296 | 10 | float |
INFO - 09:22:30: | x_shared | 0 | 8.280773877190468 | 10 | float |
INFO - 09:22:30: | y_2 | -100 | -46.17757091827809 | 100 | float |
INFO - 09:22:30: +----------+-------------+--------------------+-------------+-------+
INFO - 09:22:30: *** DOE Scenario run terminated ***
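To give an intuition of what the "lhs" algorithm requested above produces, here is a minimal pure-Python sketch of Latin hypercube sampling. It is an illustration only, not the implementation GEMSEO relies on; the helper name latin_hypercube and its seed argument are hypothetical, and the bounds are copied from the design space table in the log.

```python
import random


def latin_hypercube(bounds, n_samples, seed=0):
    """Return n_samples points; each dimension uses every stratum exactly once."""
    rng = random.Random(seed)
    columns = []
    for lower, upper in bounds:
        # Draw one random point inside each of the n_samples equal-width strata.
        unit = [(k + rng.random()) / n_samples for k in range(n_samples)]
        # Shuffle so that strata are paired at random across dimensions.
        rng.shuffle(unit)
        # Rescale from the unit interval to the interval between the bounds.
        columns.append([lower + u * (upper - lower) for u in unit])
    # Transpose the per-dimension columns into one row per sample.
    return [list(row) for row in zip(*columns)]


# Bounds of x_local, x_shared (two components) and y_2, read from the log above.
bounds = [(0.0, 10.0), (-10.0, 10.0), (0.0, 10.0), (-100.0, 100.0)]
samples = latin_hypercube(bounds, n_samples=5)
for sample in samples:
    print(sample)
```

With 5 samples, each variable range is cut into 5 equal intervals and every interval receives exactly one point, which is why a small DOE still covers each range evenly.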
Create a dataset
We can easily build a dataset from this OptimizationProblem, either by separating the design parameters from the functions (default option):
dataset = opt_problem.export_to_dataset("sellar1_doe")
print(dataset)
Out:
sellar1_doe
Number of samples: 5
Number of variables: 4
Variables names and sizes by group:
design_parameters: x_local (1), x_shared (2), y_2 (1)
functions: y_1 (1)
Number of dimensions (total = 5) by group:
design_parameters: 4
functions: 1
or by gathering all the variables in a single group named parameters:
dataset = opt_problem.export_to_dataset("sellar1_doe", categorize=False)
print(dataset)
Out:
sellar1_doe
Number of samples: 5
Number of variables: 4
Variables names and sizes by group:
parameters: x_local (1), x_shared (2), y_2 (1), y_1 (1)
Number of dimensions (total = 5) by group:
parameters: 5
or by using an input-output naming rather than an optimization naming:
dataset = opt_problem.export_to_dataset("sellar1_doe", opt_naming=False)
print(dataset)
Out:
sellar1_doe
Number of samples: 5
Number of variables: 4
Variables names and sizes by group:
inputs: x_local (1), x_shared (2), y_2 (1)
outputs: y_1 (1)
Number of dimensions (total = 5) by group:
inputs: 4
outputs: 1
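The three printouts above differ only in how the variables are distributed among groups. The following sketch reproduces that grouping logic in plain Python. The helper group_variables is hypothetical and only mimics the group names shown in the outputs; it is not GEMSEO code. The behaviour when categorize=False and opt_naming=False are combined is not shown in this example, so the choice made here for that case is an assumption.

```python
def group_variables(inputs, outputs, categorize=True, opt_naming=True):
    """Map group names to variable names, mimicking the three printouts above."""
    if not categorize:
        # A single group holds every variable (assumed to ignore opt_naming).
        return {"parameters": inputs + outputs}
    if opt_naming:
        # Optimization naming: design parameters versus functions.
        return {"design_parameters": inputs, "functions": outputs}
    # Input-output naming.
    return {"inputs": inputs, "outputs": outputs}


inputs = ["x_local", "x_shared", "y_2"]
outputs = ["y_1"]
print(group_variables(inputs, outputs))
print(group_variables(inputs, outputs, categorize=False))
print(group_variables(inputs, outputs, opt_naming=False))
```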
Access properties
dataset = opt_problem.export_to_dataset("sellar1_doe")
Variable names
We can access the variable names:
print(dataset.variables)
Out:
['x_local', 'x_shared', 'y_1', 'y_2']
Variable sizes
We can access the variable sizes:
print(dataset.sizes)
Out:
{'x_local': 1, 'x_shared': 2, 'y_2': 1, 'y_1': 1}
Variable groups
We can access the variable groups:
print(dataset.groups)
Out:
['design_parameters', 'functions']
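These properties are enough to recover the "Number of dimensions" lines printed with the dataset earlier. The sketch below uses plain dictionaries whose values are copied from the outputs above; the names sizes, names_by_group and dimensions are illustrative and not attributes of the Dataset class.

```python
sizes = {"x_local": 1, "x_shared": 2, "y_2": 1, "y_1": 1}
names_by_group = {
    "design_parameters": ["x_local", "x_shared", "y_2"],
    "functions": ["y_1"],
}

# The dimension of a group is the sum of the sizes of its variables.
dimensions = {
    group: sum(sizes[name] for name in names)
    for group, names in names_by_group.items()
}
print(dimensions)  # {'design_parameters': 4, 'functions': 1}
print(sum(dimensions.values()))  # 5
```

This explains why the dataset reports 4 variables but 5 dimensions: x_shared counts once as a variable and twice as a dimension.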
Access data
Access by group
We can get the data by group, either as an array (default option):
print(dataset.get_data_by_group("design_parameters"))
Out:
[[ 4.79353495e+00 5.51246975e+00 4.83838903e+00 -8.79066971e+01]
[ 8.34044009e-01 1.55266936e-01 8.28077388e+00 -4.61775709e+01]
[ 6.40890450e+00 -7.11870203e+00 6.05477519e+00 7.40878002e+00]
[ 8.83460960e+00 8.23475931e+00 2.28749635e-04 6.79240596e+01]
[ 2.29351178e+00 -5.63064562e+00 2.37252042e+00 4.68187004e+01]]
or as a dictionary indexed by the variable names:
print(dataset.get_data_by_group("design_parameters", True))
Out:
{'x_local': array([[4.79353495],
[0.83404401],
[6.4089045 ],
[8.8346096 ],
[2.29351178]]), 'x_shared': array([[ 5.51246975e+00, 4.83838903e+00],
[ 1.55266936e-01, 8.28077388e+00],
[-7.11870203e+00, 6.05477519e+00],
[ 8.23475931e+00, 2.28749635e-04],
[-5.63064562e+00, 2.37252042e+00]]), 'y_2': array([[-87.90669709],
[-46.17757092],
[ 7.40878002],
[ 67.92405956],
[ 46.81870041]])}
Access by variable name
We can get the data by variable names, either as a dictionary indexed by the variable names (default option):
print(dataset.get_data_by_names(["x_shared", "y_2"]))
Out:
{'x_shared': array([[ 5.51246975e+00, 4.83838903e+00],
[ 1.55266936e-01, 8.28077388e+00],
[-7.11870203e+00, 6.05477519e+00],
[ 8.23475931e+00, 2.28749635e-04],
[-5.63064562e+00, 2.37252042e+00]]), 'y_2': array([[-87.90669709],
[-46.17757092],
[ 7.40878002],
[ 67.92405956],
[ 46.81870041]])}
or as an array:
print(dataset.get_data_by_names(["x_shared", "y_2"], False))
Out:
[[ 5.51246975e+00 4.83838903e+00 -8.79066971e+01]
[ 1.55266936e-01 8.28077388e+00 -4.61775709e+01]
[-7.11870203e+00 6.05477519e+00 7.40878002e+00]
[ 8.23475931e+00 2.28749635e-04 6.79240596e+01]
[-5.63064562e+00 2.37252042e+00 4.68187004e+01]]
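The array form is the dictionary form with the per-variable blocks placed side by side, in the order of the requested names. A minimal sketch of that relationship, with nested lists standing in for NumPy arrays and a hypothetical helper concatenate (with NumPy, numpy.hstack would do the same job):

```python
def concatenate(data, names):
    """Join the per-variable rows side by side, in the order given by names."""
    n_samples = len(data[names[0]])
    rows = []
    for i in range(n_samples):
        row = []
        for name in names:
            # Append the components of this variable for sample i.
            row.extend(data[name][i])
        rows.append(row)
    return rows


# First two samples of the dictionary printed above (values rounded).
data = {
    "x_shared": [[5.512, 4.838], [0.155, 8.281]],
    "y_2": [[-87.907], [-46.178]],
}
print(concatenate(data, ["x_shared", "y_2"]))
```

Each row has three columns: the two components of x_shared followed by y_2, as in the array printed above.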
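Access all data
We can get all the data sorted by group, either as one array per group, returned together with the variable names of each group and the variable sizes (default option):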
print(dataset.get_all_data())
Out:
({'design_parameters': array([[ 4.79353495e+00, 5.51246975e+00, 4.83838903e+00,
-8.79066971e+01],
[ 8.34044009e-01, 1.55266936e-01, 8.28077388e+00,
-4.61775709e+01],
[ 6.40890450e+00, -7.11870203e+00, 6.05477519e+00,
7.40878002e+00],
[ 8.83460960e+00, 8.23475931e+00, 2.28749635e-04,
6.79240596e+01],
[ 2.29351178e+00, -5.63064562e+00, 2.37252042e+00,
4.68187004e+01]]), 'functions': array([[7.589505 ],
[4.28654172],
[7.85225077],
[7.94111374],
[5.19677421]])}, {'design_parameters': ['x_local', 'x_shared', 'y_2'], 'functions': ['y_1']}, {'x_local': 1, 'x_shared': 2, 'y_2': 1, 'y_1': 1})
or as a dictionary indexed by group names and then by variable names:
print(dataset.get_all_data(as_dict=True))
Out:
{'design_parameters': {'x_local': array([[4.79353495],
[0.83404401],
[6.4089045 ],
[8.8346096 ],
[2.29351178]]), 'x_shared': array([[ 5.51246975e+00, 4.83838903e+00],
[ 1.55266936e-01, 8.28077388e+00],
[-7.11870203e+00, 6.05477519e+00],
[ 8.23475931e+00, 2.28749635e-04],
[-5.63064562e+00, 2.37252042e+00]]), 'y_2': array([[-87.90669709],
[-46.17757092],
[ 7.40878002],
[ 67.92405956],
[ 46.81870041]])}, 'functions': {'y_1': array([[7.589505 ],
[4.28654172],
[7.85225077],
[7.94111374],
[5.19677421]])}}
We can also get all the data without grouping, either as a single large array, returned together with the variable names and sizes:
print(dataset.get_all_data(by_group=False))
Out:
(array([[ 4.79353495e+00, 5.51246975e+00, 4.83838903e+00,
-8.79066971e+01, 7.58950500e+00],
[ 8.34044009e-01, 1.55266936e-01, 8.28077388e+00,
-4.61775709e+01, 4.28654172e+00],
[ 6.40890450e+00, -7.11870203e+00, 6.05477519e+00,
7.40878002e+00, 7.85225077e+00],
[ 8.83460960e+00, 8.23475931e+00, 2.28749635e-04,
6.79240596e+01, 7.94111374e+00],
[ 2.29351178e+00, -5.63064562e+00, 2.37252042e+00,
4.68187004e+01, 5.19677421e+00]]), ['x_local', 'x_shared', 'y_2', 'y_1'], {'x_local': 1, 'x_shared': 2, 'y_2': 1, 'y_1': 1})
or as a dictionary indexed by the variable names:
print(dataset.get_all_data(by_group=False, as_dict=True))
Out:
{'x_local': array([[4.79353495],
[0.83404401],
[6.4089045 ],
[8.8346096 ],
[2.29351178]]), 'x_shared': array([[ 5.51246975e+00, 4.83838903e+00],
[ 1.55266936e-01, 8.28077388e+00],
[-7.11870203e+00, 6.05477519e+00],
[ 8.23475931e+00, 2.28749635e-04],
[-5.63064562e+00, 2.37252042e+00]]), 'y_2': array([[-87.90669709],
[-46.17757092],
[ 7.40878002],
[ 67.92405956],
[ 46.81870041]]), 'y_1': array([[7.589505 ],
[4.28654172],
[7.85225077],
[7.94111374],
[5.19677421]])}
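When the data come back as one large array with the variable names and sizes, the sizes tell which columns belong to which variable. The sketch below shows one way to slice such an array back into per-variable blocks. It uses nested lists in place of NumPy arrays, and the helper split_by_variable is hypothetical rather than part of the Dataset API.

```python
def split_by_variable(rows, names, sizes):
    """Slice each row into per-variable chunks using the variable sizes."""
    result = {name: [] for name in names}
    for row in rows:
        start = 0
        for name in names:
            # Columns start to stop (excluded) hold the components of this variable.
            stop = start + sizes[name]
            result[name].append(row[start:stop])
            start = stop
    return result


names = ["x_local", "x_shared", "y_2", "y_1"]
sizes = {"x_local": 1, "x_shared": 2, "y_2": 1, "y_1": 1}
# First two rows of the large array printed above (values rounded).
rows = [
    [4.794, 5.512, 4.838, -87.907, 7.590],
    [0.834, 0.155, 8.281, -46.178, 4.287],
]
split = split_by_variable(rows, names, sizes)
print(split["x_shared"])
```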
Total running time of the script: (0 minutes 0.080 seconds)