Dataset from an optimization problem
In this example, we will see how to build a Dataset from an
OptimizationProblem.
For that, we need the following imports:
from gemseo.api import configure_logger, create_discipline, create_scenario
from gemseo.problems.sellar.sellar_design_space import SellarDesignSpace
configure_logger()
Synthetic data
We can sample the Sellar1 discipline with a Latin hypercube sampling (LHS) of 5 points
and use the corresponding OptimizationProblem:
discipline = create_discipline("Sellar1")
design_space = SellarDesignSpace().filter(discipline.get_input_data_names())
scenario = create_scenario(
[discipline], "DisciplinaryOpt", "y_0", design_space, scenario_type="DOE"
)
scenario.execute({"algo": "lhs", "n_samples": 5})
opt_problem = scenario.formulation.opt_problem
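The "lhs" algorithm used above draws a Latin hypercube sample. To illustrate the idea only, here is a plain-Python sketch. It is not GEMSEO's implementation: the function latin_hypercube and the bounds in the usage line are hypothetical. Each dimension is cut into n_samples strata of equal width, one point is drawn inside each stratum, and the strata are shuffled independently per dimension so that they are paired at random.

```python
import random


def latin_hypercube(n_samples, bounds, seed=None):
    """Hypothetical LHS sketch: return n_samples points, one per stratum along each dimension.

    bounds is a list of (lower, upper) pairs, one pair per scalar dimension.
    """
    rng = random.Random(seed)
    columns = []
    for lower, upper in bounds:
        # One random position inside each of the n_samples equal-width strata of [0, 1)...
        strata = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # ...shuffled so that the strata are paired randomly across dimensions.
        rng.shuffle(strata)
        # Scale from [0, 1) to [lower, upper).
        columns.append([lower + u * (upper - lower) for u in strata])
    # Transpose: one row per sample, one column per dimension.
    return [list(row) for row in zip(*columns)]


samples = latin_hypercube(5, [(0.0, 10.0), (-10.0, 10.0)], seed=1)
print(samples)
```

Projected onto any single dimension, the 5 points fall in 5 different strata, which is what distinguishes LHS from plain random sampling.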
Create a dataset
We can easily build a dataset from this OptimizationProblem,
either by separating the design parameters from the functions
(default option):
dataset = opt_problem.export_to_dataset("sellar1_doe")
print(dataset)
Out:
sellar1_doe
| Number of samples: 5
| Number of variables: 4
| Variables names and sizes by group:
| - design_parameters: x_local (1), x_shared (2), y_1 (1)
| - functions: y_0 (1)
| Number of dimensions (total = 5) by group:
| - design_parameters: 4
| - functions: 1
or by gathering all the variables in a single group of parameters:
dataset = opt_problem.export_to_dataset("sellar1_doe", categorize=False)
print(dataset)
Out:
sellar1_doe
| Number of samples: 5
| Number of variables: 4
| Variables names and sizes by group:
| - parameters: x_local (1), x_shared (2), y_1 (1), y_0 (1)
| Number of dimensions (total = 5) by group:
| - parameters: 5
or by using input/output naming rather than optimization naming:
dataset = opt_problem.export_to_dataset("sellar1_doe", opt_naming=False)
print(dataset)
Out:
sellar1_doe
| Number of samples: 5
| Number of variables: 4
| Variables names and sizes by group:
| - inputs: x_local (1), x_shared (2), y_1 (1)
| - outputs: y_0 (1)
| Number of dimensions (total = 5) by group:
| - inputs: 4
| - outputs: 1
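The three outputs above differ only in how the variable names are sorted into groups. The following sketch reproduces that grouping logic; the helper group_variables is hypothetical and only mimics the printed groups, not GEMSEO's code. The combination categorize=False with opt_naming=False is not shown above, so the sketch simply assumes it still yields a single parameters group.

```python
def group_variables(design_names, function_names, categorize=True, opt_naming=True):
    """Hypothetical helper reproducing the groups printed by export_to_dataset above.

    It only sorts variable names into groups; it does not handle any data.
    """
    if not categorize:
        # A single group, design variables first (assumed also for opt_naming=False).
        return {"parameters": list(design_names) + list(function_names)}
    if opt_naming:
        # Optimization naming (default): design parameters vs. functions.
        return {"design_parameters": list(design_names), "functions": list(function_names)}
    # Input-output naming: design parameters become inputs, functions become outputs.
    return {"inputs": list(design_names), "outputs": list(function_names)}


print(group_variables(["x_local", "x_shared", "y_1"], ["y_0"], opt_naming=False))
# {'inputs': ['x_local', 'x_shared', 'y_1'], 'outputs': ['y_0']}
```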
Access properties
We first rebuild the dataset with the default options:
dataset = opt_problem.export_to_dataset("sellar1_doe")
Variable names
We can access the variable names:
print(dataset.variables)
Out:
['x_local', 'x_shared', 'y_0', 'y_1']
Variable sizes
We can access the variable sizes:
print(dataset.sizes)
Out:
{'x_local': 1, 'x_shared': 2, 'y_1': 1, 'y_0': 1}
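Each size is the number of scalar components of a variable, so cumulative sizes give the column range of each variable inside a group array. In the outputs of this example, the design_parameters columns are laid out as x_local, x_shared, y_1. A minimal sketch with the standard library, where the variable names bounds and column_ranges are purely illustrative:

```python
from itertools import accumulate

sizes = {"x_local": 1, "x_shared": 2, "y_1": 1, "y_0": 1}
design_names = ["x_local", "x_shared", "y_1"]  # column order of the design_parameters group

# Cumulative sizes give the column boundaries of the group: [0, 1, 3, 4].
bounds = [0] + list(accumulate(sizes[name] for name in design_names))
column_ranges = {name: (bounds[i], bounds[i + 1]) for i, name in enumerate(design_names)}

print(column_ranges)  # {'x_local': (0, 1), 'x_shared': (1, 3), 'y_1': (3, 4)}
print(bounds[-1], sum(sizes.values()))  # 4 5
```

The last two numbers match the dimensions reported when printing the dataset: 4 for design_parameters and 5 in total.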
Variable groups
We can access the variable groups:
print(dataset.groups)
Out:
['design_parameters', 'functions']
Access data
Access by group
We can get the data by group, either as an array (default option):
print(dataset.get_data_by_group("design_parameters"))
Out:
[[ 1.81452514 -3.72798685 7.53919754 38.41579806]
[ 3.92826786 -0.48697712 2.88367228 82.15592618]
[ 8.83368465 5.64262221 5.85431838 -88.87436758]
[ 5.27550899 6.37407839 0.86650059 -33.1047004 ]
[ 6.74914349 -8.20498248 9.36072454 -17.76739276]]
or as a dictionary indexed by the variable names:
print(dataset.get_data_by_group("design_parameters", as_dict=True))
Out:
{'x_local': array([[1.81452514],
[3.92826786],
[8.83368465],
[5.27550899],
[6.74914349]]), 'x_shared': array([[-3.72798685, 7.53919754],
[-0.48697712, 2.88367228],
[ 5.64262221, 5.85431838],
[ 6.37407839, 0.86650059],
[-8.20498248, 9.36072454]]), 'y_1': array([[ 38.41579806],
[ 82.15592618],
[-88.87436758],
[-33.1047004 ],
[-17.76739276]])}
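The dictionary form is the array form cut column-wise according to the variable sizes. The sketch below redoes this on the first two samples, using plain lists instead of NumPy arrays so that it stays self-contained; the helper split_by_variable is hypothetical, not part of GEMSEO.

```python
def split_by_variable(rows, names, sizes):
    """Hypothetical helper: split 2D data (a list of rows) into {name: sub-rows}."""
    result = {}
    start = 0
    for name in names:
        stop = start + sizes[name]
        # Keep, for every sample, the columns belonging to this variable.
        result[name] = [row[start:stop] for row in rows]
        start = stop
    return result


rows = [
    [1.81452514, -3.72798685, 7.53919754, 38.41579806],
    [3.92826786, -0.48697712, 2.88367228, 82.15592618],
]
sizes = {"x_local": 1, "x_shared": 2, "y_1": 1}
by_name = split_by_variable(rows, ["x_local", "x_shared", "y_1"], sizes)
print(by_name["x_shared"])  # [[-3.72798685, 7.53919754], [-0.48697712, 2.88367228]]
```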
Access by variable name
We can get the data by variable names, either as a dictionary indexed by the variable names (default option):
print(dataset.get_data_by_names(["x_shared", "y_1"]))
Out:
{'x_shared': array([[-3.72798685, 7.53919754],
[-0.48697712, 2.88367228],
[ 5.64262221, 5.85431838],
[ 6.37407839, 0.86650059],
[-8.20498248, 9.36072454]]), 'y_1': array([[ 38.41579806],
[ 82.15592618],
[-88.87436758],
[-33.1047004 ],
[-17.76739276]])}
or as an array:
print(dataset.get_data_by_names(["x_shared", "y_1"], as_dict=False))
Out:
[[ -3.72798685 7.53919754 38.41579806]
[ -0.48697712 2.88367228 82.15592618]
[ 5.64262221 5.85431838 -88.87436758]
[ 6.37407839 0.86650059 -33.1047004 ]
[ -8.20498248 9.36072454 -17.76739276]]
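Conversely, the array form puts the columns of the requested variables side by side, sample by sample, in the order of the names, as in the output above. A sketch on the first two samples with plain lists; the helper select_as_array is hypothetical.

```python
by_name = {
    "x_local": [[1.81452514], [3.92826786]],
    "x_shared": [[-3.72798685, 7.53919754], [-0.48697712, 2.88367228]],
    "y_1": [[38.41579806], [82.15592618]],
}


def select_as_array(data, names):
    """Hypothetical helper: concatenate the columns of the requested variables row by row."""
    n_rows = len(data[names[0]])
    rows = []
    for i in range(n_rows):
        row = []
        for name in names:
            # Append the components of this variable for sample i.
            row.extend(data[name][i])
        rows.append(row)
    return rows


print(select_as_array(by_name, ["x_shared", "y_1"]))
# [[-3.72798685, 7.53919754, 38.41579806], [-0.48697712, 2.88367228, 82.15592618]]
```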
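Access all data
We can get all the data, either as a tuple made of a dictionary of arrays indexed by group, a dictionary of variable names indexed by group and a dictionary of variable sizes: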
print(dataset.get_all_data())
Out:
({'design_parameters': array([[ 1.81452514, -3.72798685, 7.53919754, 38.41579806],
[ 3.92826786, -0.48697712, 2.88367228, 82.15592618],
[ 8.83368465, 5.64262221, 5.85431838, -88.87436758],
[ 5.27550899, 6.37407839, 0.86650059, -33.1047004 ],
[ 6.74914349, -8.20498248, 9.36072454, -17.76739276]]), 'functions': array([[3.9456874 ],
[0. ],
[8.01885665],
[7.30697098],
[9.32657944]])}, {'design_parameters': ['x_local', 'x_shared', 'y_1'], 'functions': ['y_0']}, {'x_local': 1, 'x_shared': 2, 'y_1': 1, 'y_0': 1})
or as a dictionary indexed by group, whose values are dictionaries of arrays indexed by variable names:
print(dataset.get_all_data(as_dict=True))
Out:
{'design_parameters': {'x_local': array([[1.81452514],
[3.92826786],
[8.83368465],
[5.27550899],
[6.74914349]]), 'x_shared': array([[-3.72798685, 7.53919754],
[-0.48697712, 2.88367228],
[ 5.64262221, 5.85431838],
[ 6.37407839, 0.86650059],
[-8.20498248, 9.36072454]]), 'y_1': array([[ 38.41579806],
[ 82.15592618],
[-88.87436758],
[-33.1047004 ],
[-17.76739276]])}, 'functions': {'y_0': array([[3.9456874 ],
[0. ],
[8.01885665],
[7.30697098],
[9.32657944]])}}
We can also get these data without grouping, either as a single large array, returned with the list of variable names and the dictionary of variable sizes:
print(dataset.get_all_data(by_group=False))
Out:
(array([[ 1.81452514, -3.72798685, 7.53919754, 38.41579806,
3.9456874 ],
[ 3.92826786, -0.48697712, 2.88367228, 82.15592618,
0. ],
[ 8.83368465, 5.64262221, 5.85431838, -88.87436758,
8.01885665],
[ 5.27550899, 6.37407839, 0.86650059, -33.1047004 ,
7.30697098],
[ 6.74914349, -8.20498248, 9.36072454, -17.76739276,
9.32657944]]), ['x_local', 'x_shared', 'y_1', 'y_0'], {'x_local': 1, 'x_shared': 2, 'y_1': 1, 'y_0': 1})
or as a dictionary indexed by variable names:
print(dataset.get_all_data(by_group=False, as_dict=True))
Out:
{'x_local': array([[1.81452514],
[3.92826786],
[8.83368465],
[5.27550899],
[6.74914349]]), 'x_shared': array([[-3.72798685, 7.53919754],
[-0.48697712, 2.88367228],
[ 5.64262221, 5.85431838],
[ 6.37407839, 0.86650059],
[-8.20498248, 9.36072454]]), 'y_1': array([[ 38.41579806],
[ 82.15592618],
[-88.87436758],
[-33.1047004 ],
[-17.76739276]]), 'y_0': array([[3.9456874 ],
[0. ],
[8.01885665],
[7.30697098],
[9.32657944]])}
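Comparing the last output with the one obtained with as_dict=True alone, the ungrouped dictionary looks like the grouped one with its group level merged away, variables keeping their group order. A sketch of that flattening on the first sample, assuming variable names are unique across groups; the helper flatten_groups is hypothetical.

```python
def flatten_groups(data_by_group):
    """Hypothetical helper: merge {group: {name: data}} into {name: data}.

    Assumes that a variable name appears in only one group.
    """
    flat = {}
    for variables in data_by_group.values():
        for name, values in variables.items():
            flat[name] = values
    return flat


nested = {
    "design_parameters": {
        "x_local": [[1.81452514]],
        "x_shared": [[-3.72798685, 7.53919754]],
        "y_1": [[38.41579806]],
    },
    "functions": {"y_0": [[3.9456874]]},
}
print(list(flatten_groups(nested)))  # ['x_local', 'x_shared', 'y_1', 'y_0']
```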