Dataset from an optimization problem

In this example, we will see how to build a Dataset from an OptimizationProblem. We first import the GEMSEO API functions and the Sellar design space used below:

from __future__ import absolute_import, division, print_function, unicode_literals

from future import standard_library

from gemseo.api import configure_logger, create_discipline, create_scenario
from gemseo.problems.sellar.sellar_design_space import SellarDesignSpace

configure_logger()

standard_library.install_aliases()

Synthetic data

We sample the Sellar1 discipline with a DOE scenario (5 LHS samples) and retrieve the corresponding OptimizationProblem:

discipline = create_discipline("Sellar1")
design_space = SellarDesignSpace().filter(discipline.get_input_data_names())

scenario = create_scenario(
    [discipline], "DisciplinaryOpt", "y_0", design_space, scenario_type="DOE"
)
scenario.execute({"algo": "lhs", "n_samples": 5})
opt_problem = scenario.formulation.opt_problem

Create a dataset

We can build a dataset from this OptimizationProblem, either by separating the design parameters from the functions (default option):

dataset = opt_problem.export_to_dataset("sellar1_doe")
print(dataset)

Out:

sellar1_doe
| Number of samples: 5
| Number of variables: 4
| Variables names and sizes by group:
| - design_parameters: x_local (1), x_shared (2), y_1 (1)
| - functions: y_0 (1)
| Number of dimensions (total = 5) by group:
| - design_parameters: 4
| - functions: 1

or by putting all variables into a single default group named parameters:

dataset = opt_problem.export_to_dataset("sellar1_doe", categorize=False)
print(dataset)

Out:

sellar1_doe
| Number of samples: 5
| Number of variables: 4
| Variables names and sizes by group:
| - parameters: x_local (1), x_shared (2), y_1 (1), y_0 (1)
| Number of dimensions (total = 5) by group:
| - parameters: 5

or by using an input-output naming rather than an optimization naming:

dataset = opt_problem.export_to_dataset("sellar1_doe", opt_naming=False)
print(dataset)

Out:

sellar1_doe
| Number of samples: 5
| Number of variables: 4
| Variables names and sizes by group:
| - inputs: x_local (1), x_shared (2), y_1 (1)
| - outputs: y_0 (1)
| Number of dimensions (total = 5) by group:
| - inputs: 4
| - outputs: 1
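The counts printed in these summaries follow directly from the variable sizes. As a minimal sketch, the plain-Python snippet below recomputes them for the default grouping. The `groups` and `sizes` dictionaries are hand-copied from the printed output for illustration; they are not objects returned by GEMSEO.

```python
# Hand-copied from the printed summary; illustrative stand-ins, not GEMSEO objects.
groups = {
    "design_parameters": ["x_local", "x_shared", "y_1"],
    "functions": ["y_0"],
}
sizes = {"x_local": 1, "x_shared": 2, "y_1": 1, "y_0": 1}

# The dimension of a group is the sum of the sizes of its variables.
dimensions = {
    group: sum(sizes[name] for name in names) for group, names in groups.items()
}

# The number of variables counts names, whereas the total dimension counts components.
n_variables = sum(len(names) for names in groups.values())
total_dimension = sum(dimensions.values())

print(dimensions)
print(n_variables, total_dimension)
```

Here x_shared counts as one variable but contributes two dimensions, which is why the summary reports 4 variables and a total of 5 dimensions.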

Access properties

dataset = opt_problem.export_to_dataset("sellar1_doe")

Variable names

We can access the variable names:

print(dataset.variables)

Out:

['x_local', 'x_shared', 'y_0', 'y_1']

Variable sizes

We can access the variable sizes:

print(dataset.sizes)

Out:

{'x_local': 1, 'x_shared': 2, 'y_1': 1, 'y_0': 1}

Variable groups

We can access the variable groups:

print(dataset.groups)

Out:

['design_parameters', 'functions']

Access data

Access by group

We can get the data by group, either as an array (default option):

print(dataset.get_data_by_group("design_parameters"))

Out:

[[  1.81452514  -3.72798685   7.53919754  38.41579806]
 [  3.92826786  -0.48697712   2.88367228  82.15592618]
 [  8.83368465   5.64262221   5.85431838 -88.87436758]
 [  5.27550899   6.37407839   0.86650059 -33.1047004 ]
 [  6.74914349  -8.20498248   9.36072454 -17.76739276]]

or as a dictionary indexed by the variable names:

print(dataset.get_data_by_group("design_parameters", True))

Out:

{'x_local': array([[1.81452514],
       [3.92826786],
       [8.83368465],
       [5.27550899],
       [6.74914349]]), 'x_shared': array([[-3.72798685,  7.53919754],
       [-0.48697712,  2.88367228],
       [ 5.64262221,  5.85431838],
       [ 6.37407839,  0.86650059],
       [-8.20498248,  9.36072454]]), 'y_1': array([[ 38.41579806],
       [ 82.15592618],
       [-88.87436758],
       [-33.1047004 ],
       [-17.76739276]])}
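The two outputs above hold the same numbers: the dictionary form is the group array cut into column blocks, one block per variable, with widths given by the variable sizes. The sketch below illustrates this relationship in plain Python on the first two samples. The helper `split_by_variable` is hypothetical and written for this illustration only; it is not part of GEMSEO, which works with NumPy arrays rather than lists.

```python
def split_by_variable(rows, names, sizes):
    """Split each row of a 2D table into consecutive per-variable column blocks.

    Hypothetical helper for illustration; not a GEMSEO function.
    """
    result = {}
    start = 0
    for name in names:
        stop = start + sizes[name]
        # Keep the columns start..stop-1 of every sample for this variable.
        result[name] = [row[start:stop] for row in rows]
        start = stop
    return result


# First two samples of the design_parameters group, copied from the output above.
rows = [
    [1.81452514, -3.72798685, 7.53919754, 38.41579806],
    [3.92826786, -0.48697712, 2.88367228, 82.15592618],
]
names = ["x_local", "x_shared", "y_1"]
sizes = {"x_local": 1, "x_shared": 2, "y_1": 1}

by_name = split_by_variable(rows, names, sizes)
print(by_name["x_shared"])
```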

Access by variable name

We can get the data by variable names, either as a dictionary indexed by these names (default option):

print(dataset.get_data_by_names(["x_shared", "y_1"]))

Out:

{'x_shared': array([[-3.72798685,  7.53919754],
       [-0.48697712,  2.88367228],
       [ 5.64262221,  5.85431838],
       [ 6.37407839,  0.86650059],
       [-8.20498248,  9.36072454]]), 'y_1': array([[ 38.41579806],
       [ 82.15592618],
       [-88.87436758],
       [-33.1047004 ],
       [-17.76739276]])}

or as an array:

print(dataset.get_data_by_names(["x_shared", "y_1"], False))

Out:

[[ -3.72798685   7.53919754  38.41579806]
 [ -0.48697712   2.88367228  82.15592618]
 [  5.64262221   5.85431838 -88.87436758]
 [  6.37407839   0.86650059 -33.1047004 ]
 [ -8.20498248   9.36072454 -17.76739276]]
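The array form places the requested variables side by side, in the order in which their names were given. As a rough sketch of that layout, the plain-Python snippet below concatenates per-variable data sample by sample. The helper `concatenate_variables` is hypothetical and only mimics the printed result on the first two samples; it does not reproduce how GEMSEO builds the array internally.

```python
def concatenate_variables(data, names):
    """Concatenate per-variable samples column-wise, following the order of names.

    Hypothetical helper for illustration; not a GEMSEO function.
    """
    n_samples = len(data[names[0]])
    return [
        [value for name in names for value in data[name][i]]
        for i in range(n_samples)
    ]


# First two samples of x_shared and y_1, copied from the output above.
data = {
    "x_shared": [[-3.72798685, 7.53919754], [-0.48697712, 2.88367228]],
    "y_1": [[38.41579806], [82.15592618]],
}

table = concatenate_variables(data, ["x_shared", "y_1"])
for row in table:
    print(row)
```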

Access all data

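We can get all the data grouped by category, either as one array per group, returned together with the variable names of each group and the variable sizes (default option):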

print(dataset.get_all_data())

Out:

({'design_parameters': array([[  1.81452514,  -3.72798685,   7.53919754,  38.41579806],
       [  3.92826786,  -0.48697712,   2.88367228,  82.15592618],
       [  8.83368465,   5.64262221,   5.85431838, -88.87436758],
       [  5.27550899,   6.37407839,   0.86650059, -33.1047004 ],
       [  6.74914349,  -8.20498248,   9.36072454, -17.76739276]]), 'functions': array([[3.9456874 ],
       [0.        ],
       [8.01885665],
       [7.30697098],
       [9.32657944]])}, {'design_parameters': ['x_local', 'x_shared', 'y_1'], 'functions': ['y_0']}, {'x_local': 1, 'x_shared': 2, 'y_1': 1, 'y_0': 1})

or as a nested dictionary indexed by group names, then by variable names:

print(dataset.get_all_data(as_dict=True))

Out:

{'design_parameters': {'x_local': array([[1.81452514],
       [3.92826786],
       [8.83368465],
       [5.27550899],
       [6.74914349]]), 'x_shared': array([[-3.72798685,  7.53919754],
       [-0.48697712,  2.88367228],
       [ 5.64262221,  5.85431838],
       [ 6.37407839,  0.86650059],
       [-8.20498248,  9.36072454]]), 'y_1': array([[ 38.41579806],
       [ 82.15592618],
       [-88.87436758],
       [-33.1047004 ],
       [-17.76739276]])}, 'functions': {'y_0': array([[3.9456874 ],
       [0.        ],
       [8.01885665],
       [7.30697098],
       [9.32657944]])}}

We can also get these data without grouping, either as a single large array, returned together with the variable names and sizes:

print(dataset.get_all_data(by_group=False))

Out:

(array([[  1.81452514,  -3.72798685,   7.53919754,  38.41579806,
          3.9456874 ],
       [  3.92826786,  -0.48697712,   2.88367228,  82.15592618,
          0.        ],
       [  8.83368465,   5.64262221,   5.85431838, -88.87436758,
          8.01885665],
       [  5.27550899,   6.37407839,   0.86650059, -33.1047004 ,
          7.30697098],
       [  6.74914349,  -8.20498248,   9.36072454, -17.76739276,
          9.32657944]]), ['x_local', 'x_shared', 'y_1', 'y_0'], {'x_local': 1, 'x_shared': 2, 'y_1': 1, 'y_0': 1})

or as a dictionary indexed by variable names:

print(dataset.get_all_data(by_group=False, as_dict=True))

Out:

{'x_local': array([[1.81452514],
       [3.92826786],
       [8.83368465],
       [5.27550899],
       [6.74914349]]), 'x_shared': array([[-3.72798685,  7.53919754],
       [-0.48697712,  2.88367228],
       [ 5.64262221,  5.85431838],
       [ 6.37407839,  0.86650059],
       [-8.20498248,  9.36072454]]), 'y_1': array([[ 38.41579806],
       [ 82.15592618],
       [-88.87436758],
       [-33.1047004 ],
       [-17.76739276]]), 'y_0': array([[3.9456874 ],
       [0.        ],
       [8.01885665],
       [7.30697098],
       [9.32657944]])}
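The flat dictionary and the dictionary organized by group carry the same per-variable data; only the nesting differs. The plain-Python sketch below rebuilds the grouped form from the flat one on the first two samples. The helper `regroup` and the hand-copied `groups` mapping are assumptions made for this illustration and are not part of GEMSEO.

```python
def regroup(flat, groups):
    """Nest a flat {variable: data} mapping under the group of each variable.

    Hypothetical helper for illustration; not a GEMSEO function.
    """
    return {
        group: {name: flat[name] for name in names}
        for group, names in groups.items()
    }


# First two samples of each variable, copied from the outputs above.
flat = {
    "x_local": [[1.81452514], [3.92826786]],
    "x_shared": [[-3.72798685, 7.53919754], [-0.48697712, 2.88367228]],
    "y_1": [[38.41579806], [82.15592618]],
    "y_0": [[3.9456874], [0.0]],
}
# Group contents as printed in the default dataset summary.
groups = {
    "design_parameters": ["x_local", "x_shared", "y_1"],
    "functions": ["y_0"],
}

nested = regroup(flat, groups)
print(nested["functions"])
```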
