Dataset from an optimization problem

In this example, we will see how to build a Dataset from the data stored in an OptimizationProblem. We first import the functions and the design space needed to set up such a problem:

from __future__ import division, unicode_literals

from gemseo.api import configure_logger, create_discipline, create_scenario
from gemseo.problems.sellar.sellar_design_space import SellarDesignSpace

configure_logger()

Out:

<RootLogger root (INFO)>

Synthetic data

We sample the Sellar1 discipline with a DOE scenario (5 samples drawn by Latin hypercube sampling) and retrieve the corresponding OptimizationProblem:

discipline = create_discipline("Sellar1")
design_space = SellarDesignSpace().filter(discipline.get_input_data_names())

scenario = create_scenario(
    [discipline], "DisciplinaryOpt", "y_1", design_space, scenario_type="DOE"
)
scenario.execute({"algo": "lhs", "n_samples": 5})
opt_problem = scenario.formulation.opt_problem

Out:

INFO - 09:22:30:
INFO - 09:22:30: *** Start DOE Scenario execution ***
INFO - 09:22:30: DOEScenario
INFO - 09:22:30:    Disciplines: Sellar1
INFO - 09:22:30:    MDOFormulation: DisciplinaryOpt
INFO - 09:22:30:    Algorithm: lhs
INFO - 09:22:30: Optimization problem:
INFO - 09:22:30:    Minimize: y_1(x_local, x_shared, y_2)
INFO - 09:22:30:    With respect to: x_local, x_shared, y_2
INFO - 09:22:30: DOE sampling:   0%|          | 0/5 [00:00<?, ?it]
INFO - 09:22:30: DOE sampling: 100%|██████████| 5/5 [00:00<00:00, 826.20 it/sec, obj=5.2]
INFO - 09:22:30: Optimization result:
INFO - 09:22:30: Objective value = 4.286541717010372
INFO - 09:22:30: The result is feasible.
INFO - 09:22:30: Status: None
INFO - 09:22:30: Optimizer message: None
INFO - 09:22:30: Number of calls to the objective function by the optimizer: 5
INFO - 09:22:30: Design Space:
INFO - 09:22:30: +----------+-------------+--------------------+-------------+-------+
INFO - 09:22:30: | name     | lower_bound |       value        | upper_bound | type  |
INFO - 09:22:30: +----------+-------------+--------------------+-------------+-------+
INFO - 09:22:30: | x_local  |      0      | 0.8340440094051481 |      10     | float |
INFO - 09:22:30: | x_shared |     -10     | 0.1552669360134296 |      10     | float |
INFO - 09:22:30: | x_shared |      0      | 8.280773877190468  |      10     | float |
INFO - 09:22:30: | y_2      |     -100    | -46.17757091827809 |     100     | float |
INFO - 09:22:30: +----------+-------------+--------------------+-------------+-------+
INFO - 09:22:30: *** DOE Scenario run terminated ***
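
The objective value reported in the log can be cross-checked by hand. The sketch below assumes the usual textbook form of the first Sellar discipline, y_1 = sqrt(x_shared[0]^2 + x_shared[1] + x_local - 0.2 * y_2); this page does not state the formula, so treat it as an assumption. The function name sellar_1 is hypothetical and is not part of the GEMSEO API.

```python
import math


def sellar_1(x_local, x_shared, y_2):
    """Assumed textbook form of the first Sellar discipline (not taken from this page)."""
    return math.sqrt(x_shared[0] ** 2 + x_shared[1] + x_local - 0.2 * y_2)


# Design values and objective copied from the log above.
x_local = 0.8340440094051481
x_shared = (0.1552669360134296, 8.280773877190468)
y_2 = -46.17757091827809
logged_objective = 4.286541717010372

y_1 = sellar_1(x_local, x_shared, y_2)
print(y_1)
# Compare with a loose tolerance rather than expecting identical digits.
print(math.isclose(y_1, logged_objective, abs_tol=1e-6))
```

With a DOE algorithm, the "optimization result" is simply the best of the evaluated samples, which is why the design space printed in the log holds the inputs of that sample.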

Create a dataset

We can build a dataset from this OptimizationProblem, either by separating the design parameters from the functions (default option):

dataset = opt_problem.export_to_dataset("sellar1_doe")
print(dataset)

Out:

sellar1_doe
   Number of samples: 5
   Number of variables: 4
   Variables names and sizes by group:
      design_parameters: x_local (1), x_shared (2), y_2 (1)
      functions: y_1 (1)
   Number of dimensions (total = 5) by group:
      design_parameters: 4
      functions: 1

or by putting all the variables in a single default group named parameters:

dataset = opt_problem.export_to_dataset("sellar1_doe", categorize=False)
print(dataset)

Out:

sellar1_doe
   Number of samples: 5
   Number of variables: 4
   Variables names and sizes by group:
      parameters: x_local (1), x_shared (2), y_2 (1), y_1 (1)
   Number of dimensions (total = 5) by group:
      parameters: 5

or by using an input-output naming (inputs and outputs) rather than an optimization naming (design_parameters and functions):

dataset = opt_problem.export_to_dataset("sellar1_doe", opt_naming=False)
print(dataset)

Out:

sellar1_doe
   Number of samples: 5
   Number of variables: 4
   Variables names and sizes by group:
      inputs: x_local (1), x_shared (2), y_2 (1)
      outputs: y_1 (1)
   Number of dimensions (total = 5) by group:
      inputs: 4
      outputs: 1

Access properties

We rebuild the dataset with the default options, so that the variables are grouped into design_parameters and functions:

dataset = opt_problem.export_to_dataset("sellar1_doe")

Variable names

We can access the variable names:

print(dataset.variables)

Out:

['x_local', 'x_shared', 'y_1', 'y_2']

Variable sizes

We can access the variable sizes, that is, the number of components of each variable:

print(dataset.sizes)

Out:

{'x_local': 1, 'x_shared': 2, 'y_2': 1, 'y_1': 1}

Variable groups

We can access the names of the variable groups:

print(dataset.groups)

Out:

['design_parameters', 'functions']
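
The sizes and the groups together determine the "Number of dimensions" lines printed for the dataset. A minimal sketch in plain Python, using values copied from the outputs on this page; the names names_by_group and dimension_by_group are illustrative and not GEMSEO attributes.

```python
# Sizes as printed by dataset.sizes.
sizes = {"x_local": 1, "x_shared": 2, "y_2": 1, "y_1": 1}

# Variables of each group, as listed in the printed dataset summary.
names_by_group = {
    "design_parameters": ["x_local", "x_shared", "y_2"],
    "functions": ["y_1"],
}

# The dimension of a group is the sum of the sizes of its variables.
dimension_by_group = {
    group: sum(sizes[name] for name in names)
    for group, names in names_by_group.items()
}
total_dimension = sum(dimension_by_group.values())

print(dimension_by_group)  # {'design_parameters': 4, 'functions': 1}
print(total_dimension)  # 5
```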

Access data

Access by group

We can get the data by group, either as an array (default option):

print(dataset.get_data_by_group("design_parameters"))

Out:

[[ 4.79353495e+00  5.51246975e+00  4.83838903e+00 -8.79066971e+01]
 [ 8.34044009e-01  1.55266936e-01  8.28077388e+00 -4.61775709e+01]
 [ 6.40890450e+00 -7.11870203e+00  6.05477519e+00  7.40878002e+00]
 [ 8.83460960e+00  8.23475931e+00  2.28749635e-04  6.79240596e+01]
 [ 2.29351178e+00 -5.63064562e+00  2.37252042e+00  4.68187004e+01]]

or as a dictionary indexed by the variable names:

print(dataset.get_data_by_group("design_parameters", True))

Out:

{'x_local': array([[4.79353495],
       [0.83404401],
       [6.4089045 ],
       [8.8346096 ],
       [2.29351178]]), 'x_shared': array([[ 5.51246975e+00,  4.83838903e+00],
       [ 1.55266936e-01,  8.28077388e+00],
       [-7.11870203e+00,  6.05477519e+00],
       [ 8.23475931e+00,  2.28749635e-04],
       [-5.63064562e+00,  2.37252042e+00]]), 'y_2': array([[-87.90669709],
       [-46.17757092],
       [  7.40878002],
       [ 67.92405956],
       [ 46.81870041]])}
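
The two views hold the same numbers: each dictionary entry is a block of adjacent columns of the group array, and the width of the block is the variable size. The sketch below reproduces that relationship with NumPy on values copied from the output above. It illustrates the layout only and is not GEMSEO's implementation.

```python
import numpy as np

# Values copied from the printed "design_parameters" array above.
design_parameters = np.array([
    [4.79353495, 5.51246975, 4.83838903, -87.9066971],
    [0.834044009, 0.155266936, 8.28077388, -46.1775709],
    [6.40890450, -7.11870203, 6.05477519, 7.40878002],
    [8.83460960, 8.23475931, 2.28749635e-04, 67.9240596],
    [2.29351178, -5.63064562, 2.37252042, 46.8187004],
])
names = ["x_local", "x_shared", "y_2"]
sizes = {"x_local": 1, "x_shared": 2, "y_2": 1}

# Column indices where one variable ends and the next begins: [1, 3].
boundaries = np.cumsum([sizes[name] for name in names])[:-1]

# Cut the array column-wise at these boundaries, one block per variable.
blocks = np.split(design_parameters, boundaries, axis=1)
data_by_name = dict(zip(names, blocks))

for name, block in data_by_name.items():
    print(name, block.shape)
```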

Access by variable name

We can get the data by variable names, either as a dictionary indexed by the variable names (default option):

print(dataset.get_data_by_names(["x_shared", "y_2"]))

Out:

{'x_shared': array([[ 5.51246975e+00,  4.83838903e+00],
       [ 1.55266936e-01,  8.28077388e+00],
       [-7.11870203e+00,  6.05477519e+00],
       [ 8.23475931e+00,  2.28749635e-04],
       [-5.63064562e+00,  2.37252042e+00]]), 'y_2': array([[-87.90669709],
       [-46.17757092],
       [  7.40878002],
       [ 67.92405956],
       [ 46.81870041]])}

or as an array:

print(dataset.get_data_by_names(["x_shared", "y_2"], False))

Out:

[[ 5.51246975e+00  4.83838903e+00 -8.79066971e+01]
 [ 1.55266936e-01  8.28077388e+00 -4.61775709e+01]
 [-7.11870203e+00  6.05477519e+00  7.40878002e+00]
 [ 8.23475931e+00  2.28749635e-04  6.79240596e+01]
 [-5.63064562e+00  2.37252042e+00  4.68187004e+01]]
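
In the array form, the columns follow the order of the requested names, so x_shared fills the first two columns and y_2 the third. A short NumPy sketch of this layout, restricted to the first two samples for brevity; it is an illustration under that reading of the output, not the library's code.

```python
import numpy as np

# First two samples copied from the dictionary output above.
data_by_name = {
    "x_shared": np.array([[5.51246975, 4.83838903], [0.155266936, 8.28077388]]),
    "y_2": np.array([[-87.9066971], [-46.1775709]]),
}
names = ["x_shared", "y_2"]

# Stack the per-variable blocks side by side, in the order of the names.
stacked = np.hstack([data_by_name[name] for name in names])

print(stacked.shape)  # (2, 3)
print(stacked)
```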

Access all data

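We can get all the data sorted by group, either with one array per group (default option). The call returns a tuple containing the data, the variable names of each group and the variable sizes: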

print(dataset.get_all_data())

Out:

({'design_parameters': array([[ 4.79353495e+00,  5.51246975e+00,  4.83838903e+00,
        -8.79066971e+01],
       [ 8.34044009e-01,  1.55266936e-01,  8.28077388e+00,
        -4.61775709e+01],
       [ 6.40890450e+00, -7.11870203e+00,  6.05477519e+00,
         7.40878002e+00],
       [ 8.83460960e+00,  8.23475931e+00,  2.28749635e-04,
         6.79240596e+01],
       [ 2.29351178e+00, -5.63064562e+00,  2.37252042e+00,
         4.68187004e+01]]), 'functions': array([[7.589505  ],
       [4.28654172],
       [7.85225077],
       [7.94111374],
       [5.19677421]])}, {'design_parameters': ['x_local', 'x_shared', 'y_2'], 'functions': ['y_1']}, {'x_local': 1, 'x_shared': 2, 'y_2': 1, 'y_1': 1})

or as nested dictionaries indexed by group names, then by variable names:

print(dataset.get_all_data(as_dict=True))

Out:

{'design_parameters': {'x_local': array([[4.79353495],
       [0.83404401],
       [6.4089045 ],
       [8.8346096 ],
       [2.29351178]]), 'x_shared': array([[ 5.51246975e+00,  4.83838903e+00],
       [ 1.55266936e-01,  8.28077388e+00],
       [-7.11870203e+00,  6.05477519e+00],
       [ 8.23475931e+00,  2.28749635e-04],
       [-5.63064562e+00,  2.37252042e+00]]), 'y_2': array([[-87.90669709],
       [-46.17757092],
       [  7.40878002],
       [ 67.92405956],
       [ 46.81870041]])}, 'functions': {'y_1': array([[7.589505  ],
       [4.28654172],
       [7.85225077],
       [7.94111374],
       [5.19677421]])}}

We can also get these data without grouping (by_group=False), either as a single large array, returned together with the variable names and sizes:

print(dataset.get_all_data(by_group=False))

Out:

(array([[ 4.79353495e+00,  5.51246975e+00,  4.83838903e+00,
        -8.79066971e+01,  7.58950500e+00],
       [ 8.34044009e-01,  1.55266936e-01,  8.28077388e+00,
        -4.61775709e+01,  4.28654172e+00],
       [ 6.40890450e+00, -7.11870203e+00,  6.05477519e+00,
         7.40878002e+00,  7.85225077e+00],
       [ 8.83460960e+00,  8.23475931e+00,  2.28749635e-04,
         6.79240596e+01,  7.94111374e+00],
       [ 2.29351178e+00, -5.63064562e+00,  2.37252042e+00,
         4.68187004e+01,  5.19677421e+00]]), ['x_local', 'x_shared', 'y_2', 'y_1'], {'x_local': 1, 'x_shared': 2, 'y_2': 1, 'y_1': 1})
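
The names and sizes returned with the array tell us which columns belong to which variable. As a sketch, the code below locates the y_1 column from these two pieces of information and recovers the best sample, which should match the optimization result in the DOE log. Values are copied from the output above; the offset computation is an illustration, not a GEMSEO function.

```python
import numpy as np

# Values copied from the printed output above: one row per sample.
data = np.array([
    [4.79353495, 5.51246975, 4.83838903, -87.9066971, 7.58950500],
    [0.834044009, 0.155266936, 8.28077388, -46.1775709, 4.28654172],
    [6.40890450, -7.11870203, 6.05477519, 7.40878002, 7.85225077],
    [8.83460960, 8.23475931, 2.28749635e-04, 67.9240596, 7.94111374],
    [2.29351178, -5.63064562, 2.37252042, 46.8187004, 5.19677421],
])
names = ["x_local", "x_shared", "y_2", "y_1"]
sizes = {"x_local": 1, "x_shared": 2, "y_2": 1, "y_1": 1}

# First column of y_1: the summed sizes of the variables stored before it.
start = sum(sizes[name] for name in names[: names.index("y_1")])
y_1 = data[:, start : start + sizes["y_1"]]

# Index of the sample with the smallest objective value.
best = int(np.argmin(y_1))

print(start)  # 4
print(best)  # 1
print(data[best])
```

The selected row holds x_local = 0.834..., x_shared = (0.155..., 8.280...) and y_2 = -46.17..., which are the design values listed in the log.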

or as a dictionary indexed by variable names:

print(dataset.get_all_data(by_group=False, as_dict=True))

Out:

{'x_local': array([[4.79353495],
       [0.83404401],
       [6.4089045 ],
       [8.8346096 ],
       [2.29351178]]), 'x_shared': array([[ 5.51246975e+00,  4.83838903e+00],
       [ 1.55266936e-01,  8.28077388e+00],
       [-7.11870203e+00,  6.05477519e+00],
       [ 8.23475931e+00,  2.28749635e-04],
       [-5.63064562e+00,  2.37252042e+00]]), 'y_2': array([[-87.90669709],
       [-46.17757092],
       [  7.40878002],
       [ 67.92405956],
       [ 46.81870041]]), 'y_1': array([[7.589505  ],
       [4.28654172],
       [7.85225077],
       [7.94111374],
       [5.19677421]])}

