Dataset from an optimization problem

In this example, we will see how to build a Dataset from the data recorded by an OptimizationProblem. For that, we need the following imports:

from __future__ import division, unicode_literals

from gemseo.api import configure_logger, create_discipline, create_scenario
from gemseo.problems.sellar.sellar_design_space import SellarDesignSpace

configure_logger()

Out:

<RootLogger root (INFO)>

Synthetic data

We can sample the Sellar1 discipline with a DOE scenario (here, five Latin hypercube samples) and retrieve the corresponding OptimizationProblem:

discipline = create_discipline("Sellar1")
design_space = SellarDesignSpace().filter(discipline.get_input_data_names())

scenario = create_scenario(
    [discipline], "DisciplinaryOpt", "y_1", design_space, scenario_type="DOE"
)
scenario.execute({"algo": "lhs", "n_samples": 5})
opt_problem = scenario.formulation.opt_problem

Out:

INFO - 14:43:44:
INFO - 14:43:44: *** Start DOE Scenario execution ***
INFO - 14:43:44: DOEScenario
INFO - 14:43:44:    Disciplines: Sellar1
INFO - 14:43:44:    MDOFormulation: DisciplinaryOpt
INFO - 14:43:44:    Algorithm: lhs
INFO - 14:43:44: Optimization problem:
INFO - 14:43:44:    Minimize: y_1(x_local, x_shared, y_2)
INFO - 14:43:44:    With respect to: x_local, x_shared, y_2
INFO - 14:43:44: DOE sampling:   0%|          | 0/5 [00:00<?, ?it]
INFO - 14:43:44: DOE sampling: 100%|██████████| 5/5 [00:00<00:00, 1104.29 it/sec, obj=5.2]
INFO - 14:43:44: Optimization result:
INFO - 14:43:44: Objective value = 4.286541717010372
INFO - 14:43:44: The result is feasible.
INFO - 14:43:44: Status: None
INFO - 14:43:44: Optimizer message: None
INFO - 14:43:44: Number of calls to the objective function by the optimizer: 5
INFO - 14:43:44: Design space:
INFO - 14:43:44: +----------+-------------+--------------------------+-------------+-------+
INFO - 14:43:44: | name     | lower_bound |          value           | upper_bound | type  |
INFO - 14:43:44: +----------+-------------+--------------------------+-------------+-------+
INFO - 14:43:44: | x_local  |      0      | (0.8340440094051481+0j)  |      10     | float |
INFO - 14:43:44: | x_shared |     -10     | (0.15526693601342956+0j) |      10     | float |
INFO - 14:43:44: | x_shared |      0      |  (8.280773877190468+0j)  |      10     | float |
INFO - 14:43:44: | y_2      |     -100    | (-46.177570918278086+0j) |     100     | float |
INFO - 14:43:44: +----------+-------------+--------------------------+-------------+-------+
INFO - 14:43:44: *** DOE Scenario run terminated ***

Create a dataset

We can build a dataset from this OptimizationProblem, either by separating the design parameters from the functions (default option):

dataset = opt_problem.export_to_dataset("sellar1_doe")
print(dataset)

Out:

sellar1_doe
   Number of samples: 5
   Number of variables: 4
   Variables names and sizes by group:
      design_parameters: x_local (1), x_shared (2), y_2 (1)
      functions: y_1 (1)
   Number of dimensions (total = 5) by group:
      design_parameters: 4
      functions: 1

or by gathering all the variables in a single default group named parameters:

dataset = opt_problem.export_to_dataset("sellar1_doe", categorize=False)
print(dataset)

Out:

sellar1_doe
   Number of samples: 5
   Number of variables: 4
   Variables names and sizes by group:
      parameters: x_local (1), x_shared (2), y_2 (1), y_1 (1)
   Number of dimensions (total = 5) by group:
      parameters: 5

or by using an input-output naming rather than an optimization naming:

dataset = opt_problem.export_to_dataset("sellar1_doe", opt_naming=False)
print(dataset)

Out:

sellar1_doe
   Number of samples: 5
   Number of variables: 4
   Variables names and sizes by group:
      inputs: x_local (1), x_shared (2), y_2 (1)
      outputs: y_1 (1)
   Number of dimensions (total = 5) by group:
      inputs: 4
      outputs: 1

Note

Only design variables and functions (objective function, constraints) are stored in the database. If you want to store state variables, you must add them as observables with the add_observable() method before the problem is executed.

Access properties

dataset = opt_problem.export_to_dataset("sellar1_doe")

Variable names

We can access the variable names:

print(dataset.variables)

Out:

['x_local', 'x_shared', 'y_1', 'y_2']

Variable sizes

We can access the variable sizes:

print(dataset.sizes)

Out:

{'x_local': 1, 'x_shared': 2, 'y_2': 1, 'y_1': 1}

Variable groups

We can access the variable groups:

print(dataset.groups)

Out:

['design_parameters', 'functions']

Access data

Access by group

We can get the data by group, either as an array (default option):

print(dataset.get_data_by_group("design_parameters"))

Out:

[[ 4.79353495e+00+0.j  5.51246975e+00+0.j  4.83838903e+00+0.j
  -8.79066971e+01+0.j]
 [ 8.34044009e-01+0.j  1.55266936e-01+0.j  8.28077388e+00+0.j
  -4.61775709e+01+0.j]
 [ 6.40890450e+00+0.j -7.11870203e+00+0.j  6.05477519e+00+0.j
   7.40878002e+00+0.j]
 [ 8.83460960e+00+0.j  8.23475931e+00+0.j  2.28749635e-04+0.j
   6.79240596e+01+0.j]
 [ 2.29351178e+00+0.j -5.63064562e+00+0.j  2.37252042e+00+0.j
   4.68187004e+01+0.j]]

or as a dictionary indexed by the variable names:

print(dataset.get_data_by_group("design_parameters", True))

Out:

{'x_local': array([[4.79353495+0.j],
       [0.83404401+0.j],
       [6.4089045 +0.j],
       [8.8346096 +0.j],
       [2.29351178+0.j]]), 'x_shared': array([[ 5.51246975e+00+0.j,  4.83838903e+00+0.j],
       [ 1.55266936e-01+0.j,  8.28077388e+00+0.j],
       [-7.11870203e+00+0.j,  6.05477519e+00+0.j],
       [ 8.23475931e+00+0.j,  2.28749635e-04+0.j],
       [-5.63064562e+00+0.j,  2.37252042e+00+0.j]]), 'y_2': array([[-87.90669709+0.j],
       [-46.17757092+0.j],
       [  7.40878002+0.j],
       [ 67.92405956+0.j],
       [ 46.81870041+0.j]])}
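
The two forms hold the same numbers: the dictionary is the group array cut into consecutive column blocks, one block per variable, whose widths are the variable sizes. The sketch below illustrates this layout in plain Python on the first two samples printed above (real parts only). The helper `split_by_variable` is a hypothetical name introduced for illustration; it is not part of GEMSEO and is not meant to reflect how the library performs the conversion.

```python
def split_by_variable(rows, names, sizes):
    """Cut each sample row into per-variable chunks, following the order of names."""
    data = {name: [] for name in names}
    for row in rows:
        start = 0
        for name in names:
            stop = start + sizes[name]
            data[name].append(row[start:stop])
            start = stop
    return data


# First two samples of the "design_parameters" group, copied from the output
# above (real parts only); nested lists stand in for the NumPy array.
rows = [
    [4.79353495, 5.51246975, 4.83838903, -87.9066971],
    [0.834044009, 0.155266936, 8.28077388, -46.1775709],
]
names = ["x_local", "x_shared", "y_2"]
sizes = {"x_local": 1, "x_shared": 2, "y_2": 1}

columns = split_by_variable(rows, names, sizes)
print(columns["x_shared"])
```

Each value of `columns` has one row per sample and as many columns as the size of the variable, which is the shape of the arrays in the dictionary above.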

Access by variable name

We can get the data by variable names, either as a dictionary indexed by the variable names (default option):

print(dataset.get_data_by_names(["x_shared", "y_2"]))

Out:

{'x_shared': array([[ 5.51246975e+00+0.j,  4.83838903e+00+0.j],
       [ 1.55266936e-01+0.j,  8.28077388e+00+0.j],
       [-7.11870203e+00+0.j,  6.05477519e+00+0.j],
       [ 8.23475931e+00+0.j,  2.28749635e-04+0.j],
       [-5.63064562e+00+0.j,  2.37252042e+00+0.j]]), 'y_2': array([[-87.90669709+0.j],
       [-46.17757092+0.j],
       [  7.40878002+0.j],
       [ 67.92405956+0.j],
       [ 46.81870041+0.j]])}

or as an array:

print(dataset.get_data_by_names(["x_shared", "y_2"], False))

Out:

[[ 5.51246975e+00+0.j  4.83838903e+00+0.j -8.79066971e+01+0.j]
 [ 1.55266936e-01+0.j  8.28077388e+00+0.j -4.61775709e+01+0.j]
 [-7.11870203e+00+0.j  6.05477519e+00+0.j  7.40878002e+00+0.j]
 [ 8.23475931e+00+0.j  2.28749635e-04+0.j  6.79240596e+01+0.j]
 [-5.63064562e+00+0.j  2.37252042e+00+0.j  4.68187004e+01+0.j]]
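
The values are printed as complex numbers whose imaginary part is zero. If real numbers are more convenient for post-processing, the real parts can be extracted after checking that nothing is lost. Here is a minimal sketch in plain Python on the first two rows printed above; `to_real` is a hypothetical helper, not a GEMSEO function. For a NumPy array, the `.real` attribute gives the real parts directly.

```python
def to_real(rows, tolerance=0.0):
    """Return the real parts, refusing to drop a non-negligible imaginary part."""
    real_rows = []
    for row in rows:
        for value in row:
            if abs(value.imag) > tolerance:
                raise ValueError(f"non-zero imaginary part: {value!r}")
        real_rows.append([value.real for value in row])
    return real_rows


# First two rows of the array printed above, written as Python complex numbers.
rows = [
    [5.51246975 + 0j, 4.83838903 + 0j, -87.9066971 + 0j],
    [0.155266936 + 0j, 8.28077388 + 0j, -46.1775709 + 0j],
]

real_rows = to_real(rows)
print(real_rows)
```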

Access all data

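We can get all the data sorted by group (default option), either as one large array per group, returned together with the variable names by group and the variable sizes: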

print(dataset.get_all_data())

Out:

({'design_parameters': array([[ 4.79353495e+00+0.j,  5.51246975e+00+0.j,  4.83838903e+00+0.j,
        -8.79066971e+01+0.j],
       [ 8.34044009e-01+0.j,  1.55266936e-01+0.j,  8.28077388e+00+0.j,
        -4.61775709e+01+0.j],
       [ 6.40890450e+00+0.j, -7.11870203e+00+0.j,  6.05477519e+00+0.j,
         7.40878002e+00+0.j],
       [ 8.83460960e+00+0.j,  8.23475931e+00+0.j,  2.28749635e-04+0.j,
         6.79240596e+01+0.j],
       [ 2.29351178e+00+0.j, -5.63064562e+00+0.j,  2.37252042e+00+0.j,
         4.68187004e+01+0.j]]), 'functions': array([[7.589505  ],
       [4.28654172],
       [7.85225077],
       [7.94111374],
       [5.19677421]])}, {'design_parameters': ['x_local', 'x_shared', 'y_2'], 'functions': ['y_1']}, {'x_local': 1, 'x_shared': 2, 'y_2': 1, 'y_1': 1})
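
The returned tuple holds three items: the data by group, the variable names by group and the variable sizes. As a sketch of how these pieces fit together, the snippet below unpacks a hand-copied stand-in for this tuple (first two samples, real parts only, nested lists instead of arrays) and joins the groups column-wise to obtain one row per sample. The name `all_data` and the group order are assumptions made for the illustration; nothing here calls GEMSEO.

```python
# Stand-in for the tuple printed above, truncated to the first two samples.
all_data = (
    {
        "design_parameters": [
            [4.79353495, 5.51246975, 4.83838903, -87.9066971],
            [0.834044009, 0.155266936, 8.28077388, -46.1775709],
        ],
        "functions": [[7.589505], [4.28654172]],
    },
    {"design_parameters": ["x_local", "x_shared", "y_2"], "functions": ["y_1"]},
    {"x_local": 1, "x_shared": 2, "y_2": 1, "y_1": 1},
)

# Unpack the three items: data by group, names by group, sizes by variable.
data, names, sizes = all_data

# Assumed group order for the illustration.
groups = ["design_parameters", "functions"]
n_samples = len(data[groups[0]])

# Join the groups column-wise: one flat row per sample.
flat_rows = [
    [value for group in groups for value in data[group][i]]
    for i in range(n_samples)
]
flat_names = [name for group in groups for name in names[group]]

print(flat_names)
print(flat_rows[0])
```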

or as a nested dictionary indexed by group names, then by variable names:

print(dataset.get_all_data(as_dict=True))

Out:

{'design_parameters': {'x_local': array([[4.79353495+0.j],
       [0.83404401+0.j],
       [6.4089045 +0.j],
       [8.8346096 +0.j],
       [2.29351178+0.j]]), 'x_shared': array([[ 5.51246975e+00+0.j,  4.83838903e+00+0.j],
       [ 1.55266936e-01+0.j,  8.28077388e+00+0.j],
       [-7.11870203e+00+0.j,  6.05477519e+00+0.j],
       [ 8.23475931e+00+0.j,  2.28749635e-04+0.j],
       [-5.63064562e+00+0.j,  2.37252042e+00+0.j]]), 'y_2': array([[-87.90669709+0.j],
       [-46.17757092+0.j],
       [  7.40878002+0.j],
       [ 67.92405956+0.j],
       [ 46.81870041+0.j]])}, 'functions': {'y_1': array([[7.589505  ],
       [4.28654172],
       [7.85225077],
       [7.94111374],
       [5.19677421]])}}

We can also get all the data without sorting them by group, either as a single large array, returned together with the variable names and sizes:

print(dataset.get_all_data(by_group=False))

Out:

(array([[ 4.79353495e+00+0.j,  5.51246975e+00+0.j,  4.83838903e+00+0.j,
        -8.79066971e+01+0.j,  7.58950500e+00+0.j],
       [ 8.34044009e-01+0.j,  1.55266936e-01+0.j,  8.28077388e+00+0.j,
        -4.61775709e+01+0.j,  4.28654172e+00+0.j],
       [ 6.40890450e+00+0.j, -7.11870203e+00+0.j,  6.05477519e+00+0.j,
         7.40878002e+00+0.j,  7.85225077e+00+0.j],
       [ 8.83460960e+00+0.j,  8.23475931e+00+0.j,  2.28749635e-04+0.j,
         6.79240596e+01+0.j,  7.94111374e+00+0.j],
       [ 2.29351178e+00+0.j, -5.63064562e+00+0.j,  2.37252042e+00+0.j,
         4.68187004e+01+0.j,  5.19677421e+00+0.j]]), ['x_local', 'x_shared', 'y_2', 'y_1'], {'x_local': 1, 'x_shared': 2, 'y_2': 1, 'y_1': 1})

or as a dictionary indexed by variable names:

print(dataset.get_all_data(by_group=False, as_dict=True))

Out:

{'x_local': array([[4.79353495+0.j],
       [0.83404401+0.j],
       [6.4089045 +0.j],
       [8.8346096 +0.j],
       [2.29351178+0.j]]), 'x_shared': array([[ 5.51246975e+00+0.j,  4.83838903e+00+0.j],
       [ 1.55266936e-01+0.j,  8.28077388e+00+0.j],
       [-7.11870203e+00+0.j,  6.05477519e+00+0.j],
       [ 8.23475931e+00+0.j,  2.28749635e-04+0.j],
       [-5.63064562e+00+0.j,  2.37252042e+00+0.j]]), 'y_2': array([[-87.90669709+0.j],
       [-46.17757092+0.j],
       [  7.40878002+0.j],
       [ 67.92405956+0.j],
       [ 46.81870041+0.j]]), 'y_1': array([[7.589505  ],
       [4.28654172],
       [7.85225077],
       [7.94111374],
       [5.19677421]])}
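
Such a dictionary lends itself to simple post-processing. For instance, the following sketch looks for the sample with the smallest y_1. It uses values copied from the output above (real parts only, flattened to plain lists) rather than a live dataset, and the name `samples` is introduced only for this illustration.

```python
# Values copied from the dictionary printed above.
samples = {
    "x_local": [4.79353495, 0.83404401, 6.4089045, 8.8346096, 2.29351178],
    "y_1": [7.589505, 4.28654172, 7.85225077, 7.94111374, 5.19677421],
}

# Index of the sample with the smallest objective value.
objective = samples["y_1"]
best = min(range(len(objective)), key=objective.__getitem__)

print(best, objective[best], samples["x_local"][best])
```

Up to the printed precision, the selected sample matches the objective value and the x_local value reported in the optimization result logged at the end of the scenario execution.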
