# Design of experiments

Design of experiments (DOE) is a branch of applied statistics concerned with planning, conducting and analyzing real or numerical experiments. It consists of selecting input values in a methodical way (sampling) and then performing the experiments to obtain output values (measurement or evaluation).

Note

“DOE” may also refer to the sampling method itself, e.g. Latin hypercube sampling.

A DOE can be used to:

- determine whether an input or an interaction between inputs has an effect on an output (sensitivity analysis),
- model the relationship between inputs and outputs (surrogate modeling),
- optimize an output with respect to inputs while satisfying some constraints (trade-off).

## API

In GEMSEO,
a `DOELibrary` contains one or several DOE algorithms.

Like any `DriverLib`,
a `DOELibrary` executes an algorithm from an `OptimizationProblem` and options.
Most of the DOE algorithms also need the number of samples when calling `execute()`:

```
>>> from gemseo.algos.doe.lib_pydoe import PyDOE
>>> pydoe_library = PyDOE()
>>> optimization_result = pydoe_library.execute(problem, "lhs", n_samples=100)
```

In the presence of an `OptimizationProblem`,
it is advisable to apply DOE algorithms with the function `execute_algo()`,
which returns an `OptimizationResult`:

```
>>> from gemseo.api import execute_algo
>>> optimization_result = execute_algo(problem, "lhs", algo_type="doe", n_samples=100)
```

In the presence of an `MDODiscipline`,
it is advisable to create a `DOEScenario` with the function `create_scenario()`
and pass the DOE algorithm to `DOEScenario.execute()`:

```
>>> doe_scenario.execute({"algo": "lhs", "n_samples": 100})
```

## Algorithms

GEMSEO wraps different kinds of DOE algorithms from the libraries PyDOE and OpenTURNS.

Note

The names of the algorithms coming from OpenTURNS start with `"OT_"`, e.g. `"OT_OPT_LHS"`.
You need to install the full features of GEMSEO in order to use them.

These DOE algorithms can be classified into categories:

- the Monte Carlo sampling generates values in the input space distributed as a multivariate uniform probability distribution with stochastically independent components; the algorithm is `"OT_MONTE_CARLO"`,
- the low-discrepancy sequences are sequences of input values designed to be distributed as uniformly as possible (the deviation from uniform distribution is called *discrepancy*); the algorithms are `"OT_FAURE"`, `"OT_HALTON"`, `"OT_HASELGROVE"`, `"OT_SOBOL"` and `"OT_REVERSE_HALTON"`,
- the Latin hypercube sampling (LHS) is an algorithm generating \(N\) points in the input space based on the generalization of the Latin square: the range of each input is partitioned into \(N\) equal intervals and, for each interval, one and only one of the points has its corresponding input value inside the interval; the algorithms are `"lhs"`, `"OT_LHS"` and `"OT_LHSC"`,
- the optimized LHS is an LHS optimized by Monte Carlo replicates or simulated annealing; the algorithm is `"OT_OPT_LHS"`,
- the stratified DOEs make the inputs, also called *factors*, vary by level:
  - a full factorial DOE considers all the possible combinations of these levels across all the inputs; the algorithms are `"ff2n"`, `"fullfact"` and `"OT_FULLFACT"`;
  - a factorial DOE samples the diagonals of the input space, symmetrically with respect to its center; the algorithm is `"OT_FACTORIAL"`;
  - an axial DOE samples the axes of the input space, symmetrically with respect to its center; the algorithm is `"OT_AXIAL"`;
  - a central composite DOE combines a factorial DOE and an axial DOE; the algorithms are `"OT_COMPOSITE"` and `"ccdesign"`;
  - Box–Behnken and Plackett-Burman DOEs are used for response surface methodology; the algorithms are `"bbdesign"` and `"pbdesign"`.
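To make some of these categories concrete, here is a minimal pure-Python sketch of a Halton-type low-discrepancy point, a basic LHS and a full factorial design on the unit hypercube. It is only an illustration of the definitions above: it is not the PyDOE or OpenTURNS code that GEMSEO wraps, and the function names (`radical_inverse`, `halton_point`, `latin_hypercube`, `full_factorial`) are hypothetical.

```python
import itertools
import random


def radical_inverse(index, base):
    """Mirror the digits of index (>= 1) in the given base around the radix point."""
    value, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        value += digit * fraction
        fraction /= base
    return value


def halton_point(index, bases=(2, 3)):
    """Return the index-th point of a Halton-type sequence (one prime base per input)."""
    return tuple(radical_inverse(index, base) for base in bases)


def latin_hypercube(n_samples, dimension, rng):
    """Draw n_samples points in [0, 1)^dimension with one point per interval and per input."""
    columns = []
    for _ in range(dimension):
        # One value in each of the n_samples equal intervals, then a random pairing.
        column = [(k + rng.random()) / n_samples for k in range(n_samples)]
        rng.shuffle(column)
        columns.append(column)
    return [tuple(column[i] for column in columns) for i in range(n_samples)]


def full_factorial(levels_per_input):
    """Return all the combinations of the levels across the inputs."""
    return list(itertools.product(*levels_per_input))


lhs_points = latin_hypercube(5, 2, random.Random(0))
grid_points = full_factorial([[0.0, 1.0], [0.0, 0.5, 1.0]])  # 2 x 3 = 6 points
```

With `n_samples=5`, each of the five intervals `[k/5, (k+1)/5)` contains exactly one sample for each input, which is the Latin-square property described above.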

GEMSEO also offers a `CustomDOE` to use your own input values,
provided either as a CSV file or a two-dimensional NumPy array.

## Advanced use

Once the functions of the `OptimizationProblem` have been evaluated,
the input samples can be accessed with `samples`.

Note

GEMSEO applies a DOE algorithm over a unit hypercube of the same dimension as the input space
and then projects the `unit_samples` onto the input space
using either the probability distributions of the inputs, if the latter are random variables,
or their lower and upper bounds.
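The projection described in this note can be sketched as follows. This is a simplified illustration and not GEMSEO's implementation: it assumes an affine map for bounded inputs and the inverse cumulative distribution function for random inputs, and the helper names are hypothetical.

```python
from statistics import NormalDist


def project_with_bounds(unit_sample, lower_bounds, upper_bounds):
    """Map a point of the unit hypercube onto the box defined by the bounds."""
    return [
        lower + u * (upper - lower)
        for u, lower, upper in zip(unit_sample, lower_bounds, upper_bounds)
    ]


def project_with_distributions(unit_sample, distributions):
    """Map a point of the unit hypercube with inverse cumulative distribution functions.

    Assumption: each component of unit_sample lies strictly between 0 and 1.
    """
    return [dist.inv_cdf(u) for u, dist in zip(unit_sample, distributions)]


unit_sample = [0.5, 0.25]
print(project_with_bounds(unit_sample, [-1.0, 0.0], [1.0, 10.0]))  # [0.0, 2.5]
print(project_with_distributions(unit_sample, [NormalDist(0.0, 1.0), NormalDist(5.0, 2.0)]))
```

The same unit sample thus yields different input values depending on whether the inputs are described by bounds or by probability distributions.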

If we do not want to evaluate the functions but only obtain the input samples,
we can use the method `compute_doe()`, which returns the samples as a two-dimensional NumPy array.

The quality of the input samples can be assessed with a `DOEQuality`
computing the \(\varphi_p\), minimum-distance and discrepancy criteria.
The smaller these quality measures, the better,
except for the minimum-distance criterion, for which larger is better.
The qualities can be compared with logical operations,
with `DOEQuality(doe_1) > DOEQuality(doe_2)` meaning that `doe_1` is better than `doe_2`.
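As an illustration of two of these criteria, the sketch below computes the minimum distance and a common form of the \(\varphi_p\) criterion, \(\varphi_p = \left(\sum_{i<j} d_{ij}^{-p}\right)^{1/p}\), where \(d_{ij}\) is the Euclidean distance between samples \(i\) and \(j\). It is not the `DOEQuality` implementation; the function names and the default power `10` are assumptions made for the example.

```python
import itertools
import math


def pairwise_distances(samples):
    """Return the Euclidean distances between all pairs of samples."""
    return [math.dist(a, b) for a, b in itertools.combinations(samples, 2)]


def minimum_distance(samples):
    """Return the smallest pairwise distance (the larger, the better)."""
    return min(pairwise_distances(samples))


def phi_p(samples, power=10):
    """Return the phi_p criterion (the smaller, the better).

    Assumption: all samples are distinct, otherwise a distance is zero.
    """
    return sum(d**-power for d in pairwise_distances(samples)) ** (1.0 / power)


# Hypothetical two-dimensional designs with four samples each.
clustered = [(0.10, 0.10), (0.15, 0.10), (0.10, 0.15), (0.90, 0.90)]
spread = [(0.125, 0.375), (0.375, 0.875), (0.625, 0.125), (0.875, 0.625)]

print(minimum_distance(clustered), minimum_distance(spread))
print(phi_p(clustered), phi_p(spread))
```

Both criteria rank `spread` above `clustered`: its minimum distance is larger and its \(\varphi_p\) value is smaller.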

Note

When numerical metrics are not sufficient to compare two sets of input samples,
graphical indicators (e.g. `ScatterMatrix`) could be considered.

Lastly,
a `DOELibrary` has a `seed` and each call to `execute()` increments it.
Thus,
two executions generate two distinct sets of input-output samples.
For the sake of reproducibility,
you can pass your own seed to `execute()` as a DOE option.
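The seed mechanism can be mimicked with a small stand-alone class. This is a hypothetical sketch of the behavior described above, not the `DOELibrary` code: `SeededSampler` and its attributes are invented for the example, and whether the internal seed is also incremented when a custom seed is passed is an assumption.

```python
import random


class SeededSampler:
    """Hypothetical sampler whose seed is incremented at each execution."""

    def __init__(self, seed=0):
        self.seed = seed

    def execute(self, n_samples, seed=None):
        # Assumption: the internal seed is incremented at every call,
        # and a seed passed by the user takes precedence for this call only.
        self.seed += 1
        rng = random.Random(self.seed if seed is None else seed)
        return [rng.random() for _ in range(n_samples)]


sampler = SeededSampler()
first = sampler.execute(3)
second = sampler.execute(3)  # differs from first: the seed has changed
reproducible = sampler.execute(3, seed=42) == sampler.execute(3, seed=42)  # True
```

Passing the same seed twice reproduces the same samples, whereas relying on the default seed gives a new set at each execution.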