Design of experiments

Design of experiments (DOE) is a branch of applied statistics for planning, conducting and analyzing real or numerical experiments. It consists of selecting input values in a methodical way (sampling) and then performing the experiments to obtain the corresponding output values (measurement or evaluation).

Note

“DOE” may also refer to the sampling method itself, e.g. Latin hypercube sampling.
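To make the idea of a sampling method concrete, here is a minimal sketch of Latin hypercube sampling written with the Python standard library only. It is not the implementation used by GEMSEO, PyDOE or OpenTURNS; the function name `latin_hypercube` and its arguments are assumptions made for this illustration.

```python
import random


def latin_hypercube(n_samples, n_inputs, seed=0):
    """Return n_samples points of the unit hypercube, one per stratum and axis.

    Each input axis is cut into n_samples equal strata and every stratum
    receives exactly one point.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(n_inputs):
        # One random point inside each stratum [k/n, (k+1)/n).
        column = [(k + rng.random()) / n_samples for k in range(n_samples)]
        # Shuffle so that strata are paired at random across inputs.
        rng.shuffle(column)
        columns.append(column)
    # Transpose: one row per sample, one column per input.
    return [list(point) for point in zip(*columns)]


samples = latin_hypercube(n_samples=5, n_inputs=2)
```

Whatever the seed, sorting any column of `samples` shows one value in each of the five intervals of width 0.2, which is the stratification property that distinguishes this design from plain Monte Carlo sampling.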

A DOE can be used to:

determine whether an input or an interaction between inputs has an effect on an output (sensitivity analysis),
model the relationship between inputs and outputs (surrogate modeling),
optimize an output with respect to inputs while satisfying some constraints (trade-off).
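As an illustration of the first use listed above, the following sketch builds a two-level full-factorial DOE and estimates the main effect of each input on an output. The model `toy_model` and the helper names are hypothetical and serve only this example; GEMSEO provides its own tools for sensitivity analysis.

```python
from itertools import product


def full_factorial(levels_per_input):
    """Return all combinations of the given levels, one tuple per experiment."""
    return list(product(*levels_per_input))


def main_effect(inputs, outputs, index, low, high):
    """Mean output at the high level minus mean output at the low level."""
    high_outputs = [y for x, y in zip(inputs, outputs) if x[index] == high]
    low_outputs = [y for x, y in zip(inputs, outputs) if x[index] == low]
    return sum(high_outputs) / len(high_outputs) - sum(low_outputs) / len(low_outputs)


def toy_model(x):
    # Hypothetical model: strong effect of x[0], weak effect of x[1].
    return 3.0 * x[0] + 0.1 * x[1]


# Sampling: four experiments, each input taken at its low and high levels.
doe_inputs = full_factorial([(-1.0, 1.0), (-1.0, 1.0)])
# Evaluation: one output value per experiment.
doe_outputs = [toy_model(x) for x in doe_inputs]
# Analysis: the first effect is about 6.0, the second about 0.2.
effect_0 = main_effect(doe_inputs, doe_outputs, 0, -1.0, 1.0)
effect_1 = main_effect(doe_inputs, doe_outputs, 1, -1.0, 1.0)
```

The large gap between the two effects indicates that, for this toy model, the output is driven almost entirely by the first input.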

API

In GEMSEO, a DOELibrary contains one or several DOE algorithms.

Like any DriverLibrary, a DOELibrary executes an algorithm from an OptimizationProblem and options. Most DOE algorithms also require the number of samples to be passed when calling execute():

>>> from gemseo.algos.doe.lib_pydoe import PyDOE
>>> pydoe_library = PyDOE()
>>> optimization_result = pydoe_library.execute(problem, "lhs", n_samples=100)

When working with an OptimizationProblem, it is advisable to run DOE algorithms through the function execute_algo(), which returns an OptimizationResult:

>>> from gemseo import execute_algo
>>> optimization_result = execute_algo(problem, "lhs", algo_type="doe", n_samples=100)

When working with an MDODiscipline, it is advisable to create a DOEScenario with the function create_scenario() and pass the DOE algorithm to DOEScenario.execute():

>>> doe_scenario.execute({"algo": "lhs", "n_samples": 100})

Algorithms

GEMSEO wraps different kinds of DOE algorithms from the libraries PyDOE and OpenTURNS.

Note

The names of the algorithms coming from OpenTURNS start with "OT_", e.g. "OT_OPT_LHS". You need to install the full features of GEMSEO in order to use them.

Advanced use

Once the functions of the OptimizationProblem have been evaluated, the input samples can be accessed with samples.

Note

GEMSEO applies a DOE algorithm over a unit hypercube of the same dimension as the input space and then projects the resulting unit_samples onto the input space, using either the probability distributions of the inputs if they are random variables, or their lower and upper bounds otherwise.
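The projection step can be sketched as follows, under the assumption that bounded inputs are mapped affinely and random inputs are mapped through their inverse cumulative distribution functions. The function names are hypothetical and the code does not reproduce GEMSEO's internals; `statistics.NormalDist` merely stands in for an input distribution.

```python
from statistics import NormalDist


def project_onto_bounds(unit_samples, lower_bounds, upper_bounds):
    """Map points of the unit hypercube onto the box [lower, upper]."""
    return [
        [low + u * (high - low) for u, low, high in zip(point, lower_bounds, upper_bounds)]
        for point in unit_samples
    ]


def project_onto_distributions(unit_samples, distributions):
    """Map points of the unit hypercube through inverse CDFs."""
    return [
        [distribution.inv_cdf(u) for u, distribution in zip(point, distributions)]
        for point in unit_samples
    ]


unit_samples = [[0.0, 0.5], [0.5, 0.25], [1.0, 0.75]]
box_samples = project_onto_bounds(unit_samples, [-2.0, 10.0], [2.0, 20.0])
# box_samples == [[-2.0, 15.0], [0.0, 12.5], [2.0, 17.5]]

# Inverse CDFs need probabilities strictly between 0 and 1.
interior_samples = [[0.5, 0.25], [0.25, 0.5]]
normal_samples = project_onto_distributions(
    interior_samples, [NormalDist(0.0, 1.0), NormalDist(1.0, 2.0)]
)
```

In both cases the space-filling structure obtained in the unit hypercube is carried over to the input space, because each coordinate is transformed by a monotonic function.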

If we do not want to evaluate the functions but only obtain the input samples, we can use the method compute_doe(), which returns the samples as a two-dimensional NumPy array.

The quality of the input samples can be assessed with a DOEQuality, which computes the \(\varphi_p\), minimum-distance and discrepancy criteria. Smaller values are better for \(\varphi_p\) and the discrepancy, whereas larger values are better for the minimum distance. Two qualities can be compared with comparison operators: DOEQuality(doe_1) > DOEQuality(doe_2) means that doe_1 is better than doe_2.
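For intuition, two of these criteria can be computed by hand. The sketch below assumes Euclidean distances and the usual definition \(\varphi_p = \left(\sum_{i<j} d_{ij}^{-p}\right)^{1/p}\). The default `p=10` and the function names are assumptions made for this example, not values taken from DOEQuality, and the discrepancy is left out for brevity.

```python
from itertools import combinations
from math import dist


def pairwise_distances(samples):
    """Euclidean distances between all pairs of distinct points."""
    return [dist(a, b) for a, b in combinations(samples, 2)]


def min_distance(samples):
    """Smallest distance between two points: the larger, the better."""
    return min(pairwise_distances(samples))


def phi_p(samples, p=10):
    """phi_p criterion: the smaller, the better."""
    return sum(d ** -p for d in pairwise_distances(samples)) ** (1.0 / p)


# Two of the three points are nearly on top of each other.
clustered = [[0.1, 0.1], [0.15, 0.1], [0.9, 0.9]]
# The three points are spread over the unit square.
spread = [[0.1, 0.1], [0.5, 0.9], [0.9, 0.3]]

spread_is_better = (
    min_distance(spread) > min_distance(clustered)
    and phi_p(spread) < phi_p(clustered)
)
```

Both criteria agree here that `spread` fills the space better than `clustered`, since a pair of nearly coincident points lowers the minimum distance and inflates \(\varphi_p\).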

Note

When numerical metrics are not sufficient to compare two sets of input samples, graphical indicators (e.g. ScatterMatrix) can be considered.

Lastly, a DOELibrary has a seed initialized at 0, and each call to execute() increments it before using it. Thus, two successive executions generate two distinct sets of input-output samples. For the sake of reproducibility, you can pass your own seed to execute() as a DOE option.
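The seed mechanism described above can be mimicked with a small stand-in class. `SeededSampler` is hypothetical and is not the GEMSEO DOELibrary. In particular, whether the internal seed also advances when a custom seed is passed is not stated here, so this sketch leaves it unchanged in that case.

```python
import random


class SeededSampler:
    """Hypothetical stand-in for a DOE library with an auto-incremented seed."""

    def __init__(self):
        # The internal seed starts at 0.
        self.seed = 0

    def execute(self, n_samples, seed=None):
        # Without an explicit seed, increment the internal one before using it.
        if seed is None:
            self.seed += 1
            seed = self.seed
        rng = random.Random(seed)
        return [rng.random() for _ in range(n_samples)]


sampler = SeededSampler()
first = sampler.execute(3)   # uses seed 1
second = sampler.execute(3)  # uses seed 2, hence different samples
fixed_a = sampler.execute(3, seed=123)
fixed_b = sampler.execute(3, seed=123)  # same seed, hence identical samples
```

With this behaviour, repeated calls explore new samples by default, while passing the same seed explicitly reproduces a previous set of samples.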