surrogate module

Surrogate discipline.

class gemseo.disciplines.surrogate.SurrogateDiscipline(surrogate, data=None, transformer=mappingproxy({'inputs': <gemseo.mlearning.transformers.scaler.min_max_scaler.MinMaxScaler object>, 'outputs': <gemseo.mlearning.transformers.scaler.min_max_scaler.MinMaxScaler object>}), disc_name=None, default_inputs=None, input_names=None, output_names=None, **parameters)

Bases: MDODiscipline

A discipline wrapping a regression model built from a dataset.

Examples

>>> import numpy as np
>>> from gemseo.datasets.io_dataset import IODataset
>>> from gemseo.disciplines.surrogate import SurrogateDiscipline
>>>
>>> # Create an input-output dataset.
>>> dataset = IODataset()
>>> dataset.add_input_variable("x", np.array([[1.0], [2.0], [3.0]]))
>>> dataset.add_output_variable("y", np.array([[3.0], [5.0], [6.0]]))
>>>
>>> # Build a surrogate discipline relying on a linear regression model.
>>> surrogate_discipline = SurrogateDiscipline("LinearRegressor", dataset)
>>>
>>> # Assess its quality with the R2 measure.
>>> r2 = surrogate_discipline.get_error_measure("R2Measure")
>>> learning_r2 = r2.evaluate_learn()
>>>
>>> # Execute the surrogate discipline, with default or custom input values.
>>> surrogate_discipline.execute()
>>> surrogate_discipline.execute({"x": np.array([1.5])})
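As a rough illustration of what an R2 measure quantifies on the learning data, the following standalone sketch fits an ordinary least-squares line to the three samples above and computes the coefficient of determination by hand. It does not use GEMSEO: the helpers `fit_line` and `r2_score` are hypothetical, and the value reported by `R2Measure` may differ depending on how the regressor is configured.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def r2_score(ys, predictions):
    """Return 1 - SS_res / SS_tot, the coefficient of determination."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Same samples as in the example above (hand-written fit, not the library model).
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 6.0]
slope, intercept = fit_line(xs, ys)
predictions = [slope * x + intercept for x in xs]
learning_r2 = r2_score(ys, predictions)
```

A value close to 1 indicates that the model reproduces the learning outputs well; it says nothing about the accuracy on unseen data.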


Parameters:

surrogate (str | BaseMLRegressionAlgo) – Either the name of a class deriving from BaseMLRegressionAlgo or an instance of BaseMLRegressionAlgo.
data (IODataset | None) – The learning dataset to train the regression model. If None, the regression model is assumed to be already trained.
transformer (TransformerType) –
The strategies to transform the variables. The values are instances of BaseTransformer while the keys are the names of either the variables or the groups of variables, e.g. "inputs" or "outputs" in the case of the regression algorithms. If a group is specified, the BaseTransformer will be applied to all the variables of this group. The BaseMLRegressionAlgo.DEFAULT_TRANSFORMER uses the MinMaxScaler strategy for both input and output variables.

By default it is set to {'inputs': <gemseo.mlearning.transformers.scaler.min_max_scaler.MinMaxScaler object>, 'outputs': <gemseo.mlearning.transformers.scaler.min_max_scaler.MinMaxScaler object>}.
disc_name (str | None) – The name to be given to the surrogate discipline. If None, concatenate SHORT_ALGO_NAME and data.name.
default_inputs (dict[str, ndarray] | None) – The default values of the inputs. If None, use the center of the learning input space.
input_names (Iterable[str] | None) – The names of the input variables. If None, consider all input variables mentioned in the learning dataset.
output_names (Iterable[str] | None) – The names of the output variables. If None, consider all output variables mentioned in the learning dataset.
**parameters (MLAlgoParameterType) – The parameters of the machine learning algorithm.

Raises:

ValueError – If the learning dataset is missing whilst the regression model is not trained.
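The default transformer relies on a min-max scaling strategy for both inputs and outputs. As a minimal sketch of that idea, assuming the usual definition that maps the learning range of a variable linearly onto [0, 1], the hypothetical helper `fit_min_max` below is written in plain Python and is not the GEMSEO `MinMaxScaler` class.

```python
def fit_min_max(values):
    """Build the scaling functions from learning samples of one variable."""
    lower, upper = min(values), max(values)
    span = upper - lower
    if span == 0:
        raise ValueError("A constant variable cannot be min-max scaled.")

    def transform(x):
        # Map [lower, upper] linearly onto [0, 1].
        return (x - lower) / span

    def inverse_transform(z):
        # Map [0, 1] back to the original range.
        return lower + z * span

    return transform, inverse_transform


learning_inputs = [1.0, 2.0, 3.0]
transform, inverse_transform = fit_min_max(learning_inputs)
scaled = [transform(x) for x in learning_inputs]
restored = [inverse_transform(z) for z in scaled]
```

The regression model is then trained on the scaled values, and its predictions are mapped back with the inverse transformation of the outputs.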

class ApproximationMode(value)

Bases: StrEnum

The approximation derivation modes.

CENTERED_DIFFERENCES = 'centered_differences': The centered differences method used to approximate the Jacobians by perturbing each variable with a small real number.

COMPLEX_STEP = 'complex_step': The complex step method used to approximate the Jacobians by perturbing each variable with a small complex number.

FINITE_DIFFERENCES = 'finite_differences': The finite differences method used to approximate the Jacobians by perturbing each variable with a small real number.
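The three modes can be compared on a scalar function. The sketch below uses the textbook formulas in plain Python; the function names and the step values are illustrative assumptions, not the GEMSEO implementation or its defaults.

```python
def finite_differences(f, x, step=1e-7):
    # Forward difference: one real perturbation, first-order accurate.
    return (f(x + step) - f(x)) / step


def centered_differences(f, x, step=1e-6):
    # Centered difference: two real perturbations, second-order accurate.
    return (f(x + step) - f(x - step)) / (2 * step)


def complex_step(f, x, step=1e-30):
    # Complex step: one imaginary perturbation, no subtractive cancellation.
    return f(complex(x, step)).imag / step


def f(x):
    # Works for real and complex arguments alike.
    return x * x * x + 2 * x


x0 = 1.5
exact = 3 * x0**2 + 2  # analytical derivative of f at x0
errors = {
    "finite_differences": abs(finite_differences(f, x0) - exact),
    "centered_differences": abs(centered_differences(f, x0) - exact),
    "complex_step": abs(complex_step(f, x0) - exact),
}
```

The complex step method requires the function to accept complex inputs, which is why it is not always applicable.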

class CacheType(value)

Bases: StrEnum

The name of the cache class.

HDF5 = 'HDF5Cache'

MEMORY_FULL = 'MemoryFullCache'

NONE = '': No cache is used.

SIMPLE = 'SimpleCache'

class ExecutionStatus(value)

Bases: StrEnum

The execution statuses of a discipline.

DONE = 'DONE'

FAILED = 'FAILED'

LINEARIZE = 'LINEARIZE'

PENDING = 'PENDING'

RUNNING = 'RUNNING'

VIRTUAL = 'VIRTUAL'

class GrammarType(value)

Bases: StrEnum

The name of the grammar class.

JSON = 'JSONGrammar'

PYDANTIC = 'PydanticGrammar'

SIMPLE = 'SimpleGrammar'

SIMPLER = 'SimplerGrammar'

class InitJacobianType(value)

Bases: StrEnum

The way to initialize Jacobian matrices.

DENSE = 'dense': The Jacobian is initialized as a NumPy ndarray filled in with zeros.

EMPTY = 'empty': The Jacobian is initialized as an empty NumPy ndarray.

SPARSE = 'sparse': The Jacobian is initialized as a SciPy CSR array with zero elements.

class LinearizationMode(value)

Bases: StrEnum

An enumeration.

ADJOINT = 'adjoint'

AUTO = 'auto'

CENTERED_DIFFERENCES = 'centered_differences'

COMPLEX_STEP = 'complex_step'

DIRECT = 'direct'

FINITE_DIFFERENCES = 'finite_differences'

REVERSE = 'reverse'

class ReExecutionPolicy(value)

Bases: StrEnum

The re-execution policy of a discipline.

DONE = 'RE_EXEC_DONE'

NEVER = 'RE_EXEC_NEVER'

classmethod activate_time_stamps()

Activate the time stamps.

The time stamps store the start and end times of executions and linearizations.

Return type: None

add_differentiated_inputs(inputs=None)

Add the inputs for differentiation.

The inputs that do not represent continuous numbers are filtered out.

Parameters: inputs (Iterable[str] | None) – The input variables against which to differentiate the outputs. If None, all the inputs of the discipline are used.
Raises: ValueError – When inputs are not in the input grammar.
Return type: None

add_differentiated_outputs(outputs=None)

Add the outputs for differentiation.

The outputs that do not represent continuous numbers are filtered out.

Parameters: outputs (Iterable[str] | None) – The output variables to be differentiated. If None, all the outputs of the discipline are used.
Raises: ValueError – When outputs are not in the output grammar.
Return type: None

add_namespace_to_input(name, namespace)

Add a namespace prefix to an existing input grammar element.

The updated input grammar element name will be namespace + namespaces_separator + name.

Parameters:

name (str) – The element name to rename.
namespace (str) – The name of the namespace.

Return type:

None
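The renaming rule can be sketched in a few lines. The separator value `":"` and the helper `add_namespace` are assumptions made for illustration; the actual separator is whatever `namespaces_separator` holds in the library.

```python
# Assumed separator, for illustration only.
NAMESPACES_SEPARATOR = ":"


def add_namespace(name, namespace, separator=NAMESPACES_SEPARATOR):
    """Return the namespaced name: namespace + separator + name."""
    return namespace + separator + name


# Hypothetical variable "x" placed in a hypothetical namespace "ns".
renamed = add_namespace("x", "ns")
```

The same rule applies to output grammar elements.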

add_namespace_to_output(name, namespace)

Add a namespace prefix to an existing output grammar element.

The updated output grammar element name will be namespace + namespaces_separator + name.

Parameters:

name (str) – The element name to rename.
namespace (str) – The name of the namespace.

Return type:

None

add_status_observer(obs)

Add an observer for the status.

The observer is notified when the status of the discipline changes.

Parameters: obs (Any) – The observer to add.
Return type: None

auto_get_grammar_file(is_input=True, name=None, comp_dir=None)

Use a naming convention to associate a grammar file to the discipline.

Search in the directory comp_dir for either an input grammar file named name + "_input.json" or an output grammar file named name + "_output.json".

Parameters:

is_input (bool) –
Whether to search for an input or output grammar file.

By default it is set to True.
name (str | None) – The name to be searched in the file names. If None, use the name of the discipline class.
comp_dir (str | Path | None) – The directory in which to search the grammar file. If None, use the GRAMMAR_DIRECTORY if any, or the directory of the discipline class module.

Returns:

The grammar file path.

Return type:

Path
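The naming convention described above can be sketched with `pathlib`. The helper `candidate_grammar_file`, the discipline name and the directory are hypothetical. The sketch only builds the expected path and does not reproduce the fallback to GRAMMAR_DIRECTORY or to the module directory.

```python
from pathlib import Path


def candidate_grammar_file(name, comp_dir, is_input=True):
    """Build the grammar file path expected by the naming convention."""
    suffix = "_input.json" if is_input else "_output.json"
    return Path(comp_dir) / (name + suffix)


# Hypothetical discipline name and directory.
input_file = candidate_grammar_file("MyDiscipline", "grammars")
output_file = candidate_grammar_file("MyDiscipline", "grammars", is_input=False)
```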

check_input_data(input_data, raise_exception=True)

Check the input data validity.

Parameters:

input_data (Mapping[str, Any]) – The input data needed to execute the discipline according to the discipline input grammar.
raise_exception (bool) –
Whether to raise on error.

By default it is set to True.

Return type:

None

check_jacobian(input_data=None, derr_approx=ApproximationMode.FINITE_DIFFERENCES, step=1e-07, threshold=1e-08, linearization_mode='auto', inputs=None, outputs=None, parallel=False, n_processes=2, use_threading=False, wait_time_between_fork=0, auto_set_step=False, plot_result=False, file_path='jacobian_errors.pdf', show=False, fig_size_x=10, fig_size_y=10, reference_jacobian_path=None, save_reference_jacobian=False, indices=None)

Check if the analytical Jacobian is correct with respect to a reference one.

If reference_jacobian_path is not None and save_reference_jacobian is True, compute the reference Jacobian with the approximation method and save it in reference_jacobian_path.

If reference_jacobian_path is not None and save_reference_jacobian is False, do not compute the reference Jacobian but read it from reference_jacobian_path.

If reference_jacobian_path is None, compute the reference Jacobian without saving it.

Parameters:

input_data (Mapping[str, ndarray] | None) – The input data needed to execute the discipline according to the discipline input grammar. If None, use the MDODiscipline.default_inputs.
derr_approx (ApproximationMode) –
The approximation method, either “complex_step” or “finite_differences”.

By default it is set to “finite_differences”.
threshold (float) –
The acceptance threshold for the Jacobian error.

By default it is set to 1e-08.
linearization_mode (str) –
The linearization mode: direct, adjoint, or an automated switch depending on the dimensions of the inputs and outputs.

By default it is set to “auto”.
inputs (Iterable[str] | None) – The names of the inputs with respect to which to differentiate the outputs.
outputs (Iterable[str] | None) – The names of the outputs to be differentiated.
step (float) –
The differentiation step.

By default it is set to 1e-07.
parallel (bool) –
Whether to differentiate the discipline in parallel.

By default it is set to False.
n_processes (int) –
The maximum simultaneous number of threads, if use_threading is True, or processes otherwise, used to parallelize the execution.

By default it is set to 2.
use_threading (bool) –
Whether to use threads instead of processes to parallelize the execution. Multiprocessing copies (serializes) all the disciplines, while threading shares all the memory. This matters if the same discipline is to be executed multiple times, in which case multiprocessing should be used.

By default it is set to False.
wait_time_between_fork (float) –
The time waited between two forks of the process / thread.

By default it is set to 0.
auto_set_step (bool) –
Whether to compute the optimal step for a forward first order finite differences gradient approximation.

By default it is set to False.
plot_result (bool) –
Whether to plot the result of the validation (computed vs approximated Jacobians).

By default it is set to False.
file_path (str | Path) –
The path to the output file if plot_result is True.

By default it is set to “jacobian_errors.pdf”.
show (bool) –
Whether to open the figure.

By default it is set to False.
fig_size_x (float) –
The x-size of the figure in inches.

By default it is set to 10.
fig_size_y (float) –
The y-size of the figure in inches.

By default it is set to 10.
reference_jacobian_path (str | Path | None) – The path of the reference Jacobian file.
save_reference_jacobian (bool) –
Whether to save the reference Jacobian.

By default it is set to False.
indices (Iterable[int] | None) – The indices of the inputs and outputs for the different sub-Jacobian matrices, formatted as {variable_name: variable_components} where variable_components can be either an integer, e.g. 2, a sequence of integers, e.g. [0, 3], a slice, e.g. slice(0, 3), the ellipsis symbol (...) or None, which is the same as the ellipsis. If a variable name is missing, consider all its components. If None, consider all the components of all the inputs and outputs.

Returns:

Whether the analytical Jacobian is correct with respect to the reference one.

Return type:

bool
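The principle of the check can be sketched without the library: approximate a reference Jacobian by forward finite differences and compare it with the analytical one against a threshold. The helper names and the error metric (maximum absolute difference) are assumptions for illustration; the metric actually used by `check_jacobian` is not stated here.

```python
def approximate_jacobian(func, x, step=1e-7):
    """Forward finite differences; rows are outputs, columns are inputs."""
    f0 = func(x)
    columns = []
    for j in range(len(x)):
        perturbed = list(x)
        perturbed[j] += step
        fj = func(perturbed)
        columns.append([(a - b) / step for a, b in zip(fj, f0)])
    # Transpose the list of columns into a list of rows.
    return [list(row) for row in zip(*columns)]


def jacobian_is_correct(analytic, reference, threshold):
    """Compare two Jacobians with an assumed max-abs-difference metric."""
    error = max(
        abs(a - r)
        for row_a, row_r in zip(analytic, reference)
        for a, r in zip(row_a, row_r)
    )
    return error <= threshold


def func(x):
    return [x[0] * x[1], x[0] + 2.0 * x[1]]


def analytic_jacobian(x):
    return [[x[1], x[0]], [1.0, 2.0]]


def wrong_jacobian(x):
    # Deliberately swapped entries in the first row.
    return [[x[0], x[1]], [1.0, 2.0]]


point = [1.0, 3.0]
reference = approximate_jacobian(func, point)
ok = jacobian_is_correct(analytic_jacobian(point), reference, threshold=1e-6)
bad = jacobian_is_correct(wrong_jacobian(point), reference, threshold=1e-6)
```

The threshold must be chosen with the accuracy of the approximation in mind, since a finite-difference reference carries both truncation and round-off errors.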

check_output_data(raise_exception=True)

Check the output data validity.

Parameters:

raise_exception (bool) –

Whether to raise an exception when the data is invalid.

By default it is set to True.

Return type:

None

classmethod deactivate_time_stamps()

Deactivate the time stamps.

The time stamps store the start and end times of executions and linearizations.

Return type: None

execute(input_data=None)

Execute the discipline.

This method executes the discipline:

Adds the default inputs to the input_data if some inputs are not defined in input_data but exist in MDODiscipline.default_inputs.
Checks whether the last execution of the discipline was called with identical inputs, i.e. cached in MDODiscipline.cache; if so, directly returns self.cache.get_output_cache(inputs).
Caches the inputs.
Checks the input data against MDODiscipline.input_grammar.
If MDODiscipline.data_processor is not None, runs the preprocessor.
Updates the status to MDODiscipline.ExecutionStatus.RUNNING.
Calls the MDODiscipline._run() method, that shall be defined.
If MDODiscipline.data_processor is not None, runs the postprocessor.
Checks the output data.
Caches the outputs.
Updates the status to MDODiscipline.ExecutionStatus.DONE or MDODiscipline.ExecutionStatus.FAILED.
Updates summed execution time.

Parameters: input_data (Mapping[str, Any] | None) – The input data needed to execute the discipline according to the discipline input grammar. If None, use the MDODiscipline.default_inputs.
Returns: The discipline local data after execution.
Return type: DisciplineData
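The main execution steps can be mimicked by a toy class. `ToyDiscipline` is a hypothetical stand-in written for illustration, not the GEMSEO class. It keeps only the last evaluation in its cache, replaces grammars with a simple type check, omits the data processor and the timing, and stores the cached inputs after the run.

```python
class ToyDiscipline:
    """Hypothetical stand-in mimicking the main execution steps."""

    def __init__(self, default_inputs):
        self.default_inputs = dict(default_inputs)
        self.status = "PENDING"
        self.n_runs = 0
        self._cached_inputs = None
        self._cached_outputs = None

    def _run(self, inputs):
        # The actual computation, to be defined by each discipline.
        return {"y": 2.0 * inputs["x"] + 1.0}

    def execute(self, input_data=None):
        # Complete the input data with the default inputs.
        inputs = {**self.default_inputs, **(input_data or {})}
        # Return the cached outputs if the inputs are unchanged.
        if inputs == self._cached_inputs:
            return {**inputs, **self._cached_outputs}
        # Check the input data (simplified stand-in for a grammar).
        for name, value in inputs.items():
            if not isinstance(value, float):
                raise TypeError(f"Input {name!r} must be a float.")
        self.status = "RUNNING"
        try:
            outputs = self._run(inputs)
        except Exception:
            self.status = "FAILED"
            raise
        # Cache the evaluation and update the status.
        self._cached_inputs, self._cached_outputs = inputs, outputs
        self.n_runs += 1
        self.status = "DONE"
        return {**inputs, **outputs}


discipline = ToyDiscipline({"x": 1.0})
first = discipline.execute()  # uses the default inputs
second = discipline.execute({"x": 1.0})  # same inputs: served from the cache
third = discipline.execute({"x": 2.0})  # new inputs: _run is called again
```

Here `_run` is called twice for three executions, since the second call reuses the cached outputs.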

static from_pickle(file_path)

Deserialize a discipline from a file.

Parameters: file_path (str | Path) – The path to the file containing the discipline.
Returns: The discipline instance.
Return type: MDODiscipline

get_all_inputs()

Return the local input data.

The order is given by MDODiscipline.get_input_data_names().

Returns: The local input data.
Return type: Iterator[Any]

get_all_outputs()

Return the local output data.

The order is given by MDODiscipline.get_output_data_names().

Returns: The local output data.
Return type: Iterator[Any]

static get_data_list_from_dict(keys, data_dict)

Filter the dict from a list of keys or a single key.

If keys is a string, then the method returns the value associated with the key. If keys is a list of strings, then the method returns a generator of the values corresponding to the keys, which can be iterated over.

Parameters: