gemseo.mlearning.core.calibration module

Calibration of a machine learning algorithm.

A machine learning algorithm depends on hyper-parameters, e.g. the number of clusters for a clustering algorithm, the regularization constant for a regression model or the kernel for a Gaussian process regression. Its ability to generalize the information learned during the training stage, and thus to avoid over-fitting, i.e. an over-reliance on the learning dataset, depends on the values of these hyper-parameters. As a result, the hyper-parameters minimizing the learning quality measure are rarely those minimizing the generalization quality measure. Classically, the generalization error first decreases and then grows again as the model becomes more complex, while the learning error keeps decreasing. This phenomenon is called over-fitting and reflects the bias-variance trade-off.
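As a minimal illustration of this gap between learning and generalization quality (not GEMSEO code), the sketch below uses a toy k-nearest-neighbours regressor written in plain Python, where the hyper-parameter is the number of neighbours `k`. The data, the helper names `knn_predict` and `mse`, and the candidate values of `k` are all assumptions made for this example.

```python
def knn_predict(x, train_x, train_y, k):
    """Predict y at x as the mean output of the k nearest learning points."""
    # sorted() is stable: points at equal distance keep their original order.
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return sum(train_y[i] for i in order[:k]) / k


def mse(xs, ys, train_x, train_y, k):
    """Mean squared error of the k-NN model on the samples (xs, ys)."""
    errors = [(knn_predict(x, train_x, train_y, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


# Toy learning set: the trend y = x corrupted by an alternating +1/-1 noise.
train_x = [float(i) for i in range(20)]
train_y = [x + (1.0 if i % 2 == 0 else -1.0) for i, x in enumerate(train_x)]

# Toy validation set: noise-free samples located between the learning points.
valid_x = [i + 0.25 for i in range(19)]
valid_y = list(valid_x)

candidates = (1, 2, 4, 8)
learning_error = {k: mse(train_x, train_y, train_x, train_y, k) for k in candidates}
generalization_error = {k: mse(valid_x, valid_y, train_x, train_y, k) for k in candidates}

# The value of k that is best on the learning set is not the best on new data.
best_for_learning = min(learning_error, key=learning_error.get)
best_for_generalization = min(generalization_error, key=generalization_error.get)
```

With `k = 1` the model reproduces the learning data exactly, noise included, so its learning error is zero while its error on the validation samples is larger than with `k = 2`.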

In this module, the MLAlgoCalibration class calibrates the hyper-parameters in order to minimize this measure of the generalization quality over a calibration parameter space. This class relies on the MLAlgoAssessor class, which is a discipline (Discipline) built from a machine learning algorithm (BaseMLAlgo), a dataset (Dataset), a quality measure (BaseMLAlgoQuality) and various options for the data scaling, the quality measure and the machine learning algorithm. The inputs of this discipline are the hyper-parameters of the machine learning algorithm, while the output is the quality criterion.
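The division of roles described above can be sketched in plain Python. This is a conceptual stand-in under stated assumptions, not the GEMSEO implementation: `train_ridge`, `assess` and `calibrate` are hypothetical names, the model is a one-parameter ridge regression, and the "calibration space" is reduced to a short list of candidate penalties.

```python
def train_ridge(xs, ys, penalty):
    """Fit y = w * x by ridge regression and return the slope w."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)


def assess(penalty, learning_set, validation_set):
    """Hypothetical assessor: hyper-parameter value in, quality criterion out."""
    slope = train_ridge(learning_set[0], learning_set[1], penalty)
    xs, ys = validation_set
    return sum((slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def calibrate(candidates, learning_set, validation_set):
    """Evaluate the criterion on each candidate and keep the smallest value."""
    history = [
        {"penalty": p, "criterion": assess(p, learning_set, validation_set)}
        for p in candidates
    ]
    return min(history, key=lambda record: record["criterion"]), history


# Arbitrary toy data, chosen only to make the example executable.
learning_set = ([1.0, 2.0, 3.0, 4.0], [1.4, 1.9, 3.3, 3.8])
validation_set = ([1.5, 2.5, 3.5], [1.5, 2.4, 3.4])
candidates = [0.0, 0.1, 1.0, 10.0]  # the "calibration space" of this toy problem

optimum, history = calibrate(candidates, learning_set, validation_set)
```

Here `assess` plays the part of the assessor (inputs: hyper-parameters, output: criterion) and `calibrate` the part of the calibration, which only needs to call the assessor repeatedly.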

class MLAlgoAssessor(algo, dataset, parameters, measure, measure_evaluation_method_name=EvaluationMethod.LEARN, measure_options=mappingproxy({}), transformer=mappingproxy({}), **algo_settings)

Bases: Discipline

Discipline assessing the quality of a machine learning algorithm.

This quality depends on the values of parameters to calibrate with the MLAlgoCalibration.

Parameters:
  • algo (str) -- The name of a machine learning algorithm.

  • dataset (Dataset) -- A learning dataset.

  • parameters (Iterable[str]) -- The parameters of the machine learning algorithm to calibrate.

  • measure (type[BaseMLAlgoQuality]) -- A measure to assess the machine learning algorithm.

  • measure_evaluation_method_name (BaseMLAlgoQuality.EvaluationMethod) --

    The name of the method to evaluate the quality measure.

    By default it is set to "LEARN".

  • measure_options (MeasureOptionsType) --

    The options of the quality measure. If "multioutput" is missing, it is added with False as value. If empty, do not use quality measure options.

    By default it is set to {}.

  • transformer (TransformerType) --

    The strategies to transform the variables. The values are instances of BaseTransformer while the keys are the names of either the variables or the groups of variables, e.g. "inputs" or "outputs" in the case of the regression algorithms. If a group is specified, the BaseTransformer will be applied to all the variables of this group. If IDENTITY, do not transform the variables.

    By default it is set to {}.

  • **algo_settings (MLAlgoSettingsType) -- The settings of the machine learning algorithm.

Raises:

ValueError -- If the measure option "multioutput" is True.
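The handling of the "multioutput" option described above (added as False when missing, rejected when True) can be sketched as follows. The helper `normalize_measure_options` and the option name `n_folds` are hypothetical and only serve the illustration; this is not the code of MLAlgoAssessor.

```python
from types import MappingProxyType

MULTIOUTPUT = "multioutput"


def normalize_measure_options(measure_options=MappingProxyType({})):
    """Return a mutable copy of the options with "multioutput" set to False.

    Raises:
        ValueError: If the option "multioutput" is True.
    """
    options = dict(measure_options)
    if options.get(MULTIOUTPUT, False):
        raise ValueError('The measure option "multioutput" must be False.')
    # Added when missing, kept when already False.
    options[MULTIOUTPUT] = False
    return options


default_options = normalize_measure_options()
custom_options = normalize_measure_options({"n_folds": 5})  # hypothetical option
```

Copying the mapping first keeps the read-only default untouched and leaves the caller's dictionary unmodified.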

CRITERION = 'criterion'
LEARNING = 'learning'
MULTIOUTPUT = 'multioutput'
algo: str

The name of a machine learning algorithm.

algos: list[BaseMLAlgo]

The instances of the machine learning algorithm (one per execution of the machine learning algorithm assessor).

cache: BaseCache | None

The execution and linearization data saved according to the cache type.

dataset: Dataset

The learning dataset.

execution_statistics: ExecutionStatistics

The execution statistics of the process.

execution_status: ExecutionStatus

The execution status of the process.

jac: JacobianData

The Jacobian matrices of the outputs.

The structure is {output_name: {input_name: jacobian_matrix}}.

measure: type[BaseMLAlgoQuality]

The measure to assess the machine learning algorithm.

measure_options: dict[str, int | Dataset]

The options of the quality measure.

name: str

The name of the process.

parameters: dict[str, MLAlgoSettingsType]

The parameters of the machine learning algorithm.

transformer: TransformerType

The transformation strategy for data groups.

class MLAlgoCalibration(algo, dataset, parameters, calibration_space, measure, measure_evaluation_method_name=EvaluationMethod.LEARN, measure_options=mappingproxy({}), transformer=mappingproxy({}), **algo_settings)

Bases: object

Calibration of a machine learning algorithm.

Parameters:
  • algo (str) -- The name of a machine learning algorithm.

  • dataset (Dataset) -- A learning dataset.

  • parameters (Iterable[str]) -- The parameters of the machine learning algorithm to calibrate.

  • calibration_space (DesignSpace) -- The space defining the calibration variables.

  • measure (type[BaseMLAlgoQuality]) -- A measure to assess the machine learning algorithm.

  • measure_evaluation_method_name (str | BaseMLAlgoQuality.EvaluationMethod) --

    The name of the method to evaluate the quality measure.

    By default it is set to "LEARN".

  • measure_options (MeasureOptionsType) --

    The options of the quality measure. If empty, do not use the quality measure options.

    By default it is set to {}.

  • transformer (TransformerType) --

    The strategies to transform the variables. The values are instances of BaseTransformer while the keys are the names of either the variables or the groups of variables, e.g. "inputs" or "outputs" in the case of the regression algorithms. If a group is specified, the BaseTransformer will be applied to all the variables of this group. If IDENTITY, do not transform the variables.

    By default it is set to {}.

  • **algo_settings (MLAlgoSettingsType) -- The settings of the machine learning algorithm.
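The way the keys of the transformer mapping can name either a variable or a group of variables may be pictured with the sketch below. It is a hypothetical illustration: the function `resolve_transformers` does not exist in the library, plain strings stand in for BaseTransformer instances, and the rule that a variable-level entry overrides a group-level one is an assumption of this example, not something stated in the documentation.

```python
def resolve_transformers(transformer, groups):
    """Map each variable name to the transformation strategy applying to it."""
    resolved = {}
    # First expand the group-level entries to every variable of the group.
    for key, strategy in transformer.items():
        for variable in groups.get(key, ()):
            resolved[variable] = strategy
    # Then apply the variable-level entries (assumed here to take precedence).
    for key, strategy in transformer.items():
        if key not in groups:
            resolved[key] = strategy
    return resolved


# Hypothetical groups and variable names of a regression problem.
groups = {"inputs": ["x1", "x2"], "outputs": ["y"]}

# Strings stand in for transformer instances.
resolved = resolve_transformers({"inputs": "scaler", "y": "standardizer"}, groups)
overridden = resolve_transformers({"inputs": "scaler", "x1": "standardizer"}, groups)
```

In the first call, both variables of the "inputs" group receive the same strategy, while "y" is addressed by name.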

execute(algo_name, **algo_settings)

Calibrate the machine learning algorithm from a driver.

The driver can be either a DOE or an optimizer.

Parameters:
  • algo_name (str) -- The name of the driver algorithm, i.e. a DOE or an optimizer.

  • **algo_settings (Any) -- The settings of the driver algorithm.

Return type:

None
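The difference between the two kinds of driver can be sketched on a toy criterion. This is a plain-Python illustration under stated assumptions, not a call to the library: `criterion`, `run_doe` and `run_optimizer` are hypothetical names, the DOE is an evenly spaced sampling and the optimizer is a ternary search, which is only valid because the toy criterion has a single minimum.

```python
def criterion(p):
    """Toy quality criterion with a single minimum at p = 0.3."""
    return (p - 0.3) ** 2


def run_doe(func, lower, upper, n_samples):
    """DOE-like driver: evaluate a fixed, evenly spaced set of samples."""
    step = (upper - lower) / (n_samples - 1)
    samples = [lower + i * step for i in range(n_samples)]
    return min(samples, key=func)


def run_optimizer(func, lower, upper, tolerance=1e-6):
    """Optimizer-like driver: ternary search, each step narrows the interval."""
    while upper - lower > tolerance:
        left = lower + (upper - lower) / 3
        right = upper - (upper - lower) / 3
        if func(left) < func(right):
            upper = right  # the minimum cannot lie in (right, upper]
        else:
            lower = left  # the minimum cannot lie in [lower, left)
    return (lower + upper) / 2


doe_optimum = run_doe(criterion, 0.0, 1.0, n_samples=11)
optimizer_optimum = run_optimizer(criterion, 0.0, 1.0)
```

A DOE explores the calibration space with samples chosen in advance, whereas an optimizer chooses each new point from the previous evaluations and can therefore refine the optimum beyond the resolution of a grid.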

get_history(name)

Return the history of a variable.

Parameters:

name (str) -- The name of the variable.

Returns:

The history of the variable if the dataset is not empty, otherwise None.

Return type:

ndarray | None
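The return contract of get_history (values when data were recorded, None otherwise) can be mimicked with a small stand-in. `ToyCalibrationLog` is a hypothetical class, a list replaces the ndarray returned by the real method, and the recorded numbers are arbitrary.

```python
class ToyCalibrationLog:
    """Hypothetical stand-in storing one record per evaluation."""

    def __init__(self):
        self.records = []  # empty until something has been evaluated

    def add(self, **values):
        """Store the values of the variables for one evaluation."""
        self.records.append(values)

    def get_history(self, name):
        """Return the values of a variable, or None if nothing was recorded."""
        if not self.records:
            return None
        return [record[name] for record in self.records]


log = ToyCalibrationLog()
empty_history = log.get_history("criterion")  # nothing recorded yet

log.add(penalty=0.1, criterion=0.42)  # arbitrary illustrative values
log.add(penalty=1.0, criterion=0.37)
criterion_history = log.get_history("criterion")
```

Callers should therefore check the result against None before indexing it when the calibration may not have been executed yet.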

algo_assessor: MLAlgoAssessor

The assessor for the machine learning algorithm.

property algos: BaseMLAlgo

The trained machine learning algorithms.

calibration_space: DesignSpace

The space defining the calibration variables.

dataset: Dataset | None

The learning dataset after execution.

maximize_objective: bool

Whether to maximize the quality measure.

optimal_algorithm: BaseMLAlgo | None

The optimal machine learning algorithm after execution.

optimal_criterion: float | None

The optimal quality measure after execution.

optimal_parameters: dict[str, ndarray] | None

The optimal parameters for the machine learning algorithm after execution.

scenario: BaseScenario | None

The scenario used to calibrate the machine learning algorithm after execution.