calibration module

Calibration of a machine learning algorithm.

A machine learning algorithm depends on hyper-parameters, e.g. the number of clusters for a clustering algorithm, the regularization constant for a regression model or the kernel for a Gaussian process regression. These values determine its ability to generalize the information learned during the training stage and thus to avoid over-fitting, which is an over-reliance on the learning dataset. As a result, the hyper-parameters minimizing the learning error are rarely those minimizing the generalization error. Classically, the generalization error decreases and then grows again as the model becomes more complex, while the learning error keeps decreasing. This behavior is known as the bias-variance trade-off.
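The gap between learning and generalization errors can be seen on a toy problem. The sketch below is not part of GEMSEO: it uses a hand-written k-nearest-neighbours regressor, whose hyper-parameter is the number of neighbours `k` (a small `k` means a more complex model), and hypothetical helper names (`knn_predict`, `mse`, `make_samples`). The learning error is computed on the learning samples and the generalization error is estimated on separate validation samples.

```python
import math
import random


def knn_predict(x, train, k):
    """Average the outputs of the k learning points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(data, train, k):
    """Mean squared error on data of the k-NN model built from train."""
    return sum((knn_predict(x, train, k) - y) ** 2 for x, y in data) / len(data)


def make_samples(n, rng):
    """Draw noisy observations of sin(x) on [0, 2*pi]."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0.0, 0.3)) for x in xs]


rng = random.Random(0)
train = make_samples(40, rng)
validation = make_samples(200, rng)

# Evaluate both errors for each candidate value of the hyper-parameter.
candidates = range(1, 21)
learning_error = {k: mse(train, train, k) for k in candidates}
generalization_error = {k: mse(validation, train, k) for k in candidates}

best_for_learning = min(learning_error, key=learning_error.get)
best_for_generalization = min(generalization_error, key=generalization_error.get)
print("best k for the learning error:", best_for_learning)
print("best k for the generalization error:", best_for_generalization)
```

With `k = 1` each learning point is its own nearest neighbour, so the learning error is zero, whereas the validation error at `k = 1` still carries the noise of the data. Choosing the hyper-parameter from the learning error alone would therefore always favour the most complex model.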

In this module, the MLAlgoCalibration class calibrates the hyper-parameters in order to minimize a measure of the generalization quality over a calibration parameter space. This class relies on the MLAlgoAssessor class, which is a discipline (MDODiscipline) built from a machine learning algorithm (MLAlgo), a dataset (Dataset), a quality measure (MLQualityMeasure) and various options for the data scaling, the quality measure and the machine learning algorithm. The inputs of this discipline are the hyper-parameters of the machine learning algorithm while its output is the quality criterion.
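The division of roles described above can be pictured with a plain-Python stand-in. This is only a sketch of the pattern, not the GEMSEO implementation: `assess` plays the part of the assessor (hyper-parameter in, quality criterion out) and `calibrate` plays the part of the calibration (a regular grid over the calibration space acting as a DOE). The one-dimensional ridge regression, the data and all function names are assumptions made for the illustration.

```python
import random


def fit_ridge(data, penalty):
    """Fit y ~ w * x by ridge regression and return the slope w."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + penalty)


def assess(penalty, learning_data, test_data):
    """Hypothetical assessor: return the test MSE for a given penalty."""
    w = fit_ridge(learning_data, penalty)
    return sum((w * x - y) ** 2 for x, y in test_data) / len(test_data)


def calibrate(lower, upper, n_samples, learning_data, test_data):
    """Hypothetical calibration: evaluate the assessor on a regular grid."""
    step = (upper - lower) / (n_samples - 1)
    history = []
    for i in range(n_samples):
        penalty = lower + i * step
        history.append((penalty, assess(penalty, learning_data, test_data)))
    # The optimum is the grid point with the smallest criterion.
    return min(history, key=lambda item: item[1]), history


def make_samples(n, rng):
    """Draw noisy observations of y = 2 x on [-1, 1]."""
    samples = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        samples.append((x, 2.0 * x + rng.gauss(0.0, 1.0)))
    return samples


rng = random.Random(1)
learning_data = make_samples(8, rng)
test_data = make_samples(100, rng)

(optimal_penalty, optimal_criterion), history = calibrate(
    0.0, 10.0, 11, learning_data, test_data
)
print("optimal penalty:", optimal_penalty)
print("optimal criterion:", optimal_criterion)
```

Keeping the assessor separate from the driver means the same criterion can be explored by a DOE or minimized by an optimizer without changing how the model is trained and scored.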

class gemseo.mlearning.core.calibration.MLAlgoAssessor(algo, dataset, parameters, measure, measure_evaluation_method_name=EvaluationMethod.LEARN, measure_options=None, transformer=mappingproxy({}), **algo_options)

Bases: MDODiscipline

Discipline assessing the quality of a machine learning algorithm.

This quality depends on the values of parameters to calibrate with the MLAlgoCalibration.

Parameters:

algo (str) – The name of a machine learning algorithm.
dataset (Dataset) – A learning dataset.
parameters (Iterable[str]) – The parameters of the machine learning algorithm to calibrate.
measure (type[MLQualityMeasure]) – A measure to assess the machine learning algorithm.
measure_evaluation_method_name (MLQualityMeasure.EvaluationMethod) – The name of the method to evaluate the quality measure. By default it is set to “LEARN”.
measure_options (MeasureOptionsType | None) – The options of the quality measure. If “multioutput” is missing, it is added with False as value. If None, do not use quality measure options.
transformer (TransformerType) – The strategies to transform the variables. The values are instances of Transformer while the keys are the names of either the variables or the groups of variables, e.g. "inputs" or "outputs" in the case of the regression algorithms. If a group is specified, the Transformer will be applied to all the variables of this group. If IDENTITY, do not transform the variables. By default it is set to {}.
**algo_options (MLAlgoParameterType) – The options of the machine learning algorithm.

Raises:

ValueError – If the measure option “multioutput” is True.
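The handling of the “multioutput” option described above can be sketched as follows. The helper `normalize_measure_options` is hypothetical and is not taken from the GEMSEO source; treating None like an empty dictionary is an assumption of this sketch.

```python
def normalize_measure_options(measure_options=None):
    """Return a copy of the options with "multioutput" set to False.

    Hypothetical helper mirroring the documented behaviour:
    a True "multioutput" value is rejected with a ValueError.
    """
    # Assumption: None is handled like an empty set of options.
    options = dict(measure_options or {})
    if options.get("multioutput", False):
        raise ValueError("The assessor does not support multioutput measures.")
    options["multioutput"] = False
    return options


print(normalize_measure_options({"n_folds": 5}))
```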

CRITERION = 'criterion'

LEARNING = 'learning'

MULTIOUTPUT = 'multioutput'

algo: str: The name of a machine learning algorithm.

algos: list[MLAlgo]: The instances of the machine learning algorithm (one per execution of the machine learning algorithm assessor).

cache: AbstractCache | None: The cache containing one or several executions of the discipline according to the cache policy.

data_processor: DataProcessor: A tool to pre- and post-process discipline data.

dataset: Dataset: The learning dataset.

exec_for_lin: bool: Whether the last execution was due to a linearization.

input_grammar: BaseGrammar: The input grammar.

jac: dict[str, dict[str, ndarray]]: The Jacobians of the outputs with respect to the inputs. The structure is {output: {input: matrix}}.

measure: MLQualityMeasure: The measure to assess the machine learning algorithm.

measure_options: dict[str, int | Dataset]: The options of the quality measure.

name: str: The name of the discipline.

output_grammar: BaseGrammar: The output grammar.

parameters: list[str]: The parameters of the machine learning algorithm.

re_exec_policy: ReExecutionPolicy: The policy to re-execute the same discipline.

residual_variables: Mapping[str, str]: The output variables mapping to their inputs, to be considered as residuals; they shall be equal to zero.

run_solves_residuals: bool: Whether the run method shall solve the residuals.

transformer: TransformerType: The transformation strategy for data groups.

class gemseo.mlearning.core.calibration.MLAlgoCalibration(algo, dataset, parameters, calibration_space, measure, measure_evaluation_method_name=EvaluationMethod.LEARN, measure_options=None, transformer=mappingproxy({}), **algo_options)

Bases: object

Calibration of a machine learning algorithm.

Parameters:

algo (str) – The name of a machine learning algorithm.
dataset (Dataset) – A learning dataset.
parameters (Iterable[str]) – The parameters of the machine learning algorithm to calibrate.
calibration_space (DesignSpace) – The space defining the calibration variables.
measure (MLQualityMeasure) – A measure to assess the machine learning algorithm.
measure_evaluation_method_name (str | MLQualityMeasure.EvaluationMethod) – The name of the method to evaluate the quality measure. By default it is set to “LEARN”.
measure_options (MeasureOptionsType | None) – The options of the quality measure. If None, do not use the quality measure options.
transformer (TransformerType) – The strategies to transform the variables. The values are instances of Transformer while the keys are the names of either the variables or the groups of variables, e.g. "inputs" or "outputs" in the case of the regression algorithms. If a group is specified, the Transformer will be applied to all the variables of this group. If IDENTITY, do not transform the variables. By default it is set to {}.
**algo_options (MLAlgoParameterType) – The options of the machine learning algorithm.

execute(input_data)

Calibrate the machine learning algorithm from a driver.

The driver can be either a DOE or an optimizer.

Parameters: input_data (Mapping[str, str | int | Mapping[str, int | float]]) – The driver properties.
Return type: None

get_history(name)

Return the history of a variable.

Parameters: name (str) – The name of the variable.
Returns: The history of the variable.
Return type: ndarray

algo_assessor: MLAlgoAssessor: The assessor for the machine learning algorithm.

property algos: MLAlgo: The trained machine learning algorithms.

calibration_space: DesignSpace: The space defining the calibration variables.

dataset: Dataset: The learning dataset.

maximize_objective: bool: Whether to maximize the quality measure.

optimal_algorithm: MLAlgo: The optimal machine learning algorithm.

optimal_criterion: float: The optimal quality measure.

optimal_parameters: dict[str, numpy.ndarray]: The optimal parameters for the machine learning algorithm.

scenario: Scenario: The scenario used to calibrate the machine learning algorithm.
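A typical calibration run might look like the sketch below, which has not been executed here. The constructor arguments follow the signature documented above, and `"criterion"` is the value of MLAlgoAssessor.CRITERION. Everything else is an assumption that may differ between GEMSEO versions: the import paths of DesignSpace and MSEMeasure, the positional arguments of `add_variable`, the algorithm name `"PolynomialRegressor"` with its options `penalty_level` and `degree`, the `"TEST"` evaluation method with its `test_data` option, and the driver properties `"algo"` and `"n_samples"`.

```python
# Driver properties passed to MLAlgoCalibration.execute():
# assumed here to describe a full-factorial DOE with 10 samples.
DRIVER_SETTINGS = {"algo": "fullfact", "n_samples": 10}


def calibrate_penalty_level(learning_dataset, test_dataset):
    """Calibrate the penalty level of a polynomial regressor (unverified sketch).

    The GEMSEO imports are kept inside the function so that this module can be
    loaded without GEMSEO installed; the import paths are assumptions.
    """
    from gemseo.algos.design_space import DesignSpace
    from gemseo.mlearning.core.calibration import MLAlgoCalibration
    from gemseo.mlearning.quality_measures.mse_measure import MSEMeasure

    # One calibration variable: penalty_level in [0, 100], starting at 0
    # (assumed argument order: name, size, type, lower bound, upper bound, value).
    calibration_space = DesignSpace()
    calibration_space.add_variable("penalty_level", 1, "float", 0.0, 100.0, 0.0)

    calibration = MLAlgoCalibration(
        "PolynomialRegressor",
        learning_dataset,
        ["penalty_level"],
        calibration_space,
        MSEMeasure,
        measure_evaluation_method_name="TEST",
        measure_options={"test_data": test_dataset},
        degree=10,  # forwarded to the algorithm through **algo_options
    )
    calibration.execute(DRIVER_SETTINGS)

    return (
        calibration.optimal_parameters,
        calibration.optimal_criterion,
        calibration.get_history("penalty_level"),
        calibration.get_history("criterion"),
    )
```

Evaluating the measure on a separate test dataset rather than with the default “LEARN” method is what makes the criterion reflect generalization rather than learning quality.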

Examples using MLAlgoCalibration

Calibration of a polynomial regression