calibration module
Calibration of a machine learning algorithm.
A machine learning algorithm depends on hyper-parameters, e.g. the number of clusters for a clustering algorithm, the regularization constant for a regression model, or the kernel for a Gaussian process regression. Its ability to generalize the information learned during the training stage, and thus to avoid over-fitting (an over-reliance on the learning dataset), depends on the values of these hyper-parameters. Consequently, the hyper-parameters minimizing the learning quality measure are rarely those minimizing the generalization one. Classically, the generalization error first decreases and then grows again as the model becomes more complex, while the learning error keeps decreasing. This phenomenon is known as the bias-variance trade-off.
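The gap between the learning and generalization measures can be illustrated with a minimal, stand-alone sketch. It does not use GEMSEO: the k-nearest-neighbours regressor, the synthetic dataset and all the function names below are assumptions chosen for illustration only. The hyper-parameter is the number of neighbours, the learning error is computed on the learning samples themselves and the generalization error is estimated by k-fold cross-validation.

```python
import math
import random


def knn_predict(train, x, n_neighbours):
    """Average the outputs of the training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:n_neighbours]
    return sum(y for _, y in nearest) / n_neighbours


def mse(train, test, n_neighbours):
    """Mean squared error on `test` of a model learned from `train`."""
    return sum(
        (knn_predict(train, x, n_neighbours) - y) ** 2 for x, y in test
    ) / len(test)


def k_fold_mse(data, n_neighbours, n_folds=5):
    """Estimate the generalization error by k-fold cross-validation."""
    errors = []
    for fold in range(n_folds):
        test = data[fold::n_folds]
        train = [point for i, point in enumerate(data) if i % n_folds != fold]
        errors.append(mse(train, test, n_neighbours))
    return sum(errors) / n_folds


# Synthetic learning dataset (illustrative assumption): a noisy sine wave.
rng = random.Random(0)
data = [
    (i / 39, math.sin(2 * math.pi * i / 39) + rng.gauss(0.0, 0.3))
    for i in range(40)
]

grid = [1, 2, 3, 5, 8, 12, 20]
learning_errors = {n: mse(data, data, n) for n in grid}
generalization_errors = {n: k_fold_mse(data, n) for n in grid}

best_for_learning = min(grid, key=learning_errors.get)
best_for_generalization = min(grid, key=generalization_errors.get)
print("best for learning:", best_for_learning)
print("best for generalization:", best_for_generalization)
```

With a single neighbour, each learning sample is its own nearest neighbour, so the learning error is exactly zero whatever the noise level, while the cross-validated error is not. This is why a calibration should rely on a generalization measure rather than on the learning one.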
In this module, the MLAlgoCalibration class aims to calibrate the hyper-parameters in order to minimize this measure of the generalization quality over a calibration parameter space. This class relies on the MLAlgoAssessor class, which is a discipline (MDODiscipline) built from a machine learning algorithm (MLAlgo), a dataset (Dataset), a quality measure (MLQualityMeasure) and various options for the data scaling, the quality measure and the machine learning algorithm. The inputs of this discipline are hyper-parameters of the machine learning algorithm while the output is the quality criterion.
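The division of labour described above (an assessor mapping hyper-parameter values to a quality criterion, and a calibration minimizing this criterion over a calibration space) can be sketched in plain Python. This is a stand-alone analogue under stated assumptions, not the GEMSEO implementation: `RidgeSlope`, `leave_one_out_mse`, `Assessor` and `calibrate` are hypothetical names, the calibration space is reduced to a grid and the driver is a plain full-factorial loop.

```python
from itertools import product


class RidgeSlope:
    """Toy model y = slope * x with an L2 penalty (hypothetical, not GEMSEO)."""

    def __init__(self, penalty=0.0):
        self.penalty = penalty
        self.slope = 0.0

    def learn(self, samples):
        sxy = sum(x * y for x, y in samples)
        sxx = sum(x * x for x, _ in samples)
        self.slope = sxy / (sxx + self.penalty)

    def predict(self, x):
        return self.slope * x


def leave_one_out_mse(algo, samples):
    """Generalization measure: learn without one sample, then predict it."""
    total = 0.0
    for i, (x, y) in enumerate(samples):
        algo.learn(samples[:i] + samples[i + 1:])
        total += (algo.predict(x) - y) ** 2
    return total / len(samples)


class Assessor:
    """Map hyper-parameter values (inputs) to a quality criterion (output)."""

    def __init__(self, algo_class, samples, parameters, measure):
        self.algo_class = algo_class
        self.samples = samples
        self.parameters = list(parameters)
        self.measure = measure
        self.algos = []  # one trained model per evaluation

    def __call__(self, values):
        algo = self.algo_class(**{name: values[name] for name in self.parameters})
        criterion = self.measure(algo, self.samples)
        algo.learn(self.samples)  # final training on the whole dataset
        self.algos.append(algo)
        return criterion


def calibrate(assessor, calibration_grid):
    """Evaluate the assessor on a full-factorial grid and keep the best point."""
    names = list(calibration_grid)
    history = []
    for combination in product(*(calibration_grid[name] for name in names)):
        point = dict(zip(names, combination))
        history.append((point, assessor(point)))
    best = min(range(len(history)), key=lambda i: history[i][1])
    return history, history[best][0], history[best][1], assessor.algos[best]


# Small hand-written dataset (illustrative assumption).
samples = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
assessor = Assessor(RidgeSlope, samples, ["penalty"], leave_one_out_mse)
history, optimal_point, optimal_criterion, optimal_algorithm = calibrate(
    assessor, {"penalty": [0.0, 0.1, 1.0, 10.0]}
)
print(optimal_point, optimal_criterion, optimal_algorithm.slope)
```

In this sketch, the list `algos` and the returned `optimal_criterion` and `optimal_algorithm` play roles similar to the attributes of the same names documented below. The actual classes additionally rely on a design space, a scenario and a DOE or optimization driver.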
- class gemseo.mlearning.core.calibration.MLAlgoAssessor(algo, dataset, parameters, measure, measure_evaluation_method_name=EvaluationMethod.LEARN, measure_options=None, transformer=mappingproxy({}), **algo_options)[source]
Bases: MDODiscipline
Discipline assessing the quality of a machine learning algorithm.
This quality depends on the values of parameters to calibrate with the MLAlgoCalibration.
- Parameters:
algo (str) – The name of a machine learning algorithm.
dataset (Dataset) – A learning dataset.
parameters (Iterable[str]) – The parameters of the machine learning algorithm to calibrate.
measure (type[MLQualityMeasure]) – A measure to assess the machine learning algorithm.
measure_evaluation_method_name (MLQualityMeasure.EvaluationMethod) – The name of the method to evaluate the quality measure. By default it is set to “LEARN”.
measure_options (MeasureOptionsType | None) – The options of the quality measure. If “multioutput” is missing, it is added with False as value. If None, do not use quality measure options.
transformer (TransformerType) – The strategies to transform the variables. The values are instances of Transformer while the keys are the names of either the variables or the groups of variables, e.g. "inputs" or "outputs" in the case of the regression algorithms. If a group is specified, the Transformer will be applied to all the variables of this group. If IDENTITY, do not transform the variables. By default it is set to {}.
**algo_options (MLAlgoParameterType) – The options of the machine learning algorithm.
- Raises:
ValueError – If the measure option “multioutput” is True.
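The documented handling of the “multioutput” option can be summarized by the following minimal sketch. The function name `prepare_measure_options` is hypothetical and the body is only an illustration of the rule stated above, not the library's code; the option name `n_folds` in the usage lines is likewise an illustrative assumption.

```python
def prepare_measure_options(measure_options=None):
    """Return a copy of the options where "multioutput" is guaranteed to be False."""
    options = dict(measure_options or {})
    # A missing "multioutput" option is added with False as value.
    options.setdefault("multioutput", False)
    # The criterion must be a scalar, so a multi-output measure is rejected.
    if options["multioutput"]:
        raise ValueError("The measure option 'multioutput' must be False.")
    return options


print(prepare_measure_options())
print(prepare_measure_options({"n_folds": 3}))
```

The rationale is that the assessor outputs a single quality criterion, which a driver can then minimize or maximize.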
- CRITERION = 'criterion'
- LEARNING = 'learning'
- MULTIOUTPUT = 'multioutput'
- algo: str
The name of a machine learning algorithm.
- algos: list[MLAlgo]
The instances of the machine learning algorithm (one per execution of the machine learning algorithm assessor).
- cache: AbstractCache | None
The cache containing one or several executions of the discipline according to the cache policy.
- data_processor: DataProcessor
A tool to pre- and post-process discipline data.
- dataset: Dataset
The learning dataset.
- exec_for_lin: bool
Whether the last execution was due to a linearization.
- input_grammar: BaseGrammar
The input grammar.
- jac: MutableMapping[str, MutableMapping[str, ndarray | csr_array | JacobianOperator]]
The Jacobians of the outputs wrt inputs.
The structure is {output: {input: matrix}}.
- measure: MLQualityMeasure
The measure to assess the machine learning algorithm.
- name: str
The name of the discipline.
- output_grammar: BaseGrammar
The output grammar.
- re_exec_policy: ReExecutionPolicy
The policy to re-execute the same discipline.
- residual_variables: dict[str, str]
The output variables mapping to their inputs, to be considered as residuals; they shall be equal to zero.
- run_solves_residuals: bool
Whether the run method shall solve the residuals.
- transformer: TransformerType
The transformation strategy for data groups.
- class gemseo.mlearning.core.calibration.MLAlgoCalibration(algo, dataset, parameters, calibration_space, measure, measure_evaluation_method_name=EvaluationMethod.LEARN, measure_options=None, transformer=mappingproxy({}), **algo_options)[source]
Bases: object
Calibration of a machine learning algorithm.
- Parameters:
algo (str) – The name of a machine learning algorithm.
dataset (Dataset) – A learning dataset.
parameters (Iterable[str]) – The parameters of the machine learning algorithm to calibrate.
calibration_space (DesignSpace) – The space defining the calibration variables.
measure (MLQualityMeasure) – A measure to assess the machine learning algorithm.
measure_evaluation_method_name (str | MLQualityMeasure.EvaluationMethod) – The name of the method to evaluate the quality measure. By default it is set to “LEARN”.
measure_options (MeasureOptionsType | None) – The options of the quality measure. If None, do not use the quality measure options.
transformer (TransformerType) – The strategies to transform the variables. The values are instances of Transformer while the keys are the names of either the variables or the groups of variables, e.g. "inputs" or "outputs" in the case of the regression algorithms. If a group is specified, the Transformer will be applied to all the variables of this group. If IDENTITY, do not transform the variables. By default it is set to {}.
**algo_options (MLAlgoParameterType) – The options of the machine learning algorithm.
- execute(input_data)[source]
Calibrate the machine learning algorithm from a driver.
The driver can be either a DOE or an optimizer.
- Parameters:
input_data (ScenarioInputDataType) – The driver properties.
- Return type:
None
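As an illustration of what the driver properties may look like, the dictionaries below follow the usual conventions of GEMSEO scenario inputs. The keys `"algo"`, `"n_samples"` and `"max_iter"` and the algorithm names are assumptions to be checked against the installed version, and `calibration` denotes a hypothetical, already constructed MLAlgoCalibration instance.

```python
# Assumed driver properties for a DOE:
# sample the calibration space with a 10-point full-factorial design.
doe_properties = {"algo": "fullfact", "n_samples": 10}

# Assumed driver properties for an optimizer:
# run a gradient-free algorithm for at most 50 iterations.
optimizer_properties = {"algo": "NLOPT_COBYLA", "max_iter": 50}

# With an existing MLAlgoCalibration instance (not created here):
# calibration.execute(doe_properties)
# print(calibration.optimal_criterion)
```

A DOE evaluates the quality measure at predefined points of the calibration space, whereas an optimizer searches this space iteratively.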
- get_history(name)[source]
Return the history of a variable.
- algo_assessor: MLAlgoAssessor
The assessor for the machine learning algorithm.
- property algos: MLAlgo
The trained machine learning algorithms.
- calibration_space: DesignSpace
The space defining the calibration variables.
- dataset: Dataset
The learning dataset.
- maximize_objective: bool
Whether to maximize the quality measure.
- optimal_algorithm: MLAlgo
The optimal machine learning algorithm.
- optimal_criterion: float
The optimal quality measure.
- scenario: Scenario
The scenario used to calibrate the machine learning algorithm.