Calibrate or select a machine learning algorithm
Calibration
Calibration of a machine learning algorithm.
A machine learning algorithm depends on hyper-parameters, e.g. the number of clusters for a clustering algorithm, the regularization constant for a regression model or the kernel for a Gaussian process regression. Its ability to generalize the information learned during the training stage, and thus to avoid over-fitting, i.e. an over-reliance on the learning dataset, depends on the values of these hyper-parameters. Consequently, the hyper-parameters minimizing the learning error are rarely those minimizing the generalization error. Classically, as the model becomes more complex, the generalization error decreases before increasing again, while the learning error keeps decreasing. This behaviour illustrates the bias-variance trade-off.
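The following NumPy sketch illustrates this behaviour on synthetic data; it does not use the classes documented here. The degree of a polynomial fitted by least squares plays the role of the hyper-parameter, the learning error is the mean squared error on 20 learning points and the generalization error is estimated on 200 other points. The target function, the noise level and the range of degrees are arbitrary assumptions.

```python
import numpy as np


def target(x):
    """Arbitrary smooth function to learn (an assumption of this sketch)."""
    return np.sin(3.0 * x)


rng = np.random.default_rng(0)
x_learn = rng.uniform(-1.0, 1.0, 20)
x_test = rng.uniform(-1.0, 1.0, 200)
y_learn = target(x_learn) + 0.1 * rng.standard_normal(x_learn.size)
y_test = target(x_test) + 0.1 * rng.standard_normal(x_test.size)

degrees = list(range(1, 11))
learning_errors = []
generalization_errors = []
for degree in degrees:
    # Fit a polynomial of the given degree on the learning data only.
    coefficients = np.polyfit(x_learn, y_learn, degree)
    learning_errors.append(np.mean((np.polyval(coefficients, x_learn) - y_learn) ** 2))
    generalization_errors.append(np.mean((np.polyval(coefficients, x_test) - y_test) ** 2))

for degree, e_learn, e_test in zip(degrees, learning_errors, generalization_errors):
    print(f"degree={degree:2d}  learning={e_learn:.5f}  generalization={e_test:.5f}")
```

Because the polynomial spaces are nested, the learning error cannot increase with the degree, whereas the generalization error typically stops improving beyond a moderate degree; the exact values depend on the random seed.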
In this module, the MLAlgoCalibration class aims to calibrate the hyper-parameters in order to minimize this measure of the generalization quality over a calibration parameter space. This class relies on the MLAlgoAssessor class, which is a discipline (Discipline) built from a machine learning algorithm (BaseMLAlgo), a dataset (Dataset), a quality measure (BaseMLAlgoQuality) and various options for the data scaling, the quality measure and the machine learning algorithm. The inputs of this discipline are hyper-parameters of the machine learning algorithm while the output is the quality criterion.
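To make this input/output view concrete, here is a minimal NumPy sketch of what such an assessor computes; it does not use the MLAlgoAssessor API. The hypothetical function `assess` takes a value of the regularization constant of a ridge regression (the hyper-parameter, i.e. the discipline input), trains the model on learning data and returns its mean squared error on a separate validation set (the quality criterion, i.e. the discipline output). The synthetic data and the choice of ridge regression are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic learning and validation data for a linear model with 5 features.
true_coefficients = np.array([1.0, -2.0, 0.0, 0.5, 0.0])
x_learn = rng.standard_normal((30, 5))
x_validation = rng.standard_normal((100, 5))
y_learn = x_learn @ true_coefficients + 0.3 * rng.standard_normal(30)
y_validation = x_validation @ true_coefficients + 0.3 * rng.standard_normal(100)


def assess(regularization: float) -> float:
    """Return the validation MSE of a ridge regression (hypothetical assessor).

    Input: the hyper-parameter (regularization constant).
    Output: the quality criterion (mean squared error, lower is better).
    """
    n_features = x_learn.shape[1]
    # Closed-form ridge solution: (X^T X + lambda I)^{-1} X^T y.
    coefficients = np.linalg.solve(
        x_learn.T @ x_learn + regularization * np.eye(n_features),
        x_learn.T @ y_learn,
    )
    return float(np.mean((x_validation @ coefficients - y_validation) ** 2))


for regularization in (0.0, 1.0, 10.0, 100.0):
    print(regularization, assess(regularization))
```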
- class MLAlgoAssessor(algo, dataset, parameters, measure, measure_evaluation_method_name=EvaluationMethod.LEARN, measure_options=mappingproxy({}), transformer=mappingproxy({}), **algo_settings)
Discipline assessing the quality of a machine learning algorithm.
This quality depends on the values of the parameters to calibrate with the MLAlgoCalibration.
- Parameters:
algo (str) -- The name of a machine learning algorithm.
dataset (Dataset) -- A training dataset.
parameters (Iterable[str]) -- The parameters of the machine learning algorithm to calibrate.
measure (type[BaseMLAlgoQuality]) -- A measure to assess the machine learning algorithm.
measure_evaluation_method_name (BaseMLAlgoQuality.EvaluationMethod) --
The name of the method to evaluate the quality measure.
By default it is set to "LEARN".
measure_options (MeasureOptionsType) --
The options of the quality measure. If "multioutput" is missing, it is added with False as value. If empty, do not use quality measure options.
By default it is set to {}.
transformer (TransformerType) --
The strategies to transform the variables. The values are instances of BaseTransformer while the keys are the names of either the variables or the groups of variables, e.g. "inputs" or "outputs" in the case of the regression algorithms. If a group is specified, the BaseTransformer will be applied to all the variables of this group. If IDENTITY, do not transform the variables.
By default it is set to {}.
**algo_settings (MLAlgoSettingsType) -- The settings of the machine learning algorithm.
- Raises:
ValueError -- If the measure option "multioutput" is True.
- algo: str
The name of a machine learning algorithm.
- algos: list[BaseMLAlgo]
The instances of the machine learning algorithm (one per execution of the machine learning algorithm assessor).
- dataset: Dataset
The training dataset.
- measure: type[BaseMLAlgoQuality]
The measure to assess the machine learning algorithm.
- transformer: TransformerType
The transformation strategy for data groups.
- class MLAlgoCalibration(algo, dataset, parameters, calibration_space, measure, measure_evaluation_method_name=EvaluationMethod.LEARN, measure_options=mappingproxy({}), transformer=mappingproxy({}), **algo_settings)
Calibration of a machine learning algorithm.
- Parameters:
algo (str) -- The name of a machine learning algorithm.
dataset (Dataset) -- A training dataset.
parameters (Iterable[str]) -- The parameters of the machine learning algorithm to calibrate.
calibration_space (DesignSpace) -- The space defining the calibration variables.
measure (type[BaseMLAlgoQuality]) -- A measure to assess the machine learning algorithm.
measure_evaluation_method_name (str | BaseMLAlgoQuality.EvaluationMethod) --
The name of the method to evaluate the quality measure.
By default it is set to "LEARN".
measure_options (MeasureOptionsType) --
The options of the quality measure. If empty, do not use the quality measure options.
By default it is set to {}.
transformer (TransformerType) --
The strategies to transform the variables. The values are instances of BaseTransformer while the keys are the names of either the variables or the groups of variables, e.g. "inputs" or "outputs" in the case of the regression algorithms. If a group is specified, the BaseTransformer will be applied to all the variables of this group. If IDENTITY, do not transform the variables.
By default it is set to {}.
**algo_settings (MLAlgoSettingsType) -- The settings of the machine learning algorithm.
- execute(algo_name, **algo_settings)
Calibrate the machine learning algorithm from a driver.
The driver can be either a DOE or an optimizer.
- get_history(name)
Return the history of a variable.
- algo_assessor: MLAlgoAssessor
The assessor for the machine learning algorithm.
- property algos: list[BaseMLAlgo]
The trained machine learning algorithms.
- calibration_space: DesignSpace
The space defining the calibration variables.
- maximize_objective: bool
Whether to maximize the quality measure.
- optimal_algorithm: BaseMLAlgo | None
The optimal machine learning algorithm after execution.
- optimal_parameters: dict[str, ndarray] | None
The optimal parameters for the machine learning algorithm after execution.
- scenario: BaseScenario | None
The scenario used to calibrate the machine learning algorithm after execution.
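The calibration loop driven by a DOE can be pictured with the following plain-Python sketch. It is not the implementation of MLAlgoCalibration: the toy quality function, the full-factorial grid and the helper names `quality` and `calibrate_with_doe` are hypothetical. It evaluates the quality criterion at every point of the calibration grid, stores the history of each variable (in the spirit of get_history) and keeps the best point, the direction of the search being set by a flag analogous to maximize_objective.

```python
import itertools


def quality(n_neighbors, weight):
    """Toy quality criterion to minimize (hypothetical stand-in for an assessor)."""
    return (n_neighbors - 3) ** 2 + (weight - 0.5) ** 2


def calibrate_with_doe(calibration_grid, maximize_objective=False):
    """Evaluate ``quality`` on a full-factorial grid and return the best point.

    Return the optimal parameters, the optimal quality and the history of
    every variable, with one entry per evaluation.
    """
    names = list(calibration_grid)
    history = {name: [] for name in [*names, "quality"]}
    for values in itertools.product(*calibration_grid.values()):
        parameters = dict(zip(names, values))
        history["quality"].append(quality(**parameters))
        for name in names:
            history[name].append(parameters[name])

    # The best evaluation is the largest or the smallest quality value.
    choose = max if maximize_objective else min
    qualities = history["quality"]
    best_index = choose(range(len(qualities)), key=lambda index: qualities[index])
    optimal_parameters = {name: history[name][best_index] for name in names}
    return optimal_parameters, qualities[best_index], history


optimal_parameters, optimal_quality, history = calibrate_with_doe(
    {"n_neighbors": [1, 2, 3, 4, 5], "weight": [0.0, 0.5, 1.0]}
)
print(optimal_parameters, optimal_quality)  # {'n_neighbors': 3, 'weight': 0.5} 0.0
print(len(history["quality"]))  # 15
```

Using an optimizer instead of a DOE amounts to letting the driver choose each new point from the previous evaluations rather than enumerating a predefined set.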
Selection
This module contains a class to select a machine learning algorithm from a list.
Machine learning is used to find relations or underlying structures in data. However, no algorithm is universally better than the others for an arbitrary problem. As in optimization, there is no free lunch in machine learning [Wol96].
Given a quality measure, one can thus compare the performances of different machine learning algorithms. This process can be easily performed using the class MLAlgoSelection.
A machine learning algorithm is built using a set of (hyper-)parameters before the learning takes place. In order to choose the best hyper-parameters, a simple grid search over different values may be sufficient. The class MLAlgoSelection does this. It can also perform a more advanced form of optimization than a simple grid search over predefined values, using the class MLAlgoCalibration.
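As an illustration that does not use MLAlgoSelection, the NumPy sketch below compares two hypothetical regressors on the same data with a single quality measure, the mean squared error on held-out samples: an affine least-squares model and a one-nearest-neighbour model. The data and both models are assumptions; the point is only that the preferred algorithm is the one with the best quality, and that this choice is problem-dependent.

```python
import numpy as np

rng = np.random.default_rng(2)
x_learn = rng.uniform(0.0, 1.0, 40)
x_test = rng.uniform(0.0, 1.0, 100)
y_learn = np.sin(6.0 * x_learn)  # strongly nonlinear, noise-free target
y_test = np.sin(6.0 * x_test)


def affine_model(x):
    """Predict with a least-squares affine fit learned on (x_learn, y_learn)."""
    slope, intercept = np.polyfit(x_learn, y_learn, 1)
    return slope * x + intercept


def nearest_neighbor_model(x):
    """Predict the output of the closest learning point."""
    closest = np.abs(x[:, None] - x_learn[None, :]).argmin(axis=1)
    return y_learn[closest]


# Same quality measure (test MSE) for every candidate algorithm.
qualities = {
    name: float(np.mean((model(x_test) - y_test) ** 2))
    for name, model in [("affine", affine_model), ("1-NN", nearest_neighbor_model)]
}
best = min(qualities, key=qualities.get)
print(qualities, best)
```

On a smooth but strongly nonlinear target like this one, the nearest-neighbour model is expected to win; on an almost linear target the affine model could be preferred instead.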
See also
ml_algo
calibration
- class MLAlgoSelection(dataset, measure, measure_evaluation_method_name=EvaluationMethod.LEARN, samples=(), **measure_options)
Machine learning algorithm selector.
- Parameters:
dataset (Dataset) -- The training dataset.
measure (str | type[BaseMLAlgoQuality]) -- The name or the class of the quality measure used to assess the machine learning algorithms.
measure_evaluation_method_name (BaseMLAlgoQuality.EvaluationMethod) --
The name of the method to evaluate the quality measure.
By default it is set to "LEARN".
samples (Sequence[int]) --
The indices of the learning samples to consider. Other indices are neither used for training nor for testing. If empty, use all the samples.
By default it is set to ().
**measure_options (MeasureOptionType) -- The options for the method to evaluate the quality measure. The option 'multioutput' will be set to False.
- Raises:
ValueError -- If the unsupported "multioutput" option is enabled.
- add_candidate(name, calib_space=None, calib_algo=mappingproxy({}), **option_lists)
Add a machine learning algorithm candidate.
- Parameters:
name (str) -- The name of a machine learning algorithm.
calib_space (DesignSpace | None) -- The design space defining the parameters to be calibrated with an MLAlgoCalibration. If None, do not perform calibration.
calib_algo (ScenarioInputDataType) --
The name and the parameters of the optimization algorithm, e.g. {"algo_name": "PYDOE_FULLFACT", "n_samples": 10}. If empty, do not perform calibration.
By default it is set to {}.
**option_lists -- The parameters for the machine learning algorithm candidate. Each parameter has to be enclosed within a list. The list may contain different values to try out for the given parameter, or only one.
- Return type:
None
Examples
>>> selector.add_candidate(
...     "LinearRegressor",
...     penalty_level=[0, 0.1, 1, 10, 20],
...     l2_penalty_ratio=[0, 0.5, 1],
...     fit_intercept=[True],
... )
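Each option list passed to add_candidate contributes one axis of the grid searched by select. Assuming this grid is the Cartesian product of the lists (a natural reading of the description of select, but an assumption here), the call in the example corresponds to 5 × 3 × 1 = 15 settings of "LinearRegressor", which the following standard-library sketch enumerates:

```python
import itertools

option_lists = {
    "penalty_level": [0, 0.1, 1, 10, 20],
    "l2_penalty_ratio": [0, 0.5, 1],
    "fit_intercept": [True],
}
# One dictionary of settings per combination of option values.
settings = [
    dict(zip(option_lists, values))
    for values in itertools.product(*option_lists.values())
]
print(len(settings))  # 15
print(settings[0])  # {'penalty_level': 0, 'l2_penalty_ratio': 0, 'fit_intercept': True}
```

This is also why a parameter with a single value must still be wrapped in a list: every option contributes an axis of the grid, even an axis of length one.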
- select(return_quality=False)
Select the best model.
The model is chosen through a grid search over candidates and their options, as well as a possible optimization over the parameters in the calibration space.
- Parameters:
return_quality (bool) --
Whether to return the quality of the best model.
By default it is set to False.
- Returns:
The best model and its quality if required.
- Return type:
BaseMLAlgo | tuple[BaseMLAlgo, float]
- candidates: list[tuple[BaseMLAlgo, float]]
The candidate machine learning algorithms, after possible calibration, and their quality measures.
- dataset: Dataset
The training dataset.
- factory: MLAlgoFactory
The factory used for the instantiation of machine learning algorithms.
- measure: type[BaseMLAlgoQuality]
The quality measure used to assess the machine learning algorithms.
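The final choice made by select can be sketched as follows. The function `select_best` is hypothetical, and the pairs ("model_a", 0.12) and ("model_b", 0.03) are placeholder values standing in for the (trained algorithm, quality) pairs stored in candidates. Whether the best quality is the smallest or the largest value depends on the measure (an error is minimized, a score may be maximized), which is modelled here by the assumed flag `smaller_is_better`.

```python
def select_best(candidates, return_quality=False, smaller_is_better=True):
    """Return the best candidate, and its quality if ``return_quality``.

    ``candidates`` is a list of ``(model, quality)`` pairs.
    """
    choose = min if smaller_is_better else max
    model, quality = choose(candidates, key=lambda candidate: candidate[1])
    return (model, quality) if return_quality else model


candidates = [("model_a", 0.12), ("model_b", 0.03)]
print(select_best(candidates))  # model_b
print(select_best(candidates, return_quality=True))  # ('model_b', 0.03)
```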