selection module

This module contains a class to select a machine learning algorithm from a list.

Machine learning is used to find relations or underlying structures in data. However, no algorithm is universally better than the others for an arbitrary problem: as in optimization, there is no free lunch in machine learning [wolpert].

[wolpert] Wolpert, David H. "The lack of a priori distinctions between learning algorithms." Neural Computation 8.7 (1996): 1341-1390.

Provided a quality measure, one can thus compare the performances of different machine learning algorithms.

This process can be easily performed using the class MLAlgoSelection.

A machine learning algorithm is built from a set of (hyper)parameters that are fixed before the learning takes place. To choose the best hyperparameters, a simple grid search over predefined values may be sufficient, and MLAlgoSelection performs it. It can also go beyond a grid search and optimize the hyperparameters with the class MLAlgoCalibration.
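
To illustrate what a grid search over predefined values amounts to, here is a minimal, library-independent sketch. The helper expand_option_lists is hypothetical and is not part of GEMSEO. It only shows how lists of candidate values could be expanded into one parameter set per combination; the option names are borrowed from the add_candidate example on this page.

```python
from itertools import product


def expand_option_lists(**option_lists):
    """Yield one dict of options per combination of the listed values.

    Hypothetical helper for illustration only; not part of GEMSEO.
    """
    names = list(option_lists)
    # The Cartesian product enumerates every combination of values.
    for values in product(*(option_lists[name] for name in names)):
        yield dict(zip(names, values))


grid = list(
    expand_option_lists(
        penalty_level=[0, 0.1, 1, 10, 20],
        l2_penalty_ratio=[0, 0.5, 1],
        fit_intercept=[True],
    )
)
print(len(grid))  # 5 * 3 * 1 = 15 combinations
print(grid[0])
```

Each dictionary in grid would correspond to one algorithm configuration to train and assess, which is why every value has to be given inside a list, even when there is only one.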

See also

ml_algo calibration

Classes:

MLAlgoSelection(dataset, measure[, ...])

Machine learning algorithm selector.

class gemseo.mlearning.core.selection.MLAlgoSelection(dataset, measure, eval_method='learn', samples=None, **measure_options)[source]

Bases: object

Machine learning algorithm selector.

dataset

The learning dataset.

Type

Dataset

measure

The name of the quality measure used to assess the machine learning algorithms.

Type

str

measure_options

The options for the method to evaluate the quality measure.

Type

Dict[str,Union[int,Dataset]]

factory

The factory used for the instantiation of machine learning algorithms.

Type

MLAlgoFactory

candidates

The candidate machine learning algorithms, after possible calibration, and their quality measures.

Type

List[Tuple[MLAlgo,float]]

Parameters
  • dataset (Dataset) – The learning dataset.

  • measure (Union[str,MLQualityMeasure]) – The name of the quality measure used to assess the machine learning algorithms.

  • eval_method (str) –

    The name of the method to evaluate the quality measure.

    By default it is set to learn.

  • samples (Optional[Sequence[int]]) –

    The indices of the learning samples to consider. Other indices are neither used for training nor for testing. If None, use all the samples.

    By default it is set to None.

  • **measure_options (MeasureOptionType) – The options for the method to evaluate the quality measure. The option ‘multioutput’ will be set to False.

Raises

ValueError – If the unsupported “multioutput” option is enabled.

Return type

None
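
The handling of the "multioutput" option described above can be pictured with the following sketch. It is an assumption about the behaviour, written as a standalone hypothetical function (check_measure_options), and not the actual GEMSEO code; n_folds is used only as an example option name.

```python
def check_measure_options(**measure_options):
    """Return a copy of the options with "multioutput" forced to False.

    Hypothetical helper for illustration only; not part of GEMSEO.
    Raises ValueError if the caller tries to enable "multioutput".
    """
    if measure_options.get("multioutput", False):
        raise ValueError("The 'multioutput' option is not supported.")
    options = dict(measure_options)
    options["multioutput"] = False
    return options


print(check_measure_options(n_folds=5))
```

A single scalar quality per candidate is what makes the candidates directly comparable, which is a plausible reason for disabling a per-output measure here.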

Methods:

add_candidate(name[, calib_space, calib_algo])

Add a machine learning algorithm candidate.

select([return_quality])

Select the best model.

add_candidate(name, calib_space=None, calib_algo=None, **option_lists)[source]

Add a machine learning algorithm candidate.

Parameters
  • name (str) – The name of a machine learning algorithm.

  • calib_space (Optional[gemseo.algos.design_space.DesignSpace]) –

    The design space defining the parameters to be calibrated with a MLAlgoCalibration. If None, do not perform calibration.

    By default it is set to None.

  • calib_algo (Optional[Mapping[str, Union[str, int, Mapping[str, Union[int, float]]]]]) –

    The name and the parameters of the optimization algorithm, e.g. {"algo": "fullfact", "n_samples": 10}. If None, do not perform calibration.

    By default it is set to None.

  • **option_lists – The parameters for the machine learning algorithm candidate. Each parameter has to be enclosed within a list. The list may contain different values to try out for the given parameter, or only one.

Return type

None

Examples

>>> selector.add_candidate(
...     "LinearRegression",
...     penalty_level=[0, 0.1, 1, 10, 20],
...     l2_penalty_ratio=[0, 0.5, 1],
...     fit_intercept=[True],
... )

select(return_quality=False)[source]

Select the best model.

The model is chosen through a grid search over the candidates and their options, together with an optional optimization over the parameters in the calibration space.

Parameters

return_quality (bool) –

Whether to return the quality of the best model.

By default it is set to False.

Returns

The best model, together with its quality if return_quality is True.

Return type

Union[gemseo.mlearning.core.ml_algo.MLAlgo, Tuple[gemseo.mlearning.core.ml_algo.MLAlgo, float]]
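
The two possible return shapes of select can be sketched as follows, using a list of (model, quality) pairs like the candidates attribute. The function select_best is hypothetical and not part of GEMSEO, plain strings stand in for trained models, and the smaller_is_better flag is an assumption suited to error-like measures such as a mean squared error.

```python
def select_best(candidates, return_quality=False, smaller_is_better=True):
    """Pick the best (model, quality) pair from a list of candidates.

    Hypothetical helper for illustration only; not part of GEMSEO.
    """
    if not candidates:
        raise ValueError("No candidate to select from.")
    pick = min if smaller_is_better else max
    # Compare the candidates on their quality, the second item of each pair.
    model, quality = pick(candidates, key=lambda pair: pair[1])
    return (model, quality) if return_quality else model


# Placeholder names and made-up quality values, for illustration only.
candidates = [("linear", 0.42), ("polynomial", 0.05), ("rbf", 0.11)]
print(select_best(candidates))
print(select_best(candidates, return_quality=True))
```

Returning the quality only on request keeps the common call simple while still letting the caller record how good the selected model was.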