gemseo.mlearning.resampling.base_resampler module

A base class for resampling and surrogate modeling.

class BaseResampler(sample_indices, n_splits, seed=0)

Bases: object

A base class for resampling and surrogate modeling.

Parameters:
  • sample_indices (NDArray[int]) -- The original indices of the samples.

  • n_splits (int) -- The number of train-test splits.

  • seed (int | None) --

    The seed to initialize the random generator. If None, then fresh, unpredictable entropy will be pulled from the OS.

    By default it is set to 0.
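The roles of the three constructor arguments can be illustrated with a toy, stdlib-only resampler. The class ToyKFoldResampler below is hypothetical and not part of GEMSEO. It assumes that the sample indices are shuffled once by a generator seeded with seed and then cut into n_splits contiguous test folds, as a k-fold cross-validation would do. Plain Python lists stand in for NDArray[int].

```python
import random
from typing import Iterable, List, Optional, Tuple


class ToyKFoldResampler:
    """Hypothetical k-fold-style resampler; NOT the GEMSEO BaseResampler API."""

    def __init__(
        self, sample_indices: Iterable[int], n_splits: int, seed: Optional[int] = 0
    ) -> None:
        self.seed = seed
        self.n_splits = n_splits
        # random.Random(None) seeds itself from OS entropy, mirroring seed=None.
        rng = random.Random(seed)
        self.sample_indices = list(sample_indices)
        rng.shuffle(self.sample_indices)  # shuffled once, in place
        self.splits = self._split()

    def _split(self) -> List[Tuple[List[int], List[int]]]:
        """Cut the shuffled indices into n_splits test folds (k-fold style)."""
        n = len(self.sample_indices)
        splits = []
        for k in range(self.n_splits):
            start, stop = k * n // self.n_splits, (k + 1) * n // self.n_splits
            test = self.sample_indices[start:stop]
            train = self.sample_indices[:start] + self.sample_indices[stop:]
            splits.append((train, test))
        return splits


# Same integer seed, same shuffle: the splits are reproducible.
a = ToyKFoldResampler(range(10), n_splits=5, seed=0)
b = ToyKFoldResampler(range(10), n_splits=5, seed=0)
assert a.sample_indices == b.sample_indices
```

Passing seed=None instead would give a different shuffle on each run.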

execute(model, return_models=False, input_data=None, stack_predictions=True, fit_transformers=True, store_sampling_result=False)

Apply the resampling technique to a machine learning model.

Parameters:
  • model (BaseMLAlgo) -- The machine learning model.

  • return_models (bool) --

    Whether the sub-models resulting from resampling are returned.

    By default it is set to False.

  • input_data (ndarray | None) -- The input data for the prediction, if any.

  • stack_predictions (bool) --

    Whether the sub-predictions are stacked per sub-model (first the predictions of the first sub-model, then the predictions of the second sub-model, etc.). This argument is ignored when input_data is None.

    By default it is set to True.

  • fit_transformers (bool) --

    Whether to re-fit the transformers.

    By default it is set to True.

  • store_sampling_result (bool) --

    Whether to store the sampling results in the resampling_results attribute of the original model.

    By default it is set to False.

Returns:

First the sub-models resulting from resampling if return_models is True, then the predictions, either per fold or stacked.

Raises:

ValueError -- When the model is neither a supervised algorithm nor a clustering one.

Return type:

tuple[list[BaseMLAlgo], list[ndarray] | ndarray]
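The resampling loop behind execute can be sketched as follows, under stated assumptions. MeanModel and toy_execute are hypothetical stand-ins, not GEMSEO code: each split trains a fresh copy of the model on its training indices; without input_data each sub-model predicts its own test samples (one result per fold); with input_data every sub-model predicts the same points and, if stack_predictions is True, the results are concatenated sub-model after sub-model. Returning an empty list when return_models is False is an assumption of this sketch, and fit_transformers and store_sampling_result are not modeled.

```python
import copy
from typing import List, Optional, Sequence, Tuple


class MeanModel:
    """Hypothetical regressor predicting the mean of its training outputs."""

    def __init__(self) -> None:
        self.mean = 0.0

    def fit(self, outputs: Sequence[float]) -> None:
        self.mean = sum(outputs) / len(outputs)

    def predict(self, inputs: Sequence[float]) -> List[float]:
        return [self.mean for _ in inputs]


def toy_execute(model, outputs, inputs, splits, return_models=False,
                input_data: Optional[Sequence[float]] = None,
                stack_predictions=True) -> Tuple[list, list]:
    sub_models, predictions = [], []
    for train, test in splits:
        sub_model = copy.deepcopy(model)  # the original model is left untouched
        sub_model.fit([outputs[i] for i in train])
        sub_models.append(sub_model)
        if input_data is None:
            # Predict the held-out samples of this split.
            predictions.append(sub_model.predict([inputs[i] for i in test]))
        else:
            predictions.append(sub_model.predict(input_data))
    if input_data is not None and stack_predictions:
        # First the first sub-model's predictions, then the second's, etc.
        predictions = [p for fold in predictions for p in fold]
    return (sub_models if return_models else []), predictions


outputs, inputs = [1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 2.0, 3.0]
splits = [([0, 1], [2, 3]), ([2, 3], [0, 1])]
_, per_fold = toy_execute(MeanModel(), outputs, inputs, splits)
print(per_fold)  # [[1.5, 1.5], [3.5, 3.5]]
```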

plot(file_path='', show=True, colors=('b', 'r'))

Plot the train-test splits.

Parameters:
  • file_path (str | Path) --

    The file path to save the figure. If empty, do not save the figure.

    By default it is set to "".

  • show (bool) --

    Whether to display the figure.

    By default it is set to True.

  • colors (tuple[str, str]) --

    The colors of the training and test points, respectively.

    By default it is set to ('b', 'r').

Returns:

The visualization.

Return type:

Scatter
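The data behind such a figure can be sketched without any plotting library: for each split, every sample index receives the training color or the test color. The helpers split_colors and render_splits below are hypothetical and only mimic the color assignment; a text row per split stands in for the actual Scatter figure, whose construction is not shown here.

```python
from typing import Dict, List, Sequence, Tuple

Split = Tuple[Sequence[int], Sequence[int]]


def split_colors(splits: Sequence[Split], colors: Tuple[str, str] = ("b", "r")) -> List[Dict[int, str]]:
    """Map each sample index to its color, one mapping per train-test split."""
    train_color, test_color = colors
    result = []
    for train, test in splits:
        mapping = {i: train_color for i in train}
        mapping.update({i: test_color for i in test})
        result.append(mapping)
    return result


def render_splits(splits: Sequence[Split], colors: Tuple[str, str] = ("b", "r")) -> str:
    """One text row per split, samples ordered by index."""
    rows = []
    for k, mapping in enumerate(split_colors(splits, colors)):
        rows.append(f"split {k}: " + "".join(mapping[i] for i in sorted(mapping)))
    return "\n".join(rows)


print(render_splits([([0, 1], [2, 3]), ([2, 3], [0, 1])]))
# split 0: bbrr
# split 1: rrbb
```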

name: str

The name of the resampler.

By default, the class name is used.
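A class-name default of this kind is commonly obtained from type(self).__name__, which resolves to the concrete subclass at runtime. The classes below are hypothetical stand-ins, not GEMSEO code; they only illustrate the assumed pattern.

```python
class ToyBaseResampler:
    """Hypothetical base class illustrating a class-name default."""

    name: str

    def __init__(self) -> None:
        # type(self) is the concrete subclass, not ToyBaseResampler.
        self.name = type(self).__name__


class ToyBootstrap(ToyBaseResampler):
    """Hypothetical subclass; inherits __init__ unchanged."""


resampler = ToyBootstrap()
print(resampler.name)  # ToyBootstrap
resampler.name = "my_bootstrap"  # the attribute can still be overwritten
```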

property sample_indices: NDArray[int]

The indices of the samples after shuffling.

property seed: int

The seed to initialize the random generator.

property splits: Splits

The train-test splits resulting from the splitting of the samples.

A train-test split is a partition whose first component contains the indices of the learning samples and whose second component contains the indices of the test samples.
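The partition property can be checked mechanically: the learning and test indices must not overlap and, together, must cover all the sample indices. ToySplit and is_partition below are hypothetical helpers, not the GEMSEO Splits type. Comparing sets tolerates repeated learning indices, which a bootstrap-style sampling with replacement could produce.

```python
from typing import NamedTuple, Sequence, Tuple


class ToySplit(NamedTuple):
    """Hypothetical train-test split: (learning indices, test indices)."""

    train: Tuple[int, ...]
    test: Tuple[int, ...]


def is_partition(split: ToySplit, sample_indices: Sequence[int]) -> bool:
    """True if train and test are disjoint and together cover every sample."""
    train, test = set(split.train), set(split.test)
    return train.isdisjoint(test) and (train | test) == set(sample_indices)


print(is_partition(ToySplit((0, 1, 2), (3, 4)), range(5)))     # True
print(is_partition(ToySplit((0, 1, 2), (2, 3, 4)), range(5)))  # False: 2 is in both
print(is_partition(ToySplit((0, 1), (3, 4)), range(5)))        # False: 2 is missing
```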