Transform data to improve the ML algorithm quality

Introduction

A transformer to apply operations on NumPy arrays.

The abstract Transformer class implements the concept of a data transformer. Inheriting classes shall implement the Transformer.fit(), Transformer.transform() and possibly Transformer.inverse_transform() methods.
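The contract described above can be illustrated with a minimal sketch written in plain NumPy. The class `ShiftTransformer` is hypothetical and does not inherit from the GEMSEO Transformer; it only mirrors the fit(), transform(), inverse_transform() and fit_transform() method names to show how they relate to each other.

```python
import numpy as np


class ShiftTransformer:
    """Hypothetical stand-in for a Transformer subclass (not GEMSEO code)."""

    def __init__(self):
        self.shift = None  # learned by fit()

    def fit(self, data):
        # Learn one shift per feature from samples shaped (n_observations, n_features).
        self.shift = data.mean(axis=0)

    def transform(self, data):
        # Centre the data with the shift learned by fit().
        return data - self.shift

    def inverse_transform(self, data):
        # Undo transform().
        return data + self.shift

    def fit_transform(self, data):
        # Fit on the data, then transform the same data.
        self.fit(data)
        return self.transform(data)


samples = np.array([[1.0, 10.0], [3.0, 30.0]])
transformer = ShiftTransformer()
transformed = transformer.fit_transform(samples)
restored = transformer.inverse_transform(transformed)
```

Here `restored` matches `samples`, which is the round-trip property an invertible transformer is expected to satisfy.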

class gemseo.mlearning.transform.transformer.Transformer(name='Transformer', **parameters)[source]

A data transformer fitted from some samples.

Parameters:
  • name (str) –

    A name for this transformer.

    By default it is set to “Transformer”.

  • **parameters (ParameterType) – The parameters of the transformer.

compute_jacobian(data)[source]

Compute the Jacobian of transform().

Parameters:

data (ndarray) – The data where the Jacobian is to be computed, shaped as (n_observations, n_features) or (n_features, ).

Returns:

The Jacobian matrix, shaped according to data.

Return type:

NoReturn

compute_jacobian_inverse(data)[source]

Compute the Jacobian of the inverse_transform().

Parameters:

data (ndarray) – The data where the Jacobian is to be computed, shaped as (n_observations, n_features) or (n_features, ).

Returns:

The Jacobian matrix, shaped according to data.

Return type:

NoReturn

duplicate()[source]

Duplicate the current object.

Returns:

A deepcopy of the current instance.

Return type:

Transformer

fit(data, *args)[source]

Fit the transformer to the data.

Parameters:
  • data (ndarray) – The data to be fitted, shaped as (n_observations, n_features) or (n_observations, ).

  • args (Union[float, int, str]) –

Return type:

None

fit_transform(data, *args)[source]

Fit the transformer to the data and transform the data.

Parameters:
  • data (ndarray) – The data to be transformed, shaped as (n_observations, n_features) or (n_observations, ).

  • args (Union[float, int, str]) –

Returns:

The transformed data, shaped as data.

Return type:

ndarray

inverse_transform(data)[source]

Perform an inverse transform on the data.

Parameters:

data (ndarray) – The data to be inverse transformed, shaped as (n_observations, n_features) or (n_features, ).

Returns:

The inverse transformed data, shaped as data.

Return type:

NoReturn

abstract transform(data)[source]

Transform the data.

Parameters:

data (ndarray) – The data to be transformed, shaped as (n_observations, n_features) or (n_features, ).

Returns:

The transformed data, shaped as data.

Return type:

ndarray

CROSSED: ClassVar[bool] = False

Whether the fit() method requires two data arrays.

property is_fitted: bool

Whether the transformer has been fitted from some data.

name: str

The name of the transformer.

property parameters: dict[str, Union[bool, int, float, numpy.ndarray, str, NoneType]]

The parameters of the transformer.

class gemseo.mlearning.transform.transformer.TransformerFactory(*args, **kwargs)[source]

A factory of Transformer.

Parameters:
  • base_class – The base class to be considered.

  • module_names – The fully qualified module names to be searched.

  • args (Any) –

  • kwargs (Any) –

static cache_clear()

Clear the cache.

Return type:

None

create(class_name, **options)

Return an instance of a class.

Parameters:
  • class_name (str) – The name of the class.

  • **options (Any) – The arguments to be passed to the class constructor.

Returns:

The instance of the class.

Raises:

TypeError – If the class cannot be instantiated.

Return type:

Any

get_class(name)

Return a class from its name.

Parameters:

name (str) – The name of the class.

Returns:

The class.

Raises:

ImportError – If the class is not available.

Return type:

type[Any]

get_default_options_values(name)

Return the constructor kwargs default values of a class.

Parameters:

name (str) – The name of the class.

Returns:

The mapping from the argument names to their default values.

Return type:

dict[str, str | int | float | bool]

get_default_sub_options_values(name, **options)

Return the default values of the sub options of a class.

Parameters:
  • name (str) – The name of the class.

  • **options (str) – The options to be passed to the class required to deduce the sub options.

Returns:

The JSON grammar.

Return type:

JSONGrammar

get_library_name(name)

Return the name of the library related to the name of a class.

Parameters:

name (str) – The name of the class.

Returns:

The name of the library.

Return type:

str

get_options_doc(name)

Return the constructor documentation of a class.

Parameters:

name (str) – The name of the class.

Returns:

The mapping from the argument names to their documentation.

Return type:

dict[str, str]

get_options_grammar(name, write_schema=False, schema_path=None)

Return the options JSON grammar for a class.

Attempt to generate a JSONGrammar from the arguments of the __init__ method of the class.

Parameters:
  • name (str) – The name of the class.

  • write_schema (bool) –

    If True, write the JSON schema to a file.

    By default it is set to False.

  • schema_path (str | None) – The path to the JSON schema file. If None, the file is saved in the current directory in a file named after the name of the class.

Returns:

The JSON grammar.

Return type:

JSONGrammar

get_sub_options_grammar(name, **options)

Return the JSONGrammar of the sub options of a class.

Parameters:
  • name (str) – The name of the class.

  • **options (str) – The options to be passed to the class required to deduce the sub options.

Returns:

The JSON grammar.

Return type:

JSONGrammar

is_available(name)

Return whether a class can be instantiated.

Parameters:

name (str) – The name of the class.

Returns:

Whether the class can be instantiated.

Return type:

bool

update()

Search for the classes that can be instantiated.

The search is done in the following order:
  1. The fully qualified module names

  2. The plugin packages

  3. The packages from the environment variables

Return type:

None

property classes: list[str]

The sorted names of the available classes.
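The idea of a name-based factory can be sketched as follows. This is a simplified illustration, not the GEMSEO implementation: `SimpleFactory`, `Identity` and `Offset` are hypothetical, and the classes are registered by hand instead of being discovered in modules, plugins or environment variables. Only the method names create(), get_class(), is_available() and the classes property are taken from the reference above.

```python
class SimpleFactory:
    """Hypothetical name-based factory (not the GEMSEO implementation)."""

    def __init__(self, *classes):
        # Map each class name to the class itself.
        self._classes = {cls.__name__: cls for cls in classes}

    @property
    def classes(self):
        # The sorted names of the registered classes.
        return sorted(self._classes)

    def is_available(self, name):
        return name in self._classes

    def get_class(self, name):
        if name not in self._classes:
            raise ImportError(f"The class {name} is not available.")
        return self._classes[name]

    def create(self, class_name, **options):
        # Forward the options to the class constructor.
        return self.get_class(class_name)(**options)


class Identity:
    def __init__(self, name="Identity"):
        self.name = name


class Offset:
    def __init__(self, name="Offset", offset=0.0):
        self.name = name
        self.offset = offset


factory = SimpleFactory(Offset, Identity)
instance = factory.create("Offset", offset=2.0)
```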

A pipeline to chain transformers.

The Pipeline class chains a sequence of transformers, and provides global fit(), transform(), fit_transform() and inverse_transform() methods.
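A minimal NumPy sketch of the chaining logic is given below. The `Linear` step and the two helper functions are hypothetical and only illustrate the ordering: steps are applied first to last when transforming and last to first when inverting, and an empty sequence leaves the data unchanged.

```python
import numpy as np


class Linear:
    """Hypothetical invertible step: offset + coefficient * data."""

    def __init__(self, offset, coefficient):
        self.offset = offset
        self.coefficient = coefficient

    def transform(self, data):
        return self.offset + self.coefficient * data

    def inverse_transform(self, data):
        return (data - self.offset) / self.coefficient


def pipeline_transform(steps, data):
    # Apply the steps in order: the output of one is the input of the next.
    for step in steps:
        data = step.transform(data)
    return data


def pipeline_inverse_transform(steps, data):
    # Undo the steps starting with the last one.
    for step in reversed(steps):
        data = step.inverse_transform(data)
    return data


steps = [Linear(1.0, 2.0), Linear(-3.0, 0.5)]
data = np.array([[0.0], [1.0], [2.0]])
transformed = pipeline_transform(steps, data)
restored = pipeline_inverse_transform(steps, transformed)
unchanged = pipeline_transform([], data)  # an empty chain acts as the identity
```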

class gemseo.mlearning.transform.pipeline.Pipeline(name='Pipeline', transformers=None)[source]

Transformer pipeline.

Parameters:
  • name (str) –

    A name for this pipeline.

    By default it is set to “Pipeline”.

  • transformers (Sequence[Transformer] | None) – A sequence of transformers to be chained. The transformers are chained in the order of appearance in the list, i.e. the first transformer is applied first. If transformers is an empty list or None, then the pipeline transformer behaves like an identity transformer.

compute_jacobian(data)[source]

Compute the Jacobian of the pipeline.transform().

Parameters:

data (ndarray) – The data where the Jacobian is to be computed.

Returns:

The Jacobian matrix.

Return type:

ndarray
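For a chain of differentiable steps, the chain rule gives the Jacobian of the composition as the product of the Jacobians of the steps, with the last step on the left. The sketch below checks this on two hypothetical affine steps; whether the Pipeline class computes its Jacobian exactly this way is an assumption, and all names are illustrative.

```python
import numpy as np

# Two hypothetical affine steps t1(x) = A1 @ x + b1 and t2(y) = A2 @ y + b2.
A1 = np.array([[2.0, 0.0], [0.0, 3.0]])
b1 = np.array([1.0, -1.0])
A2 = np.array([[1.0, 1.0], [0.0, 1.0]])
b2 = np.array([0.0, 5.0])


def composed(x):
    # Apply t1 first, then t2.
    return A2 @ (A1 @ x + b1) + b2


# Chain rule: the Jacobian of t2(t1(x)) is J2 @ J1.
jacobian = A2 @ A1

# Central finite differences as a cross-check; column j holds d(composed)/dx_j.
x0 = np.array([0.3, -0.7])
step = 1e-6
numerical = np.column_stack(
    [(composed(x0 + step * e) - composed(x0 - step * e)) / (2 * step) for e in np.eye(2)]
)
```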

compute_jacobian_inverse(data)[source]

Compute the Jacobian of the pipeline.inverse_transform().

Parameters:

data (ndarray) – The data where the Jacobian is to be computed.

Returns:

The Jacobian matrix.

Return type:

ndarray

duplicate()[source]

Duplicate the current object.

Returns:

A deepcopy of the current instance.

Return type:

Pipeline

fit(data, *args)

Fit the transformer to the data.

Parameters:
  • data (ndarray) – The data to be fitted, shaped as (n_observations, n_features) or (n_observations, ).

  • args (Union[float, int, str]) –

Return type:

None

fit_transform(data, *args)

Fit the transformer to the data and transform the data.

Parameters:
  • data (ndarray) – The data to be transformed, shaped as (n_observations, n_features) or (n_observations, ).

  • args (Union[float, int, str]) –

Returns:

The transformed data, shaped as data.

Return type:

ndarray

inverse_transform(data)[source]

Perform an inverse transform on the data.

The data is inverse transformed sequentially, starting with the last transformer in the list.

Parameters:

data (ndarray) – The data to be inverse transformed.

Returns:

The inverse transformed data.

Return type:

ndarray

transform(data)[source]

Transform the data.

The data is transformed sequentially, where the output of one transformer is the input of the next.

Parameters:

data (ndarray) – The data to be transformed.

Returns:

The transformed data.

Return type:

ndarray

CROSSED: ClassVar[bool] = False

Whether the fit() method requires two data arrays.

property is_fitted: bool

Whether the transformer has been fitted from some data.

name: str

The name of the transformer.

property parameters: dict[str, Union[bool, int, float, numpy.ndarray, str, NoneType]]

The parameters of the transformer.

transformers: Sequence[Transformer]

The sequence of transformers.

Scaling

Scaling a variable with a linear transformation.

The Scaler class implements the default scaling method applying to some parameter \(z\):

\[\bar{z} := \text{offset} + \text{coefficient}\times z\]

where \(\bar{z}\) is the scaled version of \(z\). This scaling method is a linear transformation parameterized by an offset and a coefficient.

In this default scaling method, the offset is equal to 0 and the coefficient is equal to 1. Consequently, the scaling operation is the identity: \(\bar{z}=z\). Subclasses are expected to override this default scaling.
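The linear transformation and its inverse can be written directly in NumPy. The sketch below uses hypothetical per-feature values for the offset and the coefficient; it illustrates the formula and is not the Scaler code. For such an element-wise linear map, the Jacobian with respect to \(z\) is the diagonal matrix of coefficients.

```python
import numpy as np

offset = np.array([1.0, -2.0])      # one offset per feature (hypothetical values)
coefficient = np.array([0.5, 4.0])  # one coefficient per feature (hypothetical values)

z = np.array([[2.0, 1.0], [4.0, 0.0]])  # shaped (n_observations, n_features)

# Scaling: z_bar = offset + coefficient * z, applied feature by feature.
z_scaled = offset + coefficient * z

# Inverse scaling: z = (z_bar - offset) / coefficient.
z_back = (z_scaled - offset) / coefficient

# Jacobians of the scaling and of its inverse for a single observation.
jacobian = np.diag(coefficient)
jacobian_inverse = np.diag(1.0 / coefficient)
```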

class gemseo.mlearning.transform.scaler.scaler.Scaler(name='Scaler', offset=0.0, coefficient=1.0)[source]

Data scaler.

Parameters:
  • name (str) –

    A name for this transformer.

    By default it is set to “Scaler”.

  • offset (float | ndarray) –

    The offset of the linear transformation.

    By default it is set to 0.0.

  • coefficient (float | ndarray) –

    The coefficient of the linear transformation.

    By default it is set to 1.0.

compute_jacobian(data, *args, **kwargs)

Compute the Jacobian of transform(). The data array is first forced to be 2D.

Parameters:
  • data (ndarray) – A 1D or 2D NumPy array.

  • *args (Any) – The description is missing.

  • **kwargs (Any) – The description is missing.

Returns:

Any kind of output; if a NumPy array, its dimension is made consistent with the shape of data.

Return type:

Any

compute_jacobian_inverse(data, *args, **kwargs)

Compute the Jacobian of inverse_transform(). The data array is first forced to be 2D.

Parameters:
  • data (ndarray) – A 1D or 2D NumPy array.

  • *args (Any) – The description is missing.

  • **kwargs (Any) – The description is missing.

Returns:

Any kind of output; if a NumPy array, its dimension is made consistent with the shape of data.

Return type:

Any

duplicate()

Duplicate the current object.

Returns:

A deepcopy of the current instance.

Return type:

Transformer

fit(data, *args)

Fit the transformer to the data.

Parameters:
  • data (ndarray) – The data to be fitted, shaped as (n_observations, n_features) or (n_observations, ).

  • args (Union[float, int, str]) –

Return type:

None

fit_transform(data, *args)

Fit the transformer to the data and transform the data.

Parameters:
  • data (ndarray) – The data to be transformed, shaped as (n_observations, n_features) or (n_observations, ).

  • args (Union[float, int, str]) –

Returns:

The transformed data, shaped as data.

Return type:

ndarray

inverse_transform(data, *args, **kwargs)

Perform an inverse transform on the data. The data array is first forced to be 2D.

Parameters:
  • data (ndarray) – A 1D or 2D NumPy array.

  • *args (Any) – The description is missing.

  • **kwargs (Any) – The description is missing.

Returns:

Any kind of output; if a NumPy array, its dimension is made consistent with the shape of data.

Return type:

Any

transform(data, *args, **kwargs)

Transform the data. The data array is first forced to be 2D.

Parameters:
  • data (ndarray) – A 1D or 2D NumPy array.

  • *args (Any) – The description is missing.

  • **kwargs (Any) – The description is missing.

Returns:

Any kind of output; if a NumPy array, its dimension is made consistent with the shape of data.

Return type:

Any

CROSSED: ClassVar[bool] = False

Whether the fit() method requires two data arrays.

property coefficient: ndarray

The scaling coefficient.

property is_fitted: bool

Whether the transformer has been fitted from some data.

name: str

The name of the transformer.

property offset: ndarray

The scaling offset.

property parameters: dict[str, Union[bool, int, float, numpy.ndarray, str, NoneType]]

The parameters of the transformer.

Scaling a variable with a geometrical linear transformation.

The MinMaxScaler class implements the MinMax scaling method applying to some parameter \(z\):

\[\bar{z} := \text{offset} + \text{coefficient}\times z = \frac{z-\text{min}(z)}{(\text{max}(z)-\text{min}(z))},\]

where \(\text{offset}=-\text{min}(z)/(\text{max}(z)-\text{min}(z))\) and \(\text{coefficient}=1/(\text{max}(z)-\text{min}(z))\).

In the MinMax scaling method, the scaling operation linearly transforms the original variable \(z\) such that the minimum of the original data corresponds to 0 and the maximum to 1.

Warning

When \(\text{min}(z)=\text{max}(z)\), we use \(\bar{z}=\frac{z}{\text{min}(z)}-0.5\).
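The offset and coefficient of the MinMax scaling can be computed per feature as in the NumPy sketch below. It is an illustration of the formulas above, not the MinMaxScaler code. The constant-feature branch follows the warning and assumes that the constant value is nonzero; the variable names are hypothetical.

```python
import numpy as np

# Second feature is constant to exercise the special case of the warning.
z = np.array([[1.0, 5.0], [3.0, 5.0], [2.0, 5.0]])
z_min = z.min(axis=0)
z_max = z.max(axis=0)
constant = z_max == z_min

# Regular features: coefficient = 1 / (max - min), offset = -min / (max - min).
# Constant features (assumed nonzero here): z / min - 0.5, as in the warning.
span = np.where(constant, 1.0, z_max - z_min)  # placeholder 1.0 avoids dividing by zero
coefficient = np.where(constant, 1.0 / z_min, 1.0 / span)
offset = np.where(constant, -0.5, -z_min / span)

z_scaled = offset + coefficient * z
```

The first feature is mapped onto [0, 1], while the constant feature is mapped to 0.5 everywhere.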

class gemseo.mlearning.transform.scaler.min_max_scaler.MinMaxScaler(name='MinMaxScaler', offset=0.0, coefficient=1.0)[source]

Min-max scaler.

Parameters:
  • name (str) –

    A name for this transformer.

    By default it is set to “MinMaxScaler”.

  • offset (float) –

    The offset of the linear transformation.

    By default it is set to 0.0.

  • coefficient (float) –

    The coefficient of the linear transformation.

    By default it is set to 1.0.

compute_jacobian(data, *args, **kwargs)

Compute the Jacobian of transform(). The data array is first forced to be 2D.

Parameters:
  • data (ndarray) – A 1D or 2D NumPy array.

  • *args (Any) – The description is missing.

  • **kwargs (Any) – The description is missing.

Returns:

Any kind of output; if a NumPy array, its dimension is made consistent with the shape of data.

Return type:

Any

compute_jacobian_inverse(data, *args, **kwargs)

Compute the Jacobian of inverse_transform(). The data array is first forced to be 2D.

Parameters:
  • data (ndarray) – A 1D or 2D NumPy array.

  • *args (Any) – The description is missing.

  • **kwargs (Any) – The description is missing.

Returns:

Any kind of output; if a NumPy array, its dimension is made consistent with the shape of data.

Return type:

Any

duplicate()

Duplicate the current object.

Returns:

A deepcopy of the current instance.

Return type:

Transformer

fit(data, *args)

Fit the transformer to the data.

Parameters:
  • data (ndarray) – The data to be fitted, shaped as (n_observations, n_features) or (n_observations, ).

  • args (Union[float, int, str]) –

Return type:

None

fit_transform(data, *args)

Fit the transformer to the data and transform the data.

Parameters:
  • data (ndarray) – The data to be transformed, shaped as (n_observations, n_features) or (n_observations, ).

  • args (Union[float, int, str]) –

Returns:

The transformed data, shaped as data.

Return type:

ndarray

inverse_transform(data, *args, **kwargs)

Perform an inverse transform on the data. The data array is first forced to be 2D.

Parameters:
  • data (ndarray) – A 1D or 2D NumPy array.

  • *args (Any) – The description is missing.

  • **kwargs (Any) – The description is missing.

Returns:

Any kind of output; if a NumPy array, its dimension is made consistent with the shape of data.

Return type:

Any

transform(data, *args, **kwargs)

Transform the data. The data array is first forced to be 2D.

Parameters:
  • data (ndarray) – A 1D or 2D NumPy array.

  • *args (Any) – The description is missing.

  • **kwargs (Any) – The description is missing.

Returns:

Any kind of output; if a NumPy array, its dimension is made consistent with the shape of data.

Return type:

Any

CROSSED: ClassVar[bool] = False

Whether the fit() method requires two data arrays.

property coefficient: ndarray

The scaling coefficient.

property is_fitted: bool

Whether the transformer has been fitted from some data.

name: str

The name of the transformer.

property offset: ndarray

The scaling offset.

property parameters: dict[str, Union[bool, int, float, numpy.ndarray, str, NoneType]]

The parameters of the transformer.

Scaling a variable with a statistical linear transformation.

The StandardScaler class implements the Standard scaling method applying to some parameter \(z\):

\[\bar{z} := \text{offset} + \text{coefficient}\times z = \frac{z-\text{mean}(z)}{\text{std}(z)}\]

where \(\text{offset}=-\text{mean}(z)/\text{std}(z)\) and \(\text{coefficient}=1/\text{std}(z)\).

In this standard scaling method, the scaling operation linearly transforms the original variable \(z\) such that in the scaled space, the original data have zero mean and unit standard deviation.

Warning

When \(\text{std}(z)=0\), we use \(\bar{z}=\frac{z}{\text{mean}(z)}-1\).
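The standard scaling formulas can be sketched in NumPy as follows. This is an illustration, not the StandardScaler code: it assumes the population standard deviation (NumPy's default, ddof=0) and, for the constant-feature branch taken from the warning, a nonzero mean. The variable names are hypothetical.

```python
import numpy as np

# Second feature is constant to exercise the special case of the warning.
z = np.array([[1.0, 4.0], [3.0, 4.0]])
mean = z.mean(axis=0)
std = z.std(axis=0)  # population standard deviation (ddof=0), an assumption
constant = std == 0

# Regular features: coefficient = 1 / std, offset = -mean / std.
# Constant features (mean assumed nonzero here): z / mean - 1, as in the warning.
safe_std = np.where(constant, 1.0, std)  # placeholder 1.0 avoids dividing by zero
coefficient = np.where(constant, 1.0 / mean, 1.0 / safe_std)
offset = np.where(constant, -1.0, -mean / safe_std)

z_scaled = offset + coefficient * z
```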

class gemseo.mlearning.transform.scaler.standard_scaler.StandardScaler(name='StandardScaler', offset=0.0, coefficient=1.0)[source]

Standard scaler.

Parameters:
  • name (str) –

    A name for this transformer.

    By default it is set to “StandardScaler”.

  • offset (float) –

    The offset of the linear transformation.

    By default it is set to 0.0.

  • coefficient (float) –

    The coefficient of the linear transformation.

    By default it is set to 1.0.

compute_jacobian(data, *args, **kwargs)

Compute the Jacobian of transform(). The data array is first forced to be 2D.

Parameters:
  • data (ndarray) – A 1D or 2D NumPy array.

  • *args (Any) – The description is missing.

  • **kwargs (Any) – The description is missing.

Returns:

Any kind of output; if a NumPy array, its dimension is made consistent with the shape of data.

Return type:

Any

compute_jacobian_inverse(data, *args, **kwargs)

Compute the Jacobian of inverse_transform(). The data array is first forced to be 2D.

Parameters:
  • data (ndarray) – A 1D or 2D NumPy array.

  • *args (Any) – The description is missing.

  • **kwargs (Any) – The description is missing.

Returns:

Any kind of output; if a NumPy array, its dimension is made consistent with the shape of data.

Return type:

Any

duplicate()

Duplicate the current object.

Returns:

A deepcopy of the current instance.

Return type:

Transformer

fit(data, *args)

Fit the transformer to the data.

Parameters:
  • data (ndarray) – The data to be fitted, shaped as (n_observations, n_features) or (n_observations, ).

  • args (Union[float, int, str]) –

Return type:

None

fit_transform(data, *args)

Fit the transformer to the data and transform the data.

Parameters:
  • data (ndarray) – The data to be transformed, shaped as (n_observations, n_features) or (n_observations, ).

  • args (Union[float, int, str]) –

Returns:

The transformed data, shaped as data.

Return type:

ndarray

inverse_transform(data, *args, **kwargs)

Perform an inverse transform on the data. The data array is first forced to be 2D.

Parameters:
  • data (ndarray) – A 1D or 2D NumPy array.

  • *args (Any) – The description is missing.

  • **kwargs (Any) – The description is missing.

Returns:

Any kind of output; if a NumPy array, its dimension is made consistent with the shape of data.

Return type:

Any

transform(data, *args, **kwargs)

Transform the data. The data array is first forced to be 2D.

Parameters:
  • data (ndarray) – A 1D or 2D NumPy array.

  • *args (Any) – The description is missing.

  • **kwargs (Any) – The description is missing.

Returns:

Any kind of output; if a NumPy array, its dimension is made consistent with the shape of data.

Return type:

Any

CROSSED: ClassVar[bool] = False

Whether the fit() method requires two data arrays.

property coefficient: ndarray

The scaling coefficient.

property is_fitted: bool

Whether the transformer has been fitted from some data.

name: str

The name of the transformer.

property offset: ndarray

The scaling offset.

property parameters: dict[str, Union[bool, int, float, numpy.ndarray, str, NoneType]]

The parameters of the transformer.

Dimension reduction

Dimension reduction as a generic transformer.

The DimensionReduction class implements the concept of dimension reduction.

See also

pca

class gemseo.mlearning.transform.dimension_reduction.dimension_reduction.DimensionReduction(name='DimensionReduction', n_components=None, **parameters)[source]

Dimension reduction.

Parameters:
  • name (str) –

    A name for this transformer.

    By default it is set to “DimensionReduction”.

  • n_components (int | None) – The number of components of the latent space. If None, use the maximum number allowed by the technique, typically min(n_samples, n_features).

  • **parameters (bool | int | float | str | None) – The parameters of the transformer.

compute_jacobian(data)

Compute the Jacobian of transform().

Parameters:

data (ndarray) – The data where the Jacobian is to be computed, shaped as (n_observations, n_features) or (n_features, ).

Returns:

The Jacobian matrix, shaped according to data.

Return type:

NoReturn

compute_jacobian_inverse(data)

Compute the Jacobian of the inverse_transform().

Parameters:

data (ndarray) – The data where the Jacobian is to be computed, shaped as (n_observations, n_features) or (n_features, ).

Returns:

The Jacobian matrix, shaped according to data.

Return type:

NoReturn

duplicate()

Duplicate the current object.

Returns:

A deepcopy of the current instance.

Return type:

Transformer

fit(data, *args)

Fit the transformer to the data.

Parameters:
  • data (ndarray) – The data to be fitted, shaped as (n_observations, n_features) or (n_observations, ).

  • args (Union[float, int, str]) –

Return type:

None

fit_transform(data, *args)

Fit the transformer to the data and transform the data.

Parameters:
  • data (ndarray) – The data to be transformed, shaped as (n_observations, n_features) or (n_observations, ).

  • args (Union[float, int, str]) –

Returns:

The transformed data, shaped as data.

Return type:

ndarray

inverse_transform(data)

Perform an inverse transform on the data.

Parameters:

data (ndarray) – The data to be inverse transformed, shaped as (n_observations, n_features) or (n_features, ).

Returns:

The inverse transformed data, shaped as data.

Return type:

NoReturn

abstract transform(data)

Transform the data.

Parameters:

data (ndarray) – The data to be transformed, shaped as (n_observations, n_features) or (n_features, ).

Returns:

The transformed data, shaped as data.

Return type:

ndarray

CROSSED: ClassVar[bool] = False

Whether the fit() method requires two data arrays.

property is_fitted: bool

Whether the transformer has been fitted from some data.

property n_components: int

The number of components.

name: str

The name of the transformer.

property parameters: dict[str, Union[bool, int, float, numpy.ndarray, str, NoneType]]

The parameters of the transformer.

The Principal Component Analysis (PCA) to reduce the dimension of a variable.

The PCA class wraps the PCA from scikit-learn.

Dependence

This dimension reduction algorithm relies on the PCA class of the scikit-learn library.
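To make the reduction and reconstruction steps concrete, the sketch below performs a PCA with a plain NumPy singular value decomposition. It deliberately does not call scikit-learn or GEMSEO; the data, the variable names and the choice of SVD are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
# 20 samples of 3 features lying exactly in a 2D affine subspace (hypothetical data).
latent = rng.normal(size=(20, 2))
mixing = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]])
data = latent @ mixing + np.array([5.0, -2.0, 0.5])

n_components = 2
mean = data.mean(axis=0)
# Rows of vt are the principal directions, sorted by decreasing singular value.
_, singular_values, vt = np.linalg.svd(data - mean, full_matrices=False)
components = vt[:n_components]

# Transform: project the centred data onto the principal directions.
reduced = (data - mean) @ components.T  # shape (20, 2)
# Inverse transform: map the latent coordinates back to the original space.
reconstructed = reduced @ components + mean  # shape (20, 3)
```

Because the hypothetical data lie in a two-dimensional subspace, two components are enough to reconstruct them up to rounding errors; with fewer components than the intrinsic dimension, the reconstruction would only be approximate.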

class gemseo.mlearning.transform.dimension_reduction.pca.PCA(name='PCA', n_components=None, **parameters)[source]

Principal component dimension reduction algorithm.

Parameters:
  • name (str) –

    A name for this transformer.

    By default it is set to “PCA”.

  • n_components (int | None) – The number of components of the latent space. If None, use the maximum number allowed by the technique, typically min(n_samples, n_features).

  • **parameters (float | int | str | bool | None) – The optional parameters for the scikit-learn PCA constructor.

compute_jacobian(data, *args, **kwargs)

Compute the Jacobian of transform(). The data array is first forced to be 2D.

Parameters:
  • data (ndarray) – A 1D or 2D NumPy array.

  • *args (Any) – The description is missing.

  • **kwargs (Any) – The description is missing.

Returns:

Any kind of output; if a NumPy array, its dimension is made consistent with the shape of data.

Return type:

Any

compute_jacobian_inverse(data, *args, **kwargs)

Compute the Jacobian of inverse_transform(). The data array is first forced to be 2D.

Parameters:
  • data (ndarray) – A 1D or 2D NumPy array.

  • *args (Any) – The description is missing.

  • **kwargs (Any) – The description is missing.

Returns:

Any kind of output; if a NumPy array, its dimension is made consistent with the shape of data.

Return type:

Any

duplicate()

Duplicate the current object.

Returns:

A deepcopy of the current instance.

Return type:

Transformer

fit(data, *args)

Fit the transformer to the data.

Parameters:
  • data (ndarray) – The data to be fitted, shaped as (n_observations, n_features) or (n_observations, ).

  • args (Union[float, int, str]) –

Return type:

None

fit_transform(data, *args)

Fit the transformer to the data and transform the data.

Parameters:
  • data (ndarray) – The data to be transformed, shaped as (n_observations, n_features) or (n_observations, ).

  • args (Union[float, int, str]) –

Returns:

The transformed data, shaped as data.

Return type:

ndarray

inverse_transform(data, *args, **kwargs)

Perform an inverse transform on the data. The data array is first forced to be 2D.

Parameters:
  • data (ndarray) – A 1D or 2D NumPy array.

  • *args (Any) – The description is missing.

  • **kwargs (Any) – The description is missing.

Returns:

Any kind of output; if a NumPy array, its dimension is made consistent with the shape of data.

Return type:

Any

transform(data, *args, **kwargs)

Transform the data. The data array is first forced to be 2D.

Parameters:
  • data (ndarray) – A 1D or 2D NumPy array.

  • *args (Any) – The description is missing.

  • **kwargs (Any) – The description is missing.

Returns:

Any kind of output; if a NumPy array, its dimension is made consistent with the shape of data.

Return type:

Any

CROSSED: ClassVar[bool] = False

Whether the fit() method requires two data arrays.

property components: ndarray

The principal components.

property is_fitted: bool

Whether the transformer has been fitted from some data.

property n_components: int

The number of components.

name: str

The name of the transformer.

property parameters: dict[str, Union[bool, int, float, numpy.ndarray, str, NoneType]]

The parameters of the transformer.

Examples

See the examples about: