Transform data to improve the quality of machine learning algorithms

A transformer to apply operations on NumPy arrays.

The abstract Transformer class implements the concept of a data transformer. Inheriting classes shall implement the Transformer.fit(), Transformer.transform() and possibly Transformer.inverse_transform() methods.
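As a rough illustration of this contract, the sketch below defines a hypothetical abstract class `SketchTransformer` and a hypothetical subclass `Centering`; neither is part of GEMSEO. It assumes, as the method descriptions suggest, that `fit_transform()` calls `fit()` and then `transform()` on the same data, and that `duplicate()` returns a `copy.deepcopy` of the instance.

```python
import copy
from abc import ABC, abstractmethod

import numpy as np


class SketchTransformer(ABC):
    """Hypothetical minimal transformer interface, not GEMSEO's code."""

    def __init__(self, name="Transformer", **parameters):
        self.name = name
        # The keyword arguments are collected into a dictionary.
        self.parameters = parameters

    @abstractmethod
    def fit(self, data, *args):
        """Learn the transformation from the data."""

    @abstractmethod
    def transform(self, data):
        """Apply the learned transformation."""

    def inverse_transform(self, data):
        # Optional: only invertible transformers override this method.
        raise NotImplementedError

    def fit_transform(self, data, *args):
        # Fit on the data, then transform the same data.
        self.fit(data, *args)
        return self.transform(data)

    def duplicate(self):
        # Independent copy, including the learned parameters.
        return copy.deepcopy(self)


class Centering(SketchTransformer):
    """Hypothetical subclass removing the column-wise mean."""

    def fit(self, data, *args):
        self.parameters["mean"] = data.mean(axis=0)

    def transform(self, data):
        return data - self.parameters["mean"]

    def inverse_transform(self, data):
        return data + self.parameters["mean"]


data = np.array([[1.0, 10.0], [3.0, 20.0]])
centering = Centering("Centering")
centered = centering.fit_transform(data)  # [[-1., -5.], [1., 5.]]
restored = centering.inverse_transform(centered)  # equal to data
```

Keeping `fit()` separate from `transform()` lets a transformer learn its parameters on training data once and then apply the same transformation to new data.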

Classes:

Transformer([name])

Transformer baseclass.

TransformerFactory(*args, **kwargs)

A factory of Transformer.

class gemseo.mlearning.transform.transformer.Transformer(name='Transformer', **parameters)[source]

Transformer baseclass.

name

The name of the transformer.

Type

str

parameters

The parameters of the transformer.

Type

dict[str, bool | int | float | ndarray | str | None]

Parameters
  • name (str) –

    A name for this transformer.

    By default it is set to Transformer.

  • **parameters (bool | int | float | ndarray | str | None) – The parameters of the transformer.

Return type

None

Methods:

compute_jacobian(data)

Compute the Jacobian of transformer.transform().

compute_jacobian_inverse(data)

Compute the Jacobian of transformer.inverse_transform().

duplicate()

Duplicate the current object.

fit(data, *args)

Fit the transformer to the data.

fit_transform(data, *args)

Fit the transformer to the data and transform the data.

inverse_transform(data)

Perform an inverse transform on the data.

transform(data)

Transform the data.

compute_jacobian(data)[source]

Compute the Jacobian of transformer.transform().

Parameters

data (numpy.ndarray) – The data where the Jacobian is to be computed.

Returns

The Jacobian matrix.

Return type

NoReturn

compute_jacobian_inverse(data)[source]

Compute the Jacobian of transformer.inverse_transform().

Parameters

data (numpy.ndarray) – The data where the Jacobian is to be computed.

Returns

The Jacobian matrix.

Return type

NoReturn

duplicate()[source]

Duplicate the current object.

Returns

A deep copy of the current instance.

Return type

gemseo.mlearning.transform.transformer.Transformer

fit(data, *args)[source]

Fit the transformer to the data.

Parameters

data (numpy.ndarray) – The data to be fitted.

Return type

NoReturn

fit_transform(data, *args)[source]

Fit the transformer to the data and transform the data.

Parameters

data (numpy.ndarray) – The data to be fitted and transformed.

Returns

The transformed data.

Return type

numpy.ndarray

inverse_transform(data)[source]

Perform an inverse transform on the data.

Parameters

data (numpy.ndarray) – The data to be inverse transformed.

Returns

The inverse transformed data.

Return type

NoReturn

transform(data)[source]

Transform the data.

Parameters

data (numpy.ndarray) – The data to be transformed.

Returns

The transformed data.

Return type

NoReturn

class gemseo.mlearning.transform.transformer.TransformerFactory(*args, **kwargs)[source]

A factory of Transformer.

Parameters
  • base_class – The base class to be considered.

  • module_names – The fully qualified module names to be searched.

Return type

None

Attributes:

classes

Return the available classes.

Methods:

create(class_name, **options)

Return an instance of a class.

get_class(name)

Return a class from its name.

get_default_options_values(name)

Return the default values of the keyword arguments of the class constructor.

get_default_sub_options_values(name, **options)

Return the default values of the sub options of a class.

get_options_doc(name)

Return the constructor documentation of a class.

get_options_grammar(name[, write_schema, ...])

Return the options JSON grammar for a class.

get_sub_options_grammar(name, **options)

Return the JSONGrammar of the sub options of a class.

is_available(name)

Return whether a class can be instantiated.

update()

Search for the classes that can be instantiated.

property classes: list[str]

Return the available classes.

Returns

The sorted names of the available classes.

create(class_name, **options)

Return an instance of a class.

Parameters
  • class_name (str) – The name of the class.

  • **options (Any) – The arguments to be passed to the class constructor.

Returns

The instance of the class.

Raises

TypeError – If the class cannot be instantiated.

Return type

Any

get_class(name)

Return a class from its name.

Parameters

name (str) – The name of the class.

Returns

The class.

Raises

ImportError – If the class is not available.

Return type

type[Any]

get_default_options_values(name)

Return the default values of the keyword arguments of the class constructor.

Parameters

name (str) – The name of the class.

Returns

The mapping from the argument names to their default values.

Return type

dict[str, str | int | float | bool]
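One plausible way to build such a mapping is to inspect the constructor signature with the standard `inspect` module. The sketch below assumes this approach; the helper `default_options_values` and the class `ExampleScaler` are hypothetical, and GEMSEO's actual implementation may differ.

```python
import inspect


class ExampleScaler:
    """Hypothetical class used only to illustrate the lookup."""

    def __init__(self, name="Scaler", offset=0.0, coefficient=1.0):
        self.name = name
        self.offset = offset
        self.coefficient = coefficient


def default_options_values(cls):
    """Map each constructor argument that has a default to that default."""
    signature = inspect.signature(cls.__init__)
    return {
        arg_name: parameter.default
        for arg_name, parameter in signature.parameters.items()
        # "self" and arguments without defaults are skipped.
        if parameter.default is not inspect.Parameter.empty
    }


defaults = default_options_values(ExampleScaler)
# {'name': 'Scaler', 'offset': 0.0, 'coefficient': 1.0}
```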

get_default_sub_options_values(name, **options)

Return the default values of the sub options of a class.

Parameters
  • name (str) – The name of the class.

  • **options (str) – The options to be passed to the class, from which the sub options are deduced.

Returns

The JSON grammar.

Return type

gemseo.core.grammars.json_grammar.JSONGrammar

get_options_doc(name)

Return the constructor documentation of a class.

Parameters

name (str) – The name of the class.

Returns

The mapping from the argument names to their documentation.

Return type

dict[str, str]

get_options_grammar(name, write_schema=False, schema_path=None)

Return the options JSON grammar for a class.

Attempt to generate a JSONGrammar from the arguments of the __init__ method of the class.

Parameters
  • name (str) – The name of the class.

  • write_schema (bool) –

    If True, write the JSON schema to a file.

    By default it is set to False.

  • schema_path (str | None) –

    The path to the JSON schema file. If None, the file is saved in the current directory in a file named after the name of the class.

    By default it is set to None.

Returns

The JSON grammar.

Return type

JSONGrammar

get_sub_options_grammar(name, **options)

Return the JSONGrammar of the sub options of a class.

Parameters
  • name (str) – The name of the class.

  • **options (str) – The options to be passed to the class, from which the sub options are deduced.

Returns

The JSON grammar.

Return type

gemseo.core.grammars.json_grammar.JSONGrammar

is_available(name)

Return whether a class can be instantiated.

Parameters

name (str) – The name of the class.

Returns

Whether the class can be instantiated.

Return type

bool

update()

Search for the classes that can be instantiated.

The search is done in the following order:
  1. The fully qualified module names

  2. The plugin packages

  3. The packages from the environment variables

Return type

None
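The name-based lookup offered by the factory can be sketched as a registry of subclasses. This is a hypothetical `SketchFactory`, not GEMSEO's implementation: it only discovers subclasses that are already imported, whereas the documented update() also searches module names, plugin packages and environment variables. `SketchBase`, `DummyScaler` and `DummyMinMaxScaler` are illustrative stand-ins.

```python
class SketchBase:
    """Hypothetical base class whose subclasses the factory can create."""

    def __init__(self, name=None):
        self.name = name or type(self).__name__


class DummyScaler(SketchBase):
    pass


class DummyMinMaxScaler(DummyScaler):
    pass


class SketchFactory:
    """Hypothetical name-based factory over the subclasses of a base class."""

    def __init__(self, base_class):
        self.base_class = base_class
        self._classes = {}
        self.update()

    def update(self):
        # Walk the subclass tree recursively (already-imported classes only).
        stack = list(self.base_class.__subclasses__())
        while stack:
            cls = stack.pop()
            self._classes[cls.__name__] = cls
            stack.extend(cls.__subclasses__())

    @property
    def classes(self):
        return sorted(self._classes)

    def is_available(self, name):
        return name in self._classes

    def get_class(self, name):
        if name not in self._classes:
            raise ImportError(f"The class {name} is not available.")
        return self._classes[name]

    def create(self, class_name, **options):
        # Options are forwarded to the constructor of the selected class.
        return self.get_class(class_name)(**options)


factory = SketchFactory(SketchBase)
names = factory.classes  # ['DummyMinMaxScaler', 'DummyScaler']
scaler = factory.create("DummyMinMaxScaler", name="my_scaler")
```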

A pipeline to chain transformers.

The Pipeline class chains a sequence of transformers and provides global fit(), transform(), fit_transform() and inverse_transform() methods.
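The chaining logic can be sketched with two hypothetical classes, `Affine` and `SketchPipeline`, which are not GEMSEO's. Following the method descriptions, transform() feeds the output of each transformer into the next and inverse_transform() starts from the last transformer; the choice to fit each transformer on the output of its predecessor is an assumption of this sketch.

```python
import numpy as np


class Affine:
    """Hypothetical transformer computing offset + coefficient * data."""

    def __init__(self, offset, coefficient):
        self.offset = offset
        self.coefficient = coefficient

    def fit(self, data, *args):
        pass  # nothing to learn in this toy example

    def transform(self, data):
        return self.offset + self.coefficient * data

    def inverse_transform(self, data):
        return (data - self.offset) / self.coefficient

    def fit_transform(self, data, *args):
        self.fit(data, *args)
        return self.transform(data)


class SketchPipeline:
    """Hypothetical pipeline; an empty sequence behaves as the identity."""

    def __init__(self, transformers=None):
        self.transformers = list(transformers or [])

    def fit(self, data, *args):
        # Assumption: each transformer is fitted on the previous output.
        for transformer in self.transformers:
            data = transformer.fit_transform(data, *args)

    def transform(self, data):
        for transformer in self.transformers:
            data = transformer.transform(data)
        return data

    def inverse_transform(self, data):
        # Undo the transformations, starting with the last one.
        for transformer in reversed(self.transformers):
            data = transformer.inverse_transform(data)
        return data


pipeline = SketchPipeline([Affine(1.0, 2.0), Affine(0.0, 10.0)])
data = np.array([[0.0], [1.0]])
pipeline.fit(data)
transformed = pipeline.transform(data)  # 10 * (1 + 2 * z) -> [[10.], [30.]]
restored = pipeline.inverse_transform(transformed)
```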

Classes:

Pipeline([name, transformers])

Transformer pipeline.

class gemseo.mlearning.transform.pipeline.Pipeline(name='Pipeline', transformers=None)[source]

Transformer pipeline.

name

The name of the transformer.

Type

str

parameters

The parameters of the transformer.

Type

dict[str, bool | int | float | ndarray | str | None]

transformers

The sequence of transformers.

Type

Sequence[Transformer]

Parameters
  • name (str) –

    A name for this pipeline.

    By default it is set to Pipeline.

  • transformers (Sequence[Transformer] | None) –

    A sequence of transformers to be chained. The transformers are chained in the order of appearance in the list, i.e. the first transformer is applied first. If transformers is an empty list or None, then the pipeline transformer behaves like an identity transformer.

    By default it is set to None.

Return type

None

Methods:

compute_jacobian(data)

Compute the Jacobian of pipeline.transform().

compute_jacobian_inverse(data)

Compute the Jacobian of pipeline.inverse_transform().

duplicate()

Duplicate the current object.

fit(data, *args)

Fit the transformer to the data.

fit_transform(data, *args)

Fit the transformer to the data and transform the data.

inverse_transform(data)

Perform an inverse transform on the data.

transform(data)

Transform the data.

compute_jacobian(data)[source]

Compute the Jacobian of pipeline.transform().

Parameters

data (numpy.ndarray) – The data where the Jacobian is to be computed.

Returns

The Jacobian matrix.

Return type

numpy.ndarray
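By the chain rule, the Jacobian of a composition is the product of the individual Jacobians, each evaluated at the input of its own step and multiplied from the left. The sketch below illustrates this with plain NumPy functions instead of GEMSEO transformers; `pipeline_jacobian` is a hypothetical helper, and how GEMSEO evaluates the intermediate points is not stated here.

```python
import numpy as np


def pipeline_jacobian(transforms, jacobians, point):
    """Chain rule for the composition of ``transforms`` applied in order.

    ``jacobians[i](x)`` returns the Jacobian of ``transforms[i]`` at ``x``.
    """
    jacobian = np.eye(point.shape[-1])
    for transform, jacobian_of in zip(transforms, jacobians):
        # Evaluate each Jacobian at the input of its step, then left-multiply.
        jacobian = jacobian_of(point) @ jacobian
        point = transform(point)
    return jacobian


matrix = np.array([[1.0, 2.0], [0.0, 1.0]])
transforms = [lambda x: matrix @ x, lambda x: x**2]
jacobians = [lambda x: matrix, lambda x: np.diag(2.0 * x)]
result = pipeline_jacobian(transforms, jacobians, np.array([1.0, 1.0]))
# matrix @ [1, 1] = [3, 1], so result = diag([6, 2]) @ matrix = [[6, 12], [0, 2]]
```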

compute_jacobian_inverse(data)[source]

Compute the Jacobian of pipeline.inverse_transform().

Parameters

data (numpy.ndarray) – The data where the Jacobian is to be computed.

Returns

The Jacobian matrix.

Return type

numpy.ndarray

duplicate()[source]

Duplicate the current object.

Returns

A deep copy of the current instance.

Return type

gemseo.mlearning.transform.pipeline.Pipeline

fit(data, *args)

Fit the transformer to the data.

Parameters

data (numpy.ndarray) – The data to be fitted.

Return type

NoReturn

fit_transform(data, *args)

Fit the transformer to the data and transform the data.

Parameters

data (numpy.ndarray) – The data to be fitted and transformed.

Returns

The transformed data.

Return type

numpy.ndarray

inverse_transform(data)[source]

Perform an inverse transform on the data.

The data is inverse transformed sequentially, starting with the last transformer in the list.

Parameters

data (numpy.ndarray) – The data to be inverse transformed.

Returns

The inverse transformed data.

Return type

numpy.ndarray

transform(data)[source]

Transform the data.

The data is transformed sequentially, where the output of one transformer is the input of the next.

Parameters

data (numpy.ndarray) – The data to be transformed.

Returns

The transformed data.

Return type

numpy.ndarray

Scaling a variable with a linear transformation.

The Scaler class implements the default scaling method, applied to some variable \(z\):

\[\bar{z} := \text{offset} + \text{coefficient}\times z\]

where \(\bar{z}\) is the scaled version of \(z\). This scaling method is a linear transformation parameterized by an offset and a coefficient.

In this default scaling method, the offset is equal to 0 and the coefficient is equal to 1. Consequently, the scaling operation is the identity: \(\bar{z}=z\). Subclasses are expected to override this default method, typically by fitting the offset and the coefficient to the data.
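A minimal sketch of this linear transformation, its inverse and its Jacobians, using a hypothetical class `SketchScaler` (not GEMSEO's `Scaler`) whose `offset` and `coefficient` may be scalars or per-feature arrays. The inverse follows from solving \(\bar{z} = \text{offset} + \text{coefficient}\times z\) for \(z\), and the Jacobian is diagonal with the coefficient on the diagonal.

```python
import numpy as np


class SketchScaler:
    """Hypothetical linear scaler: scaled = offset + coefficient * data."""

    def __init__(self, offset=0.0, coefficient=1.0):
        self.offset = offset
        self.coefficient = coefficient

    def transform(self, data):
        return self.offset + self.coefficient * data

    def inverse_transform(self, data):
        # Solve scaled = offset + coefficient * z for z.
        return (data - self.offset) / self.coefficient

    def compute_jacobian(self, data):
        # d(scaled_i)/d(z_j) = coefficient_i if i == j, else 0.
        return np.diag(np.broadcast_to(self.coefficient, data.shape[-1:]))

    def compute_jacobian_inverse(self, data):
        return np.diag(1.0 / np.broadcast_to(self.coefficient, data.shape[-1:]))


z = np.array([1.0, -2.0, 3.0])
identity = SketchScaler()  # default offset 0 and coefficient 1
scaler = SketchScaler(offset=1.0, coefficient=2.0)
scaled = scaler.transform(z)  # [3., -3., 7.]
```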

Classes:

Scaler([name, offset, coefficient])

Data scaler.

class gemseo.mlearning.transform.scaler.scaler.Scaler(name='Scaler', offset=0.0, coefficient=1.0)[source]

Data scaler.

name

The name of the transformer.

Type

str

parameters

The parameters of the transformer.

Type

dict[str, bool | int | float | ndarray | str | None]

Parameters
  • name (str) –

    A name for this transformer.

    By default it is set to Scaler.

  • offset (float) –

    The offset of the linear transformation.

    By default it is set to 0.0.

  • coefficient (float) –

    The coefficient of the linear transformation.

    By default it is set to 1.0.

Return type

None

Attributes:

coefficient

The scaling coefficient.

offset

The scaling offset.

Methods:

compute_jacobian(data)

Compute the Jacobian of transformer.transform().

compute_jacobian_inverse(data)

Compute the Jacobian of transformer.inverse_transform().

duplicate()

Duplicate the current object.

fit(data, *args)

Fit the transformer to the data.

fit_transform(data, *args)

Fit the transformer to the data and transform the data.

inverse_transform(data)

Perform an inverse transform on the data.

transform(data)

Transform the data.

property coefficient: float

The scaling coefficient.

compute_jacobian(data)[source]

Compute the Jacobian of transformer.transform().

Parameters

data (numpy.ndarray) – The data where the Jacobian is to be computed.

Returns

The Jacobian matrix.

Return type

numpy.ndarray

compute_jacobian_inverse(data)[source]

Compute the Jacobian of transformer.inverse_transform().

Parameters

data (numpy.ndarray) – The data where the Jacobian is to be computed.

Returns

The Jacobian matrix.

Return type

numpy.ndarray

duplicate()

Duplicate the current object.

Returns

A deep copy of the current instance.

Return type

gemseo.mlearning.transform.transformer.Transformer

fit(data, *args)

Fit the transformer to the data.

Parameters

data (numpy.ndarray) – The data to be fitted.

Return type

NoReturn

fit_transform(data, *args)

Fit the transformer to the data and transform the data.

Parameters

data (numpy.ndarray) – The data to be fitted and transformed.

Returns

The transformed data.

Return type

numpy.ndarray

inverse_transform(data)[source]

Perform an inverse transform on the data.

Parameters

data (numpy.ndarray) – The data to be inverse transformed.

Returns

The inverse transformed data.

Return type

numpy.ndarray

property offset: float

The scaling offset.

transform(data)[source]

Transform the data.

Parameters

data (numpy.ndarray) – The data to be transformed.

Returns

The transformed data.

Return type

numpy.ndarray


Scaling a variable with a geometrical linear transformation.

The MinMaxScaler class implements the MinMax scaling method, applied to some variable \(z\):

\[\bar{z} := \text{offset} + \text{coefficient}\times z = \frac{z-\text{min}(z)}{\text{max}(z)-\text{min}(z)},\]

where \(\text{offset}=-\text{min}(z)/(\text{max}(z)-\text{min}(z))\) and \(\text{coefficient}=1/(\text{max}(z)-\text{min}(z))\).

In the MinMax scaling method, the scaling operation linearly transforms the original variable \(z\) such that the minimum of the original data corresponds to 0 and the maximum to 1.
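The offset and coefficient above can be computed column-wise with NumPy, as in the following sketch. It assumes the data are shaped `(n_samples, n_features)` and that each feature is scaled independently; `fit_min_max` is a hypothetical helper, and constant columns (where \(\text{max}(z)=\text{min}(z)\)) are deliberately not handled.

```python
import numpy as np


def fit_min_max(data):
    """Return (offset, coefficient) mapping each column of data onto [0, 1].

    Hypothetical helper; columns with max == min are not handled here.
    """
    minimum = data.min(axis=0)
    coefficient = 1.0 / (data.max(axis=0) - minimum)
    offset = -minimum * coefficient
    return offset, coefficient


data = np.array([[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]])
offset, coefficient = fit_min_max(data)
scaled = offset + coefficient * data
# Column-wise minimum -> 0 and maximum -> 1: [[0, 0], [0.5, 1], [1, 0.5]]
```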

Classes:

MinMaxScaler([name, offset, coefficient])

Min-max scaler.

class gemseo.mlearning.transform.scaler.min_max_scaler.MinMaxScaler(name='MinMaxScaler', offset=0.0, coefficient=1.0)[source]

Min-max scaler.

name

The name of the transformer.

Type

str

parameters

The parameters of the transformer.

Type

dict[str, bool | int | float | ndarray | str | None]

Parameters
  • name (str) –

    A name for this transformer.

    By default it is set to MinMaxScaler.

  • offset (float) –

    The offset of the linear transformation.

    By default it is set to 0.0.

  • coefficient (float) –

    The coefficient of the linear transformation.

    By default it is set to 1.0.

Return type

None

Attributes:

coefficient

The scaling coefficient.

offset

The scaling offset.

Methods:

compute_jacobian(data)

Compute the Jacobian of transformer.transform().

compute_jacobian_inverse(data)

Compute the Jacobian of transformer.inverse_transform().

duplicate()

Duplicate the current object.

fit(data, *args)

Fit the transformer to the data.

fit_transform(data, *args)

Fit the transformer to the data and transform the data.

inverse_transform(data)

Perform an inverse transform on the data.

transform(data)

Transform the data.

property coefficient: float

The scaling coefficient.

compute_jacobian(data)

Compute the Jacobian of transformer.transform().

Parameters

data (numpy.ndarray) – The data where the Jacobian is to be computed.

Returns

The Jacobian matrix.

Return type

numpy.ndarray

compute_jacobian_inverse(data)

Compute the Jacobian of transformer.inverse_transform().

Parameters

data (numpy.ndarray) – The data where the Jacobian is to be computed.

Returns

The Jacobian matrix.

Return type

numpy.ndarray

duplicate()

Duplicate the current object.

Returns

A deep copy of the current instance.

Return type

gemseo.mlearning.transform.transformer.Transformer

fit(data, *args)

Fit the transformer to the data.

Parameters

data (numpy.ndarray) – The data to be fitted.

Return type

NoReturn

fit_transform(data, *args)

Fit the transformer to the data and transform the data.

Parameters

data (numpy.ndarray) – The data to be fitted and transformed.

Returns

The transformed data.

Return type

numpy.ndarray

inverse_transform(data)

Perform an inverse transform on the data.

Parameters

data (numpy.ndarray) – The data to be inverse transformed.

Returns

The inverse transformed data.

Return type

numpy.ndarray

property offset: float

The scaling offset.

transform(data)

Transform the data.

Parameters

data (numpy.ndarray) – The data to be transformed.

Returns

The transformed data.

Return type

numpy.ndarray

Scaling a variable with a statistical linear transformation.

The StandardScaler class implements the standard scaling method, applied to some variable \(z\):

\[\bar{z} := \text{offset} + \text{coefficient}\times z = \frac{z-\text{mean}(z)}{\text{std}(z)}\]

where \(\text{offset}=-\text{mean}(z)/\text{std}(z)\) and \(\text{coefficient}=1/\text{std}(z)\).

In the standard scaling method, the scaling operation linearly transforms the original variable \(z\) so that, in the scaled space, the original data have zero mean and unit standard deviation.
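A column-wise sketch of the offset and coefficient defined above, assuming data shaped `(n_samples, n_features)`. `fit_standard` is a hypothetical helper; it uses NumPy's default population standard deviation (`ddof=0`), which is an assumption about the estimator, and zero-variance columns are not handled.

```python
import numpy as np


def fit_standard(data):
    """Return (offset, coefficient) giving zero mean and unit std per column.

    Hypothetical helper using NumPy's default population standard deviation
    (ddof=0); columns with zero variance are not handled here.
    """
    mean = data.mean(axis=0)
    coefficient = 1.0 / data.std(axis=0)
    offset = -mean * coefficient
    return offset, coefficient


data = np.array([[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]])
offset, coefficient = fit_standard(data)
scaled = offset + coefficient * data  # zero mean, unit std in each column
```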

Classes:

StandardScaler([name, offset, coefficient])

Standard scaler.

class gemseo.mlearning.transform.scaler.standard_scaler.StandardScaler(name='StandardScaler', offset=0.0, coefficient=1.0)[source]

Standard scaler.

name

The name of the transformer.

Type

str

parameters

The parameters of the transformer.

Type

dict[str, bool | int | float | ndarray | str | None]

Parameters
  • name (str) –

    A name for this transformer.

    By default it is set to StandardScaler.

  • offset (float) –

    The offset of the linear transformation.

    By default it is set to 0.0.

  • coefficient (float) –

    The coefficient of the linear transformation.

    By default it is set to 1.0.

Return type

None

Attributes:

coefficient

The scaling coefficient.

offset

The scaling offset.

Methods:

compute_jacobian(data)

Compute the Jacobian of transformer.transform().

compute_jacobian_inverse(data)

Compute the Jacobian of transformer.inverse_transform().

duplicate()

Duplicate the current object.

fit(data, *args)

Fit the transformer to the data.

fit_transform(data, *args)

Fit the transformer to the data and transform the data.

inverse_transform(data)

Perform an inverse transform on the data.

transform(data)

Transform the data.

property coefficient: float

The scaling coefficient.

compute_jacobian(data)

Compute the Jacobian of transformer.transform().

Parameters

data (numpy.ndarray) – The data where the Jacobian is to be computed.

Returns

The Jacobian matrix.

Return type

numpy.ndarray

compute_jacobian_inverse(data)

Compute the Jacobian of transformer.inverse_transform().

Parameters

data (numpy.ndarray) – The data where the Jacobian is to be computed.

Returns

The Jacobian matrix.

Return type

numpy.ndarray

duplicate()

Duplicate the current object.

Returns

A deep copy of the current instance.

Return type

gemseo.mlearning.transform.transformer.Transformer

fit(data, *args)

Fit the transformer to the data.

Parameters

data (numpy.ndarray) – The data to be fitted.

Return type

NoReturn

fit_transform(data, *args)

Fit the transformer to the data and transform the data.

Parameters

data (numpy.ndarray) – The data to be fitted and transformed.

Returns

The transformed data.

Return type

numpy.ndarray

inverse_transform(data)

Perform an inverse transform on the data.

Parameters

data (numpy.ndarray) – The data to be inverse transformed.

Returns

The inverse transformed data.

Return type

numpy.ndarray

property offset: float

The scaling offset.

transform(data)

Transform the data.

Parameters

data (numpy.ndarray) – The data to be transformed.

Returns

The transformed data.

Return type

numpy.ndarray

Dimension reduction as a generic transformer.

The DimensionReduction class implements the concept of dimension reduction.

See also

pca

Classes:

DimensionReduction([name, n_components])

Dimension reduction.

class gemseo.mlearning.transform.dimension_reduction.dimension_reduction.DimensionReduction(name='DimensionReduction', n_components=None, **parameters)[source]

Dimension reduction.

name

The name of the transformer.

Type

str

parameters

The parameters of the transformer.

Type

dict[str, bool | int | float | ndarray | str | None]

Parameters
  • name (str) –

    A name for this transformer.

    By default it is set to DimensionReduction.

  • n_components (int | None) –

    The number of components of the latent space. If None, use the maximum number allowed by the technique, typically min(n_samples, n_features).

    By default it is set to None.

  • **parameters (bool | int | float | ndarray | str | None) – The parameters of the transformer.

Return type

None

Methods:

compute_jacobian(data)

Compute the Jacobian of transformer.transform().

compute_jacobian_inverse(data)

Compute the Jacobian of transformer.inverse_transform().

duplicate()

Duplicate the current object.

fit(data, *args)

Fit the transformer to the data.

fit_transform(data, *args)

Fit the transformer to the data and transform the data.

inverse_transform(data)

Perform an inverse transform on the data.

transform(data)

Transform the data.

Attributes:

n_components

The number of components.

compute_jacobian(data)

Compute the Jacobian of transformer.transform().

Parameters

data (numpy.ndarray) – The data where the Jacobian is to be computed.

Returns

The Jacobian matrix.

Return type

NoReturn

compute_jacobian_inverse(data)

Compute the Jacobian of transformer.inverse_transform().

Parameters

data (numpy.ndarray) – The data where the Jacobian is to be computed.

Returns

The Jacobian matrix.

Return type

NoReturn

duplicate()

Duplicate the current object.

Returns

A deep copy of the current instance.

Return type

gemseo.mlearning.transform.transformer.Transformer

fit(data, *args)

Fit the transformer to the data.

Parameters

data (numpy.ndarray) – The data to be fitted.

Return type

NoReturn

fit_transform(data, *args)

Fit the transformer to the data and transform the data.

Parameters

data (numpy.ndarray) – The data to be fitted and transformed.

Returns

The transformed data.

Return type

numpy.ndarray

inverse_transform(data)

Perform an inverse transform on the data.

Parameters

data (numpy.ndarray) – The data to be inverse transformed.

Returns

The inverse transformed data.

Return type

NoReturn

property n_components: int

The number of components.

transform(data)

Transform the data.

Parameters

data (numpy.ndarray) – The data to be transformed.

Returns

The transformed data.

Return type

NoReturn


The Principal Component Analysis (PCA) to reduce the dimension of a variable.

The PCA class wraps the PCA class from scikit-learn.

Dependence

This dimension reduction algorithm relies on the PCA class of the scikit-learn library.
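The core computation that scikit-learn's PCA performs without whitening can be sketched with a NumPy singular value decomposition of the centered data. `SketchPCA` is hypothetical and is neither GEMSEO's nor scikit-learn's code (no solver options, no whitening); it also mirrors the documented default where `n_components=None` means `min(n_samples, n_features)`.

```python
import numpy as np


class SketchPCA:
    """Hypothetical PCA via SVD of the centered data (no whitening)."""

    def __init__(self, n_components=None):
        self.n_components = n_components

    def fit(self, data):
        n_samples, n_features = data.shape
        if self.n_components is None:
            # Largest number of components the SVD can provide.
            self.n_components = min(n_samples, n_features)
        self.mean = data.mean(axis=0)
        # Rows of vt are the principal directions, sorted by singular value.
        _, _, vt = np.linalg.svd(data - self.mean, full_matrices=False)
        self.components = vt[: self.n_components]  # (n_components, n_features)
        return self

    def transform(self, data):
        return (data - self.mean) @ self.components.T

    def inverse_transform(self, data):
        return data @ self.components + self.mean


rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 1))
# Three correlated features driven by a single latent variable.
data = latent @ np.array([[1.0, 2.0, -1.0]]) + 5.0
pca = SketchPCA(n_components=1).fit(data)
reduced = pca.transform(data)  # shape (50, 1)
reconstructed = pca.inverse_transform(reduced)
```

Because the centered data here lie on a single direction, one component is enough to reconstruct them up to rounding errors.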

Classes:

PCA([name, n_components])

Principal component dimension reduction algorithm.

class gemseo.mlearning.transform.dimension_reduction.pca.PCA(name='PCA', n_components=None, **parameters)[source]

Principal component dimension reduction algorithm.

name

The name of the transformer.

Type

str

parameters

The parameters of the transformer.

Type

dict[str, bool | int | float | ndarray | str | None]

Parameters
  • name (str) –

    A name for this transformer.

    By default it is set to PCA.

  • n_components (int | None) –

    The number of components of the latent space. If None, use the maximum number allowed by the technique, typically min(n_samples, n_features).

    By default it is set to None.

  • **parameters (float | int | str | bool | None) – The optional parameters of the scikit-learn PCA constructor.

Return type

None

Attributes:

components

The principal components.

n_components

The number of components.

Methods:

compute_jacobian(data)

Compute the Jacobian of transformer.transform().

compute_jacobian_inverse(data)

Compute the Jacobian of transformer.inverse_transform().

duplicate()

Duplicate the current object.

fit(data, *args)

Fit the transformer to the data.

fit_transform(data, *args)

Fit the transformer to the data and transform the data.

inverse_transform(data)

Perform an inverse transform on the data.

transform(data)

Transform the data.

property components: numpy.ndarray

The principal components.

compute_jacobian(data)[source]

Compute the Jacobian of transformer.transform().

Parameters

data (numpy.ndarray) – The data where the Jacobian is to be computed.

Returns

The Jacobian matrix.

Return type

numpy.ndarray

compute_jacobian_inverse(data)[source]

Compute the Jacobian of transformer.inverse_transform().

Parameters

data (numpy.ndarray) – The data where the Jacobian is to be computed.

Returns

The Jacobian matrix.

Return type

numpy.ndarray
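Without whitening, the PCA transformation of a single sample is affine, \(z = C(x - \mu)\) with \(C\) the matrix of principal components, so its Jacobian is the constant matrix \(C\), and the Jacobian of the inverse \(x = C^T z + \mu\) is \(C^T\). The sketch below checks this against central finite differences; the components and mean are made-up values, the helpers are hypothetical, and the array layout returned by GEMSEO may differ.

```python
import numpy as np


def pca_jacobian(components):
    """Jacobian of z = components @ (x - mean): constant, equal to components."""
    return components


def pca_jacobian_inverse(components):
    """Jacobian of x = components.T @ z + mean: constant, equal to components.T."""
    return components.T


# Hypothetical orthonormal components (2 components, 3 features) and mean.
components = np.array([[1.0, 0.0, 0.0], [0.0, 0.6, 0.8]])
mean = np.array([1.0, 2.0, 3.0])
x = np.array([2.0, 0.0, 1.0])


def transform(point):
    return components @ (point - mean)


# Central finite differences, one column per input feature.
step = 1e-6
finite_diff = np.column_stack(
    [(transform(x + step * e) - transform(x - step * e)) / (2 * step) for e in np.eye(3)]
)
```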

duplicate()

Duplicate the current object.

Returns

A deep copy of the current instance.

Return type

gemseo.mlearning.transform.transformer.Transformer

fit(data, *args)

Fit the transformer to the data.

Parameters

data (numpy.ndarray) – The data to be fitted.

Return type

NoReturn

fit_transform(data, *args)

Fit the transformer to the data and transform the data.

Parameters

data (numpy.ndarray) – The data to be fitted and transformed.

Returns

The transformed data.

Return type

numpy.ndarray

inverse_transform(data)[source]

Perform an inverse transform on the data.

Parameters

data (numpy.ndarray) – The data to be inverse transformed.

Returns

The inverse transformed data.

Return type

numpy.ndarray

property n_components: int

The number of components.

transform(data)[source]

Transform the data.

Parameters

data (numpy.ndarray) – The data to be transformed.

Returns

The transformed data.

Return type

numpy.ndarray
