Transform data to improve the ML algorithm quality

Introduction

A pipeline to chain transformers.

The Pipeline class chains a sequence of transformers and provides global fit(), transform(), fit_transform() and inverse_transform() methods.

class Pipeline(name='', transformers=())

BaseTransformer pipeline.

Parameters:
  • name (str) --

    A name for this pipeline.

    By default it is set to "".

  • transformers (Sequence[BaseTransformer]) --

    A sequence of transformers to be chained. The transformers are chained in the order of appearance in the list, i.e. the first transformer is applied first. If transformers is an empty list or None, then the pipeline transformer behaves like an identity transformer.

    By default it is set to ().
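The chaining rule described above can be sketched without the library. This is a minimal illustration, assuming each transformer is a plain Python callable; `apply_in_order`, `shift` and `double` are hypothetical names and not part of the documented API.

```python
import numpy as np


def apply_in_order(transformers, data):
    """Apply each callable in sequence; an empty or None sequence is the identity."""
    for transformer in transformers or ():
        data = transformer(data)
    return data


def shift(x):
    # Hypothetical transformer: add one to every entry.
    return x + 1.0


def double(x):
    # Hypothetical transformer: multiply every entry by two.
    return 2.0 * x


data = np.array([[1.0, 2.0], [3.0, 4.0]])

# The order of the sequence matters: the first item is applied first.
shifted_then_doubled = apply_in_order([shift, double], data)
doubled_then_shifted = apply_in_order([double, shift], data)

# An empty sequence leaves the data unchanged.
unchanged = apply_in_order((), data)
```

Swapping the two items changes the result, which is why the order of appearance in the sequence is part of the pipeline definition.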

compute_jacobian(data)

Compute the Jacobian of pipeline.transform().

Parameters:

data (ndarray) -- The data where the Jacobian is to be computed.

Returns:

The Jacobian matrix.

Return type:

ndarray
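For a chain of transformations, the Jacobian of the whole chain follows from the chain rule: the Jacobian of the second map, evaluated at the output of the first, multiplied by the Jacobian of the first. The sketch below illustrates this with two hypothetical maps in NumPy. This page does not state how the library composes the Jacobians internally, so this is only an illustration of the underlying rule.

```python
import numpy as np

# Hypothetical first map: an affine transformation x -> A x + b.
A = np.array([[1.0, 2.0], [0.0, 3.0]])
b = np.array([0.5, -1.0])


def affine(x):
    return A @ x + b


def affine_jacobian(x):
    # The Jacobian of an affine map is constant.
    return A


def square(y):
    # Hypothetical second map: element-wise square.
    return y**2


def square_jacobian(y):
    # The Jacobian of an element-wise map is diagonal.
    return np.diag(2.0 * y)


def chained_jacobian(x):
    """Jacobian of square(affine(x)) obtained with the chain rule."""
    return square_jacobian(affine(x)) @ affine_jacobian(x)


x = np.array([1.0, 2.0])
jacobian = chained_jacobian(x)
```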

compute_jacobian_inverse(data)

Compute the Jacobian of pipeline.inverse_transform().

Parameters:

data (ndarray) -- The data where the Jacobian is to be computed.

Returns:

The Jacobian matrix.

Return type:

ndarray

duplicate()

Duplicate the current object.

Returns:

A deepcopy of the current instance.

Return type:

Pipeline

inverse_transform(data)

Perform an inverse transform on the data.

The data is inverse transformed sequentially, starting with the last transformer in the list.

Parameters:

data (ndarray) -- The data to be inverse transformed.

Returns:

The inverse transformed data.

Return type:

ndarray
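The reverse ordering of the inverse transform can be illustrated with a small sketch. It assumes each step is a hypothetical (forward, inverse) pair of callables; `steps`, `forward` and `backward` are illustrative names, not the library's API.

```python
import numpy as np

# Hypothetical (forward, inverse) pairs standing in for transformers.
steps = [
    (lambda x: x + 1.0, lambda y: y - 1.0),  # shift and its inverse
    (lambda x: 2.0 * x, lambda y: y / 2.0),  # scaling and its inverse
]


def forward(data):
    # Apply the forward maps from the first step to the last.
    for transform, _ in steps:
        data = transform(data)
    return data


def backward(data):
    # Undo the steps from the last one to the first.
    for _, inverse in reversed(steps):
        data = inverse(data)
    return data


data = np.array([[1.0, 2.0], [3.0, 4.0]])
round_trip = backward(forward(data))
```

Applying the inverses in the forward order instead would not recover the original data, since the last transformation has to be undone first.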

transform(data)

Transform the data.

The data is transformed sequentially, where the output of one transformer is the input of the next.

Parameters:

data (ndarray) -- The data to be transformed.

Returns:

The transformed data.

Return type:

ndarray

transformers: Sequence[BaseTransformer]

The sequence of transformers.

Scaling

Scaling a variable with a linear transformation.

The Scaler class implements the default scaling method applied to some parameter \(z\):

\[\bar{z} := \text{offset} + \text{coefficient}\times z\]

where \(\bar{z}\) is the scaled version of \(z\). This scaling method is a linear transformation parameterized by an offset and a coefficient.

With the default values, the offset is 0 and the coefficient is 1, so the scaling operation is the identity: \(\bar{z}=z\). Subclasses are expected to override this default scaling.
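The linear scaling formula and its inverse can be sketched in a few lines of NumPy. The functions `scale` and `unscale` are hypothetical helpers written for illustration only; they are not the Scaler API.

```python
import numpy as np


def scale(z, offset=0.0, coefficient=1.0):
    """Return offset + coefficient * z."""
    return offset + coefficient * z


def unscale(z_bar, offset=0.0, coefficient=1.0):
    """Invert the linear scaling, assuming a nonzero coefficient."""
    return (z_bar - offset) / coefficient


z = np.array([[1.0, 10.0], [2.0, 20.0]])

# With the default offset and coefficient, the scaling is the identity.
identity = scale(z)

# One offset and one coefficient per column, broadcast over the rows.
offset = np.array([0.0, 1.0])
coefficient = np.array([2.0, 0.1])
scaled = scale(z, offset, coefficient)
restored = unscale(scaled, offset, coefficient)
```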

class Scaler(name='', offset=0.0, coefficient=1.0)

Data scaler.

Parameters:
  • name (str) --

    A name for this transformer.

    By default it is set to "".

  • offset (float | RealArray) --

    The offset of the linear transformation.

    By default it is set to 0.0.

  • coefficient (float | RealArray) --

    The coefficient of the linear transformation.

    By default it is set to 1.0.

compute_jacobian(data, *args, **kwargs)

Force a NumPy array to be at least 2D and evaluate the function f.

f expects a 2D array shaped as (n_points, input_dimension) and returns an nD array shaped as (..., n_points, output_dimension) or (..., n_points, output_dimension, input_dimension).

If the original data is a 1D array shaped as (input_dimension,), then this wrapper returns an (n-1)D array shaped as (..., output_dimension) or (..., output_dimension, input_dimension).

Parameters:
  • data (ndarray) -- A NumPy array.

  • *args (Any) -- The positional arguments.

  • **kwargs (Any) -- The optional arguments.

Returns:

Any kind of output; if a NumPy array, its dimension is made consistent with the shape of data.

Return type:

Any
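The wrapping behaviour described above can be approximated with a small decorator. This is a simplified sketch under stated assumptions: `at_least_2d` is a hypothetical name, it decorates a plain function rather than a method, and it only handles outputs shaped as (..., n_points, output_dimension), not the Jacobian-shaped case.

```python
from functools import wraps

import numpy as np


def at_least_2d(f):
    """Call f on 2D data and drop the n_points axis again for 1D input."""

    @wraps(f)
    def wrapper(data, *args, **kwargs):
        data = np.asarray(data)
        if data.ndim >= 2:
            return f(data, *args, **kwargs)

        # Promote (input_dimension,) to (1, input_dimension).
        out = f(data[np.newaxis, :], *args, **kwargs)
        if isinstance(out, np.ndarray):
            # Remove the n_points axis, which is the second to last one.
            return out[..., 0, :]
        return out

    return wrapper


@at_least_2d
def double(data):
    # Hypothetical function that expects (n_points, input_dimension).
    assert data.ndim == 2
    return 2.0 * data


result_1d = double(np.array([1.0, 2.0, 3.0]))
result_2d = double(np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))
```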

compute_jacobian_inverse(data, *args, **kwargs)

Force a NumPy array to be at least 2D and evaluate the function f.

f expects a 2D array shaped as (n_points, input_dimension) and returns an nD array shaped as (..., n_points, output_dimension) or (..., n_points, output_dimension, input_dimension).

If the original data is a 1D array shaped as (input_dimension,), then this wrapper returns an (n-1)D array shaped as (..., output_dimension) or (..., output_dimension, input_dimension).

Parameters:
  • data (ndarray) -- A NumPy array.

  • *args (Any) -- The positional arguments.

  • **kwargs (Any) -- The optional arguments.

Returns:

Any kind of output; if a NumPy array, its dimension is made consistent with the shape of data.

Return type:

Any

inverse_transform(data, *args, **kwargs)

Force a NumPy array to be at least 2D and evaluate the function f.

f expects a 2D array shaped as (n_points, input_dimension) and returns an nD array shaped as (..., n_points, output_dimension) or (..., n_points, output_dimension, input_dimension).

If the original data is a 1D array shaped as (input_dimension,), then this wrapper returns an (n-1)D array shaped as (..., output_dimension) or (..., output_dimension, input_dimension).

Parameters:
  • data (ndarray) -- A NumPy array.

  • *args (Any) -- The positional arguments.

  • **kwargs (Any) -- The optional arguments.

Returns:

Any kind of output; if a NumPy array, its dimension is made consistent with the shape of data.

Return type:

Any

transform(data, *args, **kwargs)

Force a NumPy array to be at least 2D and evaluate the function f.

f expects a 2D array shaped as (n_points, input_dimension) and returns an nD array shaped as (..., n_points, output_dimension) or (..., n_points, output_dimension, input_dimension).

If the original data is a 1D array shaped as (input_dimension,), then this wrapper returns an (n-1)D array shaped as (..., output_dimension) or (..., output_dimension, input_dimension).

Parameters:
  • data (ndarray) -- A NumPy array.

  • *args (Any) -- The positional arguments.

  • **kwargs (Any) -- The optional arguments.

Returns:

Any kind of output; if a NumPy array, its dimension is made consistent with the shape of data.

Return type:

Any

property coefficient: RealArray

The scaling coefficient.

property offset: RealArray

The scaling offset.

Scaling a variable with a geometrical linear transformation.

The MinMaxScaler class implements the MinMax scaling method applied to some parameter \(z\):

\[\bar{z} := \text{offset} + \text{coefficient}\times z = \frac{z-\text{min}(z)}{(\text{max}(z)-\text{min}(z))},\]

where \(\text{offset}=-\text{min}(z)/(\text{max}(z)-\text{min}(z))\) and \(\text{coefficient}=1/(\text{max}(z)-\text{min}(z))\).

In the MinMax scaling method, the scaling operation linearly transforms the original variable \(z\) such that the minimum of the original data corresponds to 0 and the maximum to 1.

Warning

When \(\text{min}(z)=\text{max}(z)\neq 0\), we use \(\bar{z}=\frac{z}{\text{min}(z)}-0.5\). When \(\text{min}(z)=\text{max}(z)=0\), we use \(\bar{z}=z+0.5\).
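The formulas above, including the two degenerate cases of the warning, can be sketched as follows. This is a minimal illustration that treats \(z\) as a single variable stored in a 1D array; `min_max_parameters` is a hypothetical helper, not the library's implementation.

```python
import numpy as np


def min_max_parameters(z):
    """Return (offset, coefficient) such that offset + coefficient * z is scaled."""
    z_min = z.min()
    z_max = z.max()
    if z_max != z_min:
        # Regular case: the minimum maps to 0 and the maximum to 1.
        coefficient = 1.0 / (z_max - z_min)
        offset = -z_min * coefficient
    elif z_min != 0.0:
        # Constant nonzero data: z / min(z) - 0.5.
        coefficient = 1.0 / z_min
        offset = -0.5
    else:
        # Constant zero data: z + 0.5.
        coefficient = 1.0
        offset = 0.5
    return offset, coefficient


z = np.array([2.0, 4.0, 6.0])
offset, coefficient = min_max_parameters(z)
scaled = offset + coefficient * z

constant = np.array([3.0, 3.0])
c_offset, c_coefficient = min_max_parameters(constant)
scaled_constant = c_offset + c_coefficient * constant

zeros = np.zeros(2)
z_offset, z_coefficient = min_max_parameters(zeros)
scaled_zeros = z_offset + z_coefficient * zeros
```

In both degenerate cases, the constant data are mapped to 0.5, the midpoint of the target interval.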

class MinMaxScaler(name='', offset=0.0, coefficient=1.0)

Min-max scaler.

Parameters:
  • name (str) --

    A name for this transformer.

    By default it is set to "".

  • offset (float) --

    The offset of the linear transformation.

    By default it is set to 0.0.

  • coefficient (float) --

    The coefficient of the linear transformation.

    By default it is set to 1.0.

Scaling a variable with a statistical linear transformation.

The StandardScaler class implements the Standard scaling method applied to some parameter \(z\):

\[\bar{z} := \text{offset} + \text{coefficient}\times z = \frac{z-\text{mean}(z)}{\text{std}(z)}\]

where \(\text{offset}=-\text{mean}(z)/\text{std}(z)\) and \(\text{coefficient}=1/\text{std}(z)\).

In this standard scaling method, the scaling operation linearly transforms the original variable \(z\) such that in the scaled space, the original data have zero mean and unit standard deviation.

Warning

When \(\text{std}(z)=0\) and \(\text{mean}(z)\neq 0\), we use \(\bar{z}=\frac{z}{\text{mean}(z)}-1\). When \(\text{std}(z)=0\) and \(\text{mean}(z)=0\), we use \(\bar{z}=z\).
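The standard scaling formulas and the two degenerate cases of the warning can be sketched in NumPy. The helper `standard_parameters` is hypothetical. The sketch treats \(z\) as a single variable and assumes the population standard deviation (NumPy's default), since this page does not say which estimator is used.

```python
import numpy as np


def standard_parameters(z):
    """Return (offset, coefficient) such that offset + coefficient * z is scaled."""
    mean = z.mean()
    std = z.std()  # Assumption: population standard deviation (ddof=0).
    if std != 0.0:
        # Regular case: (z - mean) / std.
        coefficient = 1.0 / std
        offset = -mean / std
    elif mean != 0.0:
        # Constant nonzero data: z / mean - 1.
        coefficient = 1.0 / mean
        offset = -1.0
    else:
        # Constant zero data: the scaling is the identity.
        coefficient = 1.0
        offset = 0.0
    return offset, coefficient


z = np.array([1.0, 2.0, 3.0, 4.0])
offset, coefficient = standard_parameters(z)
scaled = offset + coefficient * z

constant = np.array([3.0, 3.0, 3.0])
c_offset, c_coefficient = standard_parameters(constant)
scaled_constant = c_offset + c_coefficient * constant
```

In both degenerate cases, the constant data are mapped to 0, which matches the zero mean of the regular case.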

class StandardScaler(name='', offset=0.0, coefficient=1.0)

Standard scaler.

Parameters:
  • name (str) --

    A name for this transformer.

    By default it is set to "".

  • offset (float) --

    The offset of the linear transformation.

    By default it is set to 0.0.

  • coefficient (float) --

    The coefficient of the linear transformation.

    By default it is set to 1.0.

Dimension reduction

Dimension reduction as a generic transformer.

The BaseDimensionReduction class implements the concept of dimension reduction.

See also

pca

class BaseDimensionReduction(name='', n_components=None, **parameters)

Dimension reduction.

Parameters:
  • name (str) --

    A name for this transformer.

    By default it is set to "".

  • n_components (int | None) -- The number of components of the latent space. If None, use the maximum number allowed by the technique, typically min(n_samples, n_features).

  • **parameters (bool | float | str | None) -- The parameters of the transformer.

property n_components: int

The number of components.

The Principal Component Analysis (PCA) to reduce the dimension of a variable.

The PCA class wraps the PCA from scikit-learn.

Dependence

This dimension reduction algorithm relies on the PCA class of the scikit-learn library.

class PCA(name='', n_components=None, scale=False, **parameters)

Principal component dimension reduction algorithm.

Parameters:
  • name (str) --

    A name for this transformer.

    By default it is set to "".

  • n_components (float | Literal['mle'] | None) -- Either the number of components (a positive integer), the minimum amount of variance to be explained by the components (a float in \(]0,1[\)), the constant "mle" to guess this number or None to define it as min(n_samples, n_features).

  • scale (bool) --

    Whether to scale the data before applying the PCA.

    By default it is set to False.

  • **parameters (float | str | bool | None) -- The optional parameters for sklearn PCA constructor.
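To make the idea of reducing and restoring data concrete, the sketch below performs a PCA with a plain NumPy singular value decomposition. It is a stand-in technique for illustration and not the wrapped scikit-learn class; `fit_pca`, `reduce` and `restore` are hypothetical helpers, and only an integer number of components is handled.

```python
import numpy as np


def fit_pca(data, n_components):
    """Return the mean and the first principal directions of the data."""
    mean = data.mean(axis=0)
    # The rows of vt are the principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:n_components]


def reduce(data, mean, components):
    # Project the centered data onto the principal directions.
    return (data - mean) @ components.T


def restore(latent, mean, components):
    # Map the latent coordinates back to the original space.
    return latent @ components + mean


rng = np.random.default_rng(0)

# 20 samples lying on a line in a 3D space: one component is enough.
t = rng.normal(size=(20, 1))
data = t @ np.array([[1.0, 2.0, -1.0]]) + np.array([0.5, 0.0, 3.0])

mean, components = fit_pca(data, n_components=1)
latent = reduce(data, mean, components)
restored = restore(latent, mean, components)
```

Since these samples vary along a single direction, one component restores them up to rounding errors. With data of higher intrinsic dimension, the restored data would only approximate the original ones.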

compute_jacobian(data, *args, **kwargs)

Force a NumPy array to be at least 2D and evaluate the function f.

f expects a 2D array shaped as (n_points, input_dimension) and returns an nD array shaped as (..., n_points, output_dimension) or (..., n_points, output_dimension, input_dimension).

If the original data is a 1D array shaped as (input_dimension,), then this wrapper returns an (n-1)D array shaped as (..., output_dimension) or (..., output_dimension, input_dimension).

Parameters:
  • data (ndarray) -- A NumPy array.

  • *args (Any) -- The positional arguments.

  • **kwargs (Any) -- The optional arguments.

Returns:

Any kind of output; if a NumPy array, its dimension is made consistent with the shape of data.

Return type:

Any

compute_jacobian_inverse(data, *args, **kwargs)

Force a NumPy array to be at least 2D and evaluate the function f.

f expects a 2D array shaped as (n_points, input_dimension) and returns an nD array shaped as (..., n_points, output_dimension) or (..., n_points, output_dimension, input_dimension).

If the original data is a 1D array shaped as (input_dimension,), then this wrapper returns an (n-1)D array shaped as (..., output_dimension) or (..., output_dimension, input_dimension).

Parameters:
  • data (ndarray) -- A NumPy array.

  • *args (Any) -- The positional arguments.

  • **kwargs (Any) -- The optional arguments.

Returns:

Any kind of output; if a NumPy array, its dimension is made consistent with the shape of data.

Return type:

Any

inverse_transform(data, *args, **kwargs)

Force a NumPy array to be at least 2D and evaluate the function f.

f expects a 2D array shaped as (n_points, input_dimension) and returns an nD array shaped as (..., n_points, output_dimension) or (..., n_points, output_dimension, input_dimension).

If the original data is a 1D array shaped as (input_dimension,), then this wrapper returns an (n-1)D array shaped as (..., output_dimension) or (..., output_dimension, input_dimension).

Parameters:
  • data (ndarray) -- A NumPy array.

  • *args (Any) -- The positional arguments.

  • **kwargs (Any) -- The optional arguments.

Returns:

Any kind of output; if a NumPy array, its dimension is made consistent with the shape of data.

Return type:

Any

transform(data, *args, **kwargs)

Force a NumPy array to be at least 2D and evaluate the function f.

f expects a 2D array shaped as (n_points, input_dimension) and returns an nD array shaped as (..., n_points, output_dimension) or (..., n_points, output_dimension, input_dimension).

If the original data is a 1D array shaped as (input_dimension,), then this wrapper returns an (n-1)D array shaped as (..., output_dimension) or (..., output_dimension, input_dimension).

Parameters:
  • data (ndarray) -- A NumPy array.

  • *args (Any) -- The positional arguments.

  • **kwargs (Any) -- The optional arguments.

Returns:

Any kind of output; if a NumPy array, its dimension is made consistent with the shape of data.

Return type:

Any

property components: RealArray

The principal components.

property data_is_scaled: bool

Whether the transformer scales the data before reducing its dimension.

Examples

See the examples about: