Transform data to improve the ML algorithm quality

Data transformer

The abstract Transformer class implements the concept of a data transformer. Inheriting classes should implement the Transformer.fit(), Transformer.transform() and possibly Transformer.inverse_transform() methods.

Data transformer pipeline

The Pipeline class chains a sequence of tranformers, and provides global fit(), transform(), fit_transform() and inverse_transform() methods.

Data scaler

The Scaler class implements the default scaling method applying to some parameter \(z\):

\[\bar{z} := \text{offset} + \text{coefficient}\times z\]

where \(\bar{z}\) is the scaled version of z. This scaling method is a linear transformation parameterized by an offset and a coefficient.

In this default scaling method, the offset is equal to 0 and the coefficient is equal to 1. Consequently, the scaling operation is the identity: \(\bar{z}=z\). This method has to be overloaded.

Examples

Min-max data scaler

The MinMaxScaler class implements the MinMax scaling method applying to some parameter \(z\):

\[\bar{z} := \text{offset} + \text{coefficient}\times z = \frac{z-\text{min}(z)}{(\text{max}(z)-\text{min}(z))},\]

where \(\text{offset}=-\text{min}(z)/(\text{max}(z)-\text{min}(z))\) and \(\text{coefficient}=1/(\text{max}(z)-\text{min}(z))\).

In the MinMax scaling method, the scaling operation linearly transforms the original variable \(z\) such that the minimum of the original data corresponds to 0 and the maximum to 1.

Standard data scaler

The StandardScaler class implements the Standard scaling method applying to some parameter \(z\):

\[\bar{z} := \text{offset} + \text{coefficient}\times z = \frac{z-\text{mean}(z)}{\text{std}(z)}\]

where \(\text{offset}=-\text{mean}(z)/\text{std}(z)\) and \(\text{coefficient}=1/\text{std}(z)\).

In this Standard scaling method, the scaling operation linearly transforms the original variable math:z such that in the scaled space, the original data have zero mean and unit standard deviation.

Dimension reduction

The DimensionReduction class implements the concept of dimension reduction.

See also

pca

Examples

Principal component dimension reduction algorithm

The PCA class wraps the PCA from Scikit-learn.

Dependence

This dimension reduction algorithm relies on the PCA class of the scikit-learn library.

Development