Transform data to improve the ML algorithm quality¶

Data transformer¶

The abstract Transformer class implements the concept of a data transformer. Inheriting classes should implement the Transformer.fit(), Transformer.transform() and possibly Transformer.inverse_transform() methods.
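The contract described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the names `SketchTransformer` and `Shift` are hypothetical, and the data are plain Python lists for simplicity.

```python
from abc import ABC, abstractmethod


class SketchTransformer(ABC):
    """Hypothetical stand-in for an abstract data transformer."""

    @abstractmethod
    def fit(self, data):
        """Learn the transformation parameters from the data."""

    @abstractmethod
    def transform(self, data):
        """Apply the fitted transformation to the data."""

    def inverse_transform(self, data):
        """Map transformed data back; optional, hence not abstract."""
        raise NotImplementedError


class Shift(SketchTransformer):
    """Toy transformer subtracting the first value seen during fitting."""

    def fit(self, data):
        self.shift = data[0]

    def transform(self, data):
        return [value - self.shift for value in data]

    def inverse_transform(self, data):
        return [value + self.shift for value in data]


shifter = Shift()
shifter.fit([3.0, 5.0, 8.0])
shifted = shifter.transform([3.0, 5.0, 8.0])
restored = shifter.inverse_transform(shifted)
```

Here `inverse_transform()` raises `NotImplementedError` by default, which is one possible way to express that implementing it is optional.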

Data transformer pipeline¶

The Pipeline class chains a sequence of transformers and provides global fit(), transform(), fit_transform() and inverse_transform() methods.
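The chaining logic can be illustrated with a small sketch. The classes `SketchPipeline`, `Center` and `DivideByMaxAbs` are hypothetical and only assume that each step exposes fit(), transform() and inverse_transform(); the actual Pipeline class may differ in its details.

```python
class Center:
    """Toy step: subtract the mean learned during fitting."""

    def fit(self, data):
        self.mean = sum(data) / len(data)

    def transform(self, data):
        return [value - self.mean for value in data]

    def inverse_transform(self, data):
        return [value + self.mean for value in data]


class DivideByMaxAbs:
    """Toy step: divide by the largest absolute value learned during fitting."""

    def fit(self, data):
        self.scale = max(abs(value) for value in data)

    def transform(self, data):
        return [value / self.scale for value in data]

    def inverse_transform(self, data):
        return [value * self.scale for value in data]


class SketchPipeline:
    """Hypothetical pipeline applying its steps one after the other."""

    def __init__(self, transformers):
        self.transformers = list(transformers)

    def fit(self, data):
        # Each step is fitted on the output of the previous steps.
        for transformer in self.transformers:
            transformer.fit(data)
            data = transformer.transform(data)

    def transform(self, data):
        for transformer in self.transformers:
            data = transformer.transform(data)
        return data

    def fit_transform(self, data):
        self.fit(data)
        return self.transform(data)

    def inverse_transform(self, data):
        # The steps are undone in reverse order.
        for transformer in reversed(self.transformers):
            data = transformer.inverse_transform(data)
        return data


pipeline = SketchPipeline([Center(), DivideByMaxAbs()])
transformed = pipeline.fit_transform([2.0, 4.0, 6.0])
restored = pipeline.inverse_transform(transformed)
```

In this sketch the inverse transformation walks the steps in reverse order, since the last transformation applied must be the first one undone.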

Data scaler¶

The Scaler class implements the default scaling method applying to some parameter \(z\):

\[\bar{z} := \text{offset} + \text{coefficient}\times z\]

where \(\bar{z}\) is the scaled version of \(z\). This scaling method is a linear transformation parameterized by an offset and a coefficient.

In this default scaling method, the offset is equal to 0 and the coefficient is equal to 1, so the scaling operation is the identity: \(\bar{z}=z\). Inheriting classes have to override this method to provide a non-trivial scaling.
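As an illustration of the linear form, the following sketch implements the offset and coefficient formula on plain Python lists. The class name `LinearScaler` and its constructor arguments are assumptions made for the example, not the signature of the Scaler class.

```python
class LinearScaler:
    """Hypothetical scaler computing offset + coefficient * z."""

    def __init__(self, offset=0.0, coefficient=1.0):
        self.offset = offset
        self.coefficient = coefficient

    def transform(self, z):
        return [self.offset + self.coefficient * value for value in z]

    def inverse_transform(self, z_bar):
        # Invert the linear map; the coefficient is assumed to be non-zero.
        return [(value - self.offset) / self.coefficient for value in z_bar]


# With the default parameters, the scaling is the identity.
identity_out = LinearScaler().transform([1.5, -2.0])

# With other parameters, the data are shifted and stretched.
scaler = LinearScaler(offset=1.0, coefficient=2.0)
scaled = scaler.transform([0.0, 3.0])
restored = scaler.inverse_transform(scaled)
```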

Min-max data scaler¶

The MinMaxScaler class implements the MinMax scaling method applying to some parameter \(z\):

\[\bar{z} := \text{offset} + \text{coefficient}\times z = \frac{z-\text{min}(z)}{(\text{max}(z)-\text{min}(z))},\]

where \(\text{offset}=-\text{min}(z)/(\text{max}(z)-\text{min}(z))\) and \(\text{coefficient}=1/(\text{max}(z)-\text{min}(z))\).

In the MinMax scaling method, the scaling operation linearly transforms the original variable \(z\) such that the minimum of the original data corresponds to 0 and the maximum to 1.
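A short worked example may help to check the formulas. The helper `fit_min_max` is hypothetical and uses plain Python lists; it assumes that the data are not constant, so that the denominator is non-zero.

```python
def fit_min_max(z):
    """Return the (offset, coefficient) pair of the min-max scaling of z."""
    z_min, z_max = min(z), max(z)
    span = z_max - z_min  # assumed non-zero, i.e. z is not constant
    return -z_min / span, 1.0 / span


data = [2.0, 4.0, 10.0]
offset, coefficient = fit_min_max(data)
scaled = [offset + coefficient * value for value in data]
```

With these data, the offset is -0.25 and the coefficient is 0.125, so the minimum 2 is mapped to 0 and the maximum 10 to 1.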

Standard data scaler¶

The StandardScaler class implements the Standard scaling method applying to some parameter \(z\):

\[\bar{z} := \text{offset} + \text{coefficient}\times z = \frac{z-\text{mean}(z)}{\text{std}(z)}\]

where \(\text{offset}=-\text{mean}(z)/\text{std}(z)\) and \(\text{coefficient}=1/\text{std}(z)\).

In this Standard scaling method, the scaling operation linearly transforms the original variable \(z\) such that in the scaled space, the original data have zero mean and unit standard deviation.
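The formulas can be checked with a small sketch based on the standard library. The helper `fit_standard` is hypothetical, and the use of the population standard deviation (division by the number of samples) is an assumption, since the text does not state which estimator is used.

```python
from statistics import mean, pstdev


def fit_standard(z):
    """Return the (offset, coefficient) pair of the standard scaling of z."""
    z_mean = mean(z)
    z_std = pstdev(z)  # population standard deviation, assumed non-zero
    return -z_mean / z_std, 1.0 / z_std


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
offset, coefficient = fit_standard(data)
scaled = [offset + coefficient * value for value in data]

# In the scaled space, the data have zero mean and unit standard deviation.
scaled_mean = mean(scaled)
scaled_std = pstdev(scaled)
```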

Dimension reduction¶

The DimensionReduction class implements the concept of dimension reduction.

Principal component dimension reduction algorithm¶

The PCA class wraps the PCA class of scikit-learn.
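For reference, the underlying scikit-learn class can be used directly as sketched below. The example calls `sklearn.decomposition.PCA` itself rather than the wrapper, whose constructor arguments are not described here, and the synthetic data are made up for the illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# 50 samples in dimension 3 lying exactly on a 2-dimensional plane.
latent = rng.normal(size=(50, 2))
mixing = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
data = latent @ mixing

# Reduce the dimension from 3 to 2, then map back to the original space.
pca = PCA(n_components=2)
reduced = pca.fit_transform(data)
restored = pca.inverse_transform(reduced)
```

Because the synthetic data have only two degrees of freedom, two principal components are enough to restore them up to rounding errors.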

Dependence¶

This dimension reduction algorithm relies on the PCA class of the scikit-learn library.
