# Transform data to improve the ML algorithm quality

## Data transformer

The abstract Transformer class implements the concept of a data transformer. Inheriting classes should implement the Transformer.fit(), Transformer.transform() and possibly Transformer.inverse_transform() methods.
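The contract described above can be illustrated with a minimal sketch. It assumes nothing about the library's actual signatures: the `Transformer` base class below is a hypothetical stand-in written for this example, and `Shift` is a toy subclass invented for illustration.

```python
from abc import ABC, abstractmethod


class Transformer(ABC):
    """Hypothetical stand-in for an abstract data transformer."""

    @abstractmethod
    def fit(self, data):
        """Learn the transformation parameters from the data."""

    @abstractmethod
    def transform(self, data):
        """Apply the fitted transformation to the data."""

    def inverse_transform(self, data):
        """Map transformed data back to the original space (optional)."""
        raise NotImplementedError


class Shift(Transformer):
    """Toy transformer subtracting the first value seen during fitting."""

    def fit(self, data):
        self.reference = data[0]

    def transform(self, data):
        return [value - self.reference for value in data]

    def inverse_transform(self, data):
        return [value + self.reference for value in data]


shift = Shift()
shift.fit([3.0, 5.0, 4.0])
shifted = shift.transform([3.0, 5.0, 4.0])
restored = shift.inverse_transform(shifted)
```

Here `inverse_transform()` is left optional in the base class, mirroring the "possibly" in the description; a subclass that cannot be inverted simply does not override it.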

## Data transformer pipeline

The Pipeline class chains a sequence of transformers and provides global fit(), transform(), fit_transform() and inverse_transform() methods.
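One plausible way to chain transformers is sketched below. This is an illustration, not the library's implementation: the `Pipeline` class here is a simplified stand-in, and `AddConstant` and `DivideByMax` are toy transformers invented for the example. The sketch assumes that each transformer is fitted on the output of the previous one and that the inverse is applied in reverse order.

```python
class AddConstant:
    """Toy transformer adding a fixed constant (nothing to learn)."""

    def __init__(self, constant):
        self.constant = constant

    def fit(self, data):
        pass

    def transform(self, data):
        return [value + self.constant for value in data]

    def inverse_transform(self, data):
        return [value - self.constant for value in data]


class DivideByMax:
    """Toy transformer dividing by the largest value seen during fitting."""

    def fit(self, data):
        self.maximum = max(data)

    def transform(self, data):
        return [value / self.maximum for value in data]

    def inverse_transform(self, data):
        return [value * self.maximum for value in data]


class Pipeline:
    """Simplified stand-in chaining transformers in sequence."""

    def __init__(self, transformers):
        self.transformers = list(transformers)

    def fit(self, data):
        # Each transformer is fitted on the output of the previous one.
        for transformer in self.transformers:
            transformer.fit(data)
            data = transformer.transform(data)

    def transform(self, data):
        for transformer in self.transformers:
            data = transformer.transform(data)
        return data

    def fit_transform(self, data):
        self.fit(data)
        return self.transform(data)

    def inverse_transform(self, data):
        # Undo the transformations from last to first.
        for transformer in reversed(self.transformers):
            data = transformer.inverse_transform(data)
        return data


pipeline = Pipeline([AddConstant(1.0), DivideByMax()])
transformed = pipeline.fit_transform([1.0, 3.0, 7.0])
restored = pipeline.inverse_transform(transformed)
```

With these toy transformers, the data are first shifted to `[2.0, 4.0, 8.0]` and then divided by 8, and the inverse recovers the original values.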

## Data scaler

The Scaler class implements the default scaling method applied to a variable $$z$$:

$\bar{z} := \text{offset} + \text{coefficient}\times z$

where $$\bar{z}$$ is the scaled version of $$z$$. This scaling method is a linear transformation parameterized by an offset and a coefficient.

In this default scaling method, the offset is equal to 0 and the coefficient is equal to 1. Consequently, the scaling operation is the identity: $$\bar{z}=z$$. Inheriting classes are expected to override this method with their own offset and coefficient.
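The linear scaling and its identity default can be sketched as follows. `LinearScaler` is a hypothetical name chosen for this illustration and does not reflect the actual Scaler signature; the inverse formula assumes a non-zero coefficient.

```python
class LinearScaler:
    """Hypothetical scaler computing offset + coefficient * z."""

    def __init__(self, offset=0.0, coefficient=1.0):
        self.offset = offset
        self.coefficient = coefficient

    def transform(self, data):
        return [self.offset + self.coefficient * z for z in data]

    def inverse_transform(self, data):
        # Assumes coefficient != 0, otherwise the scaling is not invertible.
        return [(z_bar - self.offset) / self.coefficient for z_bar in data]


# Default parameters: the scaling is the identity.
identity = LinearScaler()
unchanged = identity.transform([2.0, -1.5])

# Custom parameters: z_bar = 1 + 2 * z.
custom = LinearScaler(offset=1.0, coefficient=2.0)
scaled = custom.transform([2.0, -1.5])
restored = custom.inverse_transform(scaled)
```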

## Min-max data scaler

The MinMaxScaler class implements the MinMax scaling method applied to a variable $$z$$:

$\bar{z} := \text{offset} + \text{coefficient}\times z = \frac{z-\text{min}(z)}{(\text{max}(z)-\text{min}(z))},$

where $$\text{offset}=-\text{min}(z)/(\text{max}(z)-\text{min}(z))$$ and $$\text{coefficient}=1/(\text{max}(z)-\text{min}(z))$$.

In the MinMax scaling method, the scaling operation linearly transforms the original variable $$z$$ such that the minimum of the original data corresponds to 0 and the maximum to 1.
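As a worked example of the formulas above, the following sketch computes the offset and coefficient from a small data set and applies them. The helper `fit_min_max` is a hypothetical name introduced here, and raising an error for constant data is an assumption of this sketch, not documented behavior of MinMaxScaler.

```python
def fit_min_max(data):
    """Return (offset, coefficient) for min-max scaling of the data."""
    lower, upper = min(data), max(data)
    span = upper - lower
    if span == 0:
        # Assumption of this sketch: constant data cannot be scaled.
        raise ValueError("min-max scaling is undefined for constant data")
    coefficient = 1.0 / span
    offset = -lower / span
    return offset, coefficient


data = [2.0, 4.0, 10.0]
offset, coefficient = fit_min_max(data)

# Apply the linear form: z_bar = offset + coefficient * z.
scaled = [offset + coefficient * z for z in data]
```

For this data set, the minimum is 2 and the maximum is 10, so the offset is -0.25 and the coefficient is 0.125; the scaled values are 0, 0.25 and 1.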

## Standard data scaler

The StandardScaler class implements the Standard scaling method applied to a variable $$z$$:

$\bar{z} := \text{offset} + \text{coefficient}\times z = \frac{z-\text{mean}(z)}{\text{std}(z)}$

where $$\text{offset}=-\text{mean}(z)/\text{std}(z)$$ and $$\text{coefficient}=1/\text{std}(z)$$.

In the Standard scaling method, the scaling operation linearly transforms the original variable $$z$$ such that in the scaled space, the original data have zero mean and unit standard deviation.
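The formulas above can be checked numerically with a short sketch. The helper `fit_standard` is a hypothetical name, and the choice of the population standard deviation (`statistics.pstdev`) is an assumption: the text does not say whether the population or the sample estimator is used.

```python
import statistics


def fit_standard(data):
    """Return (offset, coefficient) for standard scaling of the data."""
    mean = statistics.fmean(data)
    # Assumption: population standard deviation (divides by n, not n - 1).
    std = statistics.pstdev(data)
    coefficient = 1.0 / std
    offset = -mean / std
    return offset, coefficient


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
offset, coefficient = fit_standard(data)

# Apply the linear form: z_bar = offset + coefficient * z.
scaled = [offset + coefficient * z for z in data]

scaled_mean = statistics.fmean(scaled)
scaled_std = statistics.pstdev(scaled)
```

This data set has mean 5 and population standard deviation 2, so the offset is -2.5 and the coefficient is 0.5; the scaled data then have zero mean and unit standard deviation.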

## Dimension reduction

The DimensionReduction class implements the concept of dimension reduction.

## Principal component dimension reduction algorithm

The PCA class wraps the PCA class from scikit-learn.

### Dependence

This dimension reduction algorithm relies on the PCA class of the scikit-learn library.