Transform data to improve the quality of ML algorithms
Data transformer
The abstract Transformer class implements the concept of a data transformer. Inheriting classes should implement the Transformer.fit() and Transformer.transform() methods and, possibly, the Transformer.inverse_transform() method.
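As a rough illustration of this interface, the sketch below defines a hypothetical SketchTransformer base class and a hypothetical Centerer subclass in plain Python with NumPy. These names are assumptions made for the example; they are not GEMSEO's Transformer API, whose exact signatures are not given here. inverse_transform() raises NotImplementedError by default to reflect that it is optional.

```python
from abc import ABC, abstractmethod

import numpy as np


class SketchTransformer(ABC):
    """Hypothetical minimal transformer interface (illustration only, not GEMSEO's class)."""

    @abstractmethod
    def fit(self, data: np.ndarray) -> None:
        """Learn the transformer parameters from the data."""

    @abstractmethod
    def transform(self, data: np.ndarray) -> np.ndarray:
        """Apply the fitted transformation."""

    def inverse_transform(self, data: np.ndarray) -> np.ndarray:
        """Optional: undo the transformation; not every transformer is invertible."""
        raise NotImplementedError

    def fit_transform(self, data: np.ndarray) -> np.ndarray:
        """Fit on the data, then transform the same data."""
        self.fit(data)
        return self.transform(data)


class Centerer(SketchTransformer):
    """Hypothetical example transformer: subtract the per-column mean."""

    def fit(self, data: np.ndarray) -> None:
        self.mean = data.mean(axis=0)

    def transform(self, data: np.ndarray) -> np.ndarray:
        return data - self.mean

    def inverse_transform(self, data: np.ndarray) -> np.ndarray:
        return data + self.mean


data = np.array([[1.0, 10.0], [3.0, 20.0]])
centerer = Centerer()
centered = centerer.fit_transform(data)
restored = centerer.inverse_transform(centered)
```

Keeping fit_transform() concrete in the base class means a subclass only has to provide fit() and transform(), plus inverse_transform() when the transformation can be undone.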
Data transformer pipeline
The Pipeline class chains a sequence of transformers and provides global fit(), transform(), fit_transform() and inverse_transform() methods.
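A minimal sketch of the chaining logic, assuming each transformer exposes fit(), transform() and inverse_transform() on NumPy arrays. SketchPipeline and Affine are hypothetical names, not GEMSEO classes. The key design point is that fitting feeds each transformer the output of the previous ones, and inverse_transform() applies the inverses in reverse order, since the inverse of a composition reverses the order of its factors.

```python
import numpy as np


class Affine:
    """Hypothetical transformer with fixed parameters: x -> a * x + b."""

    def __init__(self, a: float, b: float) -> None:
        self.a = a
        self.b = b

    def fit(self, data: np.ndarray) -> None:
        pass  # fixed parameters, nothing to learn

    def transform(self, data: np.ndarray) -> np.ndarray:
        return self.a * data + self.b

    def inverse_transform(self, data: np.ndarray) -> np.ndarray:
        return (data - self.b) / self.a


class SketchPipeline:
    """Hypothetical chain of transformers (illustration, not GEMSEO's Pipeline)."""

    def __init__(self, transformers) -> None:
        self.transformers = list(transformers)

    def fit(self, data: np.ndarray) -> None:
        # Each transformer is fitted on the output of the previous ones.
        for transformer in self.transformers:
            transformer.fit(data)
            data = transformer.transform(data)

    def transform(self, data: np.ndarray) -> np.ndarray:
        for transformer in self.transformers:
            data = transformer.transform(data)
        return data

    def fit_transform(self, data: np.ndarray) -> np.ndarray:
        self.fit(data)
        return self.transform(data)

    def inverse_transform(self, data: np.ndarray) -> np.ndarray:
        # The inverse of a composition applies the inverses in reverse order.
        for transformer in reversed(self.transformers):
            data = transformer.inverse_transform(data)
        return data


x = np.array([0.0, 1.0, 2.0])
pipeline = SketchPipeline([Affine(2.0, 1.0), Affine(1.0, -3.0)])
y = pipeline.fit_transform(x)  # 2 * x + 1, then minus 3
x_back = pipeline.inverse_transform(y)
```

With these two affine steps, applying the inverses in forward order instead would not recover x, which is why the reversal matters.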
Data scaler
The Scaler class implements the default scaling method applied to some parameter \(z\):
\[\bar{z} := \text{offset} + \text{coefficient}\times z\]
where \(\bar{z}\) is the scaled version of \(z\). This scaling method is a linear transformation parameterized by an offset and a coefficient.
In this default scaling method, the offset is equal to 0 and the coefficient is equal to 1. Consequently, the scaling operation is the identity: \(\bar{z}=z\). This method is meant to be overridden by inheriting classes.
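The linear form \(\bar{z} = \text{offset} + \text{coefficient}\times z\) and its inverse can be sketched as follows. SketchScaler is a hypothetical class, not GEMSEO's Scaler; its fit() deliberately leaves the identity parameters unchanged, mirroring the default behaviour described above, and the inverse assumes a nonzero coefficient.

```python
import numpy as np


class SketchScaler:
    """Hypothetical linear scaler: z_bar = offset + coefficient * z."""

    def __init__(self, offset: float = 0.0, coefficient: float = 1.0) -> None:
        # The default parameters give the identity transformation.
        self.offset = offset
        self.coefficient = coefficient

    def fit(self, data: np.ndarray) -> None:
        """Default: keep the parameters unchanged; inheriting classes would override this."""

    def transform(self, data: np.ndarray) -> np.ndarray:
        return self.offset + self.coefficient * data

    def inverse_transform(self, data: np.ndarray) -> np.ndarray:
        # Solve z_bar = offset + coefficient * z for z (requires coefficient != 0).
        return (data - self.offset) / self.coefficient


z = np.array([1.0, 2.0, 5.0])
scaler = SketchScaler()
scaler.fit(z)
z_bar = scaler.transform(z)  # identical to z with the default parameters
```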
Min-max data scaler
The MinMaxScaler class implements the MinMax scaling method applied to some parameter \(z\):
\[\bar{z} := \text{offset} + \text{coefficient}\times z\]
where \(\text{offset}=-\text{min}(z)/(\text{max}(z)-\text{min}(z))\) and \(\text{coefficient}=1/(\text{max}(z)-\text{min}(z))\).
In the MinMax scaling method, the scaling operation linearly transforms the original variable \(z\) such that the minimum of the original data corresponds to 0 and the maximum to 1.
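The MinMax parameters can be computed column by column with NumPy as in this sketch. min_max_parameters is a hypothetical helper, and the sketch assumes that no column is constant; otherwise \(\text{max}(z)-\text{min}(z)=0\) and the coefficient is undefined.

```python
import numpy as np


def min_max_parameters(z: np.ndarray):
    """Hypothetical helper: return (offset, coefficient) of the MinMax scaling per column."""
    z_min = z.min(axis=0)
    z_max = z.max(axis=0)
    coefficient = 1.0 / (z_max - z_min)  # assumes max(z) > min(z) in every column
    offset = -z_min * coefficient  # equals -min(z) / (max(z) - min(z))
    return offset, coefficient


z = np.array([[2.0, -1.0], [4.0, 0.0], [6.0, 3.0]])
offset, coefficient = min_max_parameters(z)
z_bar = offset + coefficient * z  # each column now spans [0, 1]
```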
Standard data scaler
The StandardScaler class implements the Standard scaling method applied to some parameter \(z\):
\[\bar{z} := \text{offset} + \text{coefficient}\times z\]
where \(\text{offset}=-\text{mean}(z)/\text{std}(z)\) and \(\text{coefficient}=1/\text{std}(z)\).
In this Standard scaling method, the scaling operation linearly transforms the original variable \(z\) such that, in the scaled space, the original data have zero mean and unit standard deviation.
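Similarly, the Standard scaling parameters can be sketched with NumPy. standard_parameters is a hypothetical helper; it uses NumPy's default population standard deviation (ddof=0), which is an assumption, since the text does not say which estimator the StandardScaler uses. With ddof=0 the scaled columns have exactly zero mean and unit standard deviation.

```python
import numpy as np


def standard_parameters(z: np.ndarray):
    """Hypothetical helper: return (offset, coefficient) of the Standard scaling per column."""
    mean = z.mean(axis=0)
    std = z.std(axis=0)  # population std (ddof=0); assumes std(z) > 0 in every column
    coefficient = 1.0 / std
    offset = -mean * coefficient  # equals -mean(z) / std(z)
    return offset, coefficient


z = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
offset, coefficient = standard_parameters(z)
z_bar = offset + coefficient * z  # equivalent to (z - mean(z)) / std(z)
```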
Dimension reduction
The DimensionReduction class implements the concept of dimension reduction.
Principal component dimension reduction algorithm
The PCA class wraps the PCA class from scikit-learn.
Dependence
This dimension reduction algorithm relies on the PCA class of the scikit-learn library.
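Since the PCA class relies on scikit-learn, the underlying scikit-learn API can be exercised directly as below. This sketch calls sklearn.decomposition.PCA itself rather than GEMSEO's wrapper, and the synthetic, nearly planar data set is an assumption chosen so that two components retain almost all the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 100 samples of 3 variables lying close to a 2D plane.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))
mixing = np.array([[1.0, 0.5, 0.0], [0.0, 1.0, 2.0]])
data = latent @ mixing + 0.01 * rng.normal(size=(100, 3))

pca = PCA(n_components=2)
reduced = pca.fit_transform(data)  # shape (100, 2)
restored = pca.inverse_transform(reduced)  # back to shape (100, 3)

# Fraction of the total variance kept by the two components (close to 1 here).
kept = pca.explained_variance_ratio_.sum()
```

Because the data are nearly planar, the reconstruction error only reflects the small noise component that the dropped third direction carries.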