Transform data to improve the ML algorithm quality
Introduction
A pipeline to chain transformers.
The Pipeline class chains a sequence of transformers and provides global fit(), transform(), fit_transform() and inverse_transform() methods.
- class gemseo.mlearning.transformers.pipeline.Pipeline(name='', transformers=None)
Transformer pipeline.
- Parameters:
name (str) – A name for this pipeline. By default it is set to “”.
transformers (Sequence[Transformer] | None) – A sequence of transformers to be chained. The transformers are chained in the order of appearance in the list, i.e. the first transformer is applied first. If transformers is an empty list or None, then the pipeline transformer behaves like an identity transformer.
- compute_jacobian(data)
Compute the Jacobian of pipeline.transform().
- compute_jacobian_inverse(data)
Compute the Jacobian of pipeline.inverse_transform().
- duplicate()
Duplicate the current object.
- Returns:
A deepcopy of the current instance.
- Return type:
- fit(data, *args)
Fit the transformer to the data.
- fit_transform(data, *args)
Fit the transformer to the data and transform the data.
- inverse_transform(data)
Perform an inverse transform on the data.
The data is inverse transformed sequentially, starting with the last transformer in the list.
- transform(data)
Transform the data.
The data is transformed sequentially, where the output of one transformer is the input of the next.
- property is_fitted: bool
Whether the transformer has been fitted from some data.
- name: str
The name of the transformer.
- property parameters: dict[str, bool | int | float | ndarray | str | None]
The parameters of the transformer.
- transformers: Sequence[Transformer]
The sequence of transformers.
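The chaining order described for transform() and inverse_transform() can be illustrated with a minimal NumPy sketch. The classes LinearStep and SketchPipeline below are hypothetical stand-ins written for this illustration; they are not the GEMSEO classes and only mimic the documented ordering and the identity behavior of an empty pipeline.

```python
import numpy as np


class LinearStep:
    """Hypothetical transformer: z -> offset + coefficient * z."""

    def __init__(self, offset=0.0, coefficient=1.0):
        self.offset = offset
        self.coefficient = coefficient

    def transform(self, data):
        return self.offset + self.coefficient * data

    def inverse_transform(self, data):
        return (data - self.offset) / self.coefficient


class SketchPipeline:
    """Hypothetical pipeline (illustration only, not the GEMSEO implementation)."""

    def __init__(self, transformers=None):
        # None or an empty sequence yields an identity pipeline.
        self.transformers = list(transformers or [])

    def transform(self, data):
        # The first transformer is applied first; each output feeds the next step.
        for transformer in self.transformers:
            data = transformer.transform(data)
        return data

    def inverse_transform(self, data):
        # The steps are undone starting with the last transformer.
        for transformer in reversed(self.transformers):
            data = transformer.inverse_transform(data)
        return data


data = np.array([[1.0, 2.0], [3.0, 4.0]])
pipeline = SketchPipeline([LinearStep(offset=1.0), LinearStep(coefficient=2.0)])
transformed = pipeline.transform(data)  # computes (data + 1) * 2
restored = pipeline.inverse_transform(transformed)
identity = SketchPipeline().transform(data)
```

Here restored matches data because the inverse steps run in reverse order; running them in the forward order would not undo the transformation in general.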
Scaling
Scaling a variable with a linear transformation.
The Scaler class implements the default scaling method applying to some parameter \(z\):
\[\bar{z} := \text{offset} + \text{coefficient}\times z\]
where \(\bar{z}\) is the scaled version of \(z\). This scaling method is a linear transformation parameterized by an offset and a coefficient.
In this default scaling method, the offset is equal to 0 and the coefficient is equal to 1. Consequently, the scaling operation is the identity: \(\bar{z}=z\). This method has to be overloaded.
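As a rough illustration, the following NumPy sketch assumes the linear form \(\bar{z}=\text{offset}+\text{coefficient}\times z\), which is consistent with the offset and coefficient described above. The functions scale and unscale are hypothetical helpers written for this example, not part of the GEMSEO API.

```python
import numpy as np


def scale(z, offset=0.0, coefficient=1.0):
    """Apply z_bar = offset + coefficient * z (broadcast over the samples)."""
    return offset + coefficient * z


def unscale(z_bar, offset=0.0, coefficient=1.0):
    """Invert the scaling; assumes a nonzero coefficient."""
    return (z_bar - offset) / coefficient


# Two samples (rows) of two features (columns).
z = np.array([[1.0, 10.0], [3.0, 30.0]])

# Default offset=0 and coefficient=1: the scaling is the identity.
default_scaled = scale(z)

# One offset and one coefficient per feature.
offset = np.array([0.0, 1.0])
coefficient = np.array([2.0, 0.1])
custom_scaled = scale(z, offset, coefficient)
recovered = unscale(custom_scaled, offset, coefficient)

# The map is affine, so its Jacobian is the constant matrix diag(coefficient).
jacobian = np.diag(coefficient)
```

Because the map is affine, its Jacobian does not depend on the data, and the Jacobian of the inverse map is the diagonal matrix of the reciprocal coefficients.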
- class gemseo.mlearning.transformers.scaler.scaler.Scaler(name='', offset=0.0, coefficient=1.0)
Data scaler.
- Parameters:
- compute_jacobian(data, *args, **kwargs)
Force a NumPy array to be 2D and evaluate the function f with it.
- compute_jacobian_inverse(data, *args, **kwargs)
Force a NumPy array to be 2D and evaluate the function f with it.
- duplicate()
Duplicate the current object.
- Returns:
A deepcopy of the current instance.
- Return type:
- fit(data, *args)
Fit the transformer to the data.
- fit_transform(data, *args)
Fit the transformer to the data and transform the data.
- inverse_transform(data, *args, **kwargs)
Force a NumPy array to be 2D and evaluate the function f with it.
- transform(data, *args, **kwargs)
Force a NumPy array to be 2D and evaluate the function f with it.
- property coefficient: ndarray
The scaling coefficient.
- property is_fitted: bool
Whether the transformer has been fitted from some data.
- name: str
The name of the transformer.
- property offset: ndarray
The scaling offset.
Scaling a variable with a geometrical linear transformation.
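The MinMaxScaler class implements the MinMax scaling method applying to some parameter \(z\):
\[\bar{z} := \text{offset} + \text{coefficient}\times z = \frac{z-\text{min}(z)}{\text{max}(z)-\text{min}(z)},\]
where \(\text{offset}=-\text{min}(z)/(\text{max}(z)-\text{min}(z))\) and \(\text{coefficient}=1/(\text{max}(z)-\text{min}(z))\).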
In the MinMax scaling method, the scaling operation linearly transforms the original variable \(z\) such that the minimum of the original data corresponds to 0 and the maximum to 1.
Warning
When \(\text{min}(z)=\text{max}(z)\neq 0\), we use \(\bar{z}=\frac{z}{\text{min}(z)}-0.5\). When \(\text{min}(z)=\text{max}(z)=0\), we use \(\bar{z}=z+0.5\).
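The offset and coefficient formulas, together with the two constant-feature cases of the warning, can be sketched in NumPy as follows. The function fit_min_max is a hypothetical helper for this illustration and is not the GEMSEO implementation; it assumes that samples are stored as rows and features as columns.

```python
import numpy as np


def fit_min_max(z):
    """Return one (offset, coefficient) pair per feature (column) of z."""
    z_min = z.min(axis=0)
    z_max = z.max(axis=0)
    span = z_max - z_min
    constant = span == 0
    nonzero_constant = constant & (z_min != 0)
    zero_constant = constant & (z_min == 0)

    # Regular case: offset = -min / (max - min), coefficient = 1 / (max - min).
    safe_span = np.where(constant, 1.0, span)  # avoids dividing by zero
    coefficient = 1.0 / safe_span
    offset = -z_min / safe_span

    # min = max != 0: z_bar = z / min - 0.5.
    safe_min = np.where(nonzero_constant, z_min, 1.0)
    coefficient = np.where(nonzero_constant, 1.0 / safe_min, coefficient)
    offset = np.where(nonzero_constant, -0.5, offset)

    # min = max = 0: z_bar = z + 0.5.
    coefficient = np.where(zero_constant, 1.0, coefficient)
    offset = np.where(zero_constant, 0.5, offset)
    return offset, coefficient


# Column 0 varies, column 1 is constant and nonzero, column 2 is constant and zero.
z = np.array([[2.0, 5.0, 0.0], [4.0, 5.0, 0.0], [6.0, 5.0, 0.0]])
offset, coefficient = fit_min_max(z)
scaled = offset + coefficient * z
```

In this sketch the varying column is mapped onto the interval from 0 to 1, while both constant columns end up at 0.5, the midpoint of that interval.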
- class gemseo.mlearning.transformers.scaler.min_max_scaler.MinMaxScaler(name='', offset=0.0, coefficient=1.0)
Min-max scaler.
- Parameters:
- compute_jacobian(data, *args, **kwargs)
Force a NumPy array to be 2D and evaluate the function f with it.
- compute_jacobian_inverse(data, *args, **kwargs)
Force a NumPy array to be 2D and evaluate the function f with it.
- duplicate()
Duplicate the current object.
- Returns:
A deepcopy of the current instance.
- Return type:
- fit(data, *args)
Fit the transformer to the data.
- fit_transform(data, *args)
Fit the transformer to the data and transform the data.
- inverse_transform(data, *args, **kwargs)
Force a NumPy array to be 2D and evaluate the function f with it.
- transform(data, *args, **kwargs)
Force a NumPy array to be 2D and evaluate the function f with it.
- property coefficient: ndarray
The scaling coefficient.
- property is_fitted: bool
Whether the transformer has been fitted from some data.
- name: str
The name of the transformer.
- property offset: ndarray
The scaling offset.
Scaling a variable with a statistical linear transformation.
The StandardScaler class implements the Standard scaling method applying to some parameter \(z\):
\[\bar{z} := \text{offset} + \text{coefficient}\times z = \frac{z-\text{mean}(z)}{\text{std}(z)},\]
where \(\text{offset}=-\text{mean}(z)/\text{std}(z)\) and \(\text{coefficient}=1/\text{std}(z)\).
In this standard scaling method, the scaling operation linearly transforms the original variable \(z\) such that in the scaled space, the original data have zero mean and unit standard deviation.
Warning
When \(\text{std}(z)=0\) and \(\text{mean}(z)\neq 0\), we use \(\bar{z}=\frac{z}{\text{mean}(z)}-1\). When \(\text{std}(z)=0\) and \(\text{mean}(z)=0\), we use \(\bar{z}=z\).
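A minimal NumPy sketch of these formulas, including the two zero-variance cases of the warning, could look as follows. The function fit_standard is a hypothetical helper for this illustration, not the GEMSEO implementation; it assumes samples stored as rows and uses the population standard deviation (NumPy's default, ddof=0), which is an assumption of the sketch.

```python
import numpy as np


def fit_standard(z):
    """Return one (offset, coefficient) pair per feature (column) of z."""
    mean = z.mean(axis=0)
    std = z.std(axis=0)  # population standard deviation (assumption of this sketch)
    constant = std == 0
    nonzero_constant = constant & (mean != 0)
    zero_constant = constant & (mean == 0)

    # Regular case: offset = -mean / std, coefficient = 1 / std.
    safe_std = np.where(constant, 1.0, std)  # avoids dividing by zero
    coefficient = 1.0 / safe_std
    offset = -mean / safe_std

    # std = 0 and mean != 0: z_bar = z / mean - 1.
    safe_mean = np.where(nonzero_constant, mean, 1.0)
    coefficient = np.where(nonzero_constant, 1.0 / safe_mean, coefficient)
    offset = np.where(nonzero_constant, -1.0, offset)

    # std = 0 and mean = 0: z_bar = z.
    coefficient = np.where(zero_constant, 1.0, coefficient)
    offset = np.where(zero_constant, 0.0, offset)
    return offset, coefficient


# Column 0 varies, column 1 is constant and nonzero, column 2 is constant and zero.
z = np.array([[1.0, 5.0, 0.0], [2.0, 5.0, 0.0], [3.0, 5.0, 0.0]])
offset, coefficient = fit_standard(z)
scaled = offset + coefficient * z
```

In this sketch the varying column gets zero mean and unit standard deviation, and both constant columns are mapped to 0, so every scaled column is centered.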
- class gemseo.mlearning.transformers.scaler.standard_scaler.StandardScaler(name='', offset=0.0, coefficient=1.0)
Standard scaler.
- Parameters:
- compute_jacobian(data, *args, **kwargs)
Force a NumPy array to be 2D and evaluate the function f with it.
- compute_jacobian_inverse(data, *args, **kwargs)
Force a NumPy array to be 2D and evaluate the function f with it.
- duplicate()
Duplicate the current object.
- Returns:
A deepcopy of the current instance.
- Return type:
- fit(data, *args)
Fit the transformer to the data.
- fit_transform(data, *args)
Fit the transformer to the data and transform the data.
- inverse_transform(data, *args, **kwargs)
Force a NumPy array to be 2D and evaluate the function f with it.
- transform(data, *args, **kwargs)
Force a NumPy array to be 2D and evaluate the function f with it.
- property coefficient: ndarray
The scaling coefficient.
- property is_fitted: bool
Whether the transformer has been fitted from some data.
- name: str
The name of the transformer.
- property offset: ndarray
The scaling offset.
Dimension reduction
Dimension reduction as a generic transformer.
The DimensionReduction class implements the concept of dimension reduction.
- class gemseo.mlearning.transformers.dimension_reduction.dimension_reduction.DimensionReduction(name='', n_components=None, **parameters)
Dimension reduction.
- Parameters:
name (str) – A name for this transformer. By default it is set to “”.
n_components (int | None) – The number of components of the latent space. If None, use the maximum number allowed by the technique, typically min(n_samples, n_features).
**parameters (bool | int | float | str | None) – The parameters of the transformer.
- compute_jacobian(data)
Compute the Jacobian of transform().
- compute_jacobian_inverse(data)
Compute the Jacobian of the inverse_transform().
- duplicate()
Duplicate the current object.
- Returns:
A deepcopy of the current instance.
- Return type:
- fit(data, *args)
Fit the transformer to the data.
- fit_transform(data, *args)
Fit the transformer to the data and transform the data.
- inverse_transform(data)
Perform an inverse transform on the data.
- abstract transform(data)
Transform the data.
- property is_fitted: bool
Whether the transformer has been fitted from some data.
- property n_components: int
The number of components.
- name: str
The name of the transformer.
The Principal Component Analysis (PCA) to reduce the dimension of a variable.
The PCA class wraps the PCA from scikit-learn.
Dependence
This dimension reduction algorithm relies on the PCA class of the scikit-learn library.
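To make the idea of reducing the dimension with principal components concrete, here is a minimal sketch that computes a PCA with a plain NumPy singular value decomposition. It deliberately swaps in np.linalg.svd for the scikit-learn PCA class, so it shows the underlying technique rather than the GEMSEO or scikit-learn code; the synthetic data and all variable names are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 50 samples of 3 features lying on a 2D plane.
latent = rng.normal(size=(50, 2))
mixing = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
data = latent @ mixing + np.array([1.0, 2.0, 3.0])

n_components = 2

# PCA works on centered data.
mean = data.mean(axis=0)
centered = data - mean

# The rows of vt are the principal directions, sorted by decreasing singular value.
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
components = vt[:n_components]  # shape: (n_components, n_features)

# Transform: project onto the principal directions (the latent space).
reduced = centered @ components.T

# Inverse transform: map the latent coordinates back to the original space.
restored = reduced @ components + mean
```

Since the synthetic data lie exactly on a two-dimensional plane, two components are enough for restored to match data up to rounding errors; with noisy data the reconstruction would only be approximate.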
- class gemseo.mlearning.transformers.dimension_reduction.pca.PCA(name='', n_components=None, scale=False, **parameters)
Principal component dimension reduction algorithm.
- Parameters:
name (str) – A name for this transformer. By default it is set to “”.
n_components (int | None) – The number of components of the latent space. If None, use the maximum number allowed by the technique, typically min(n_samples, n_features).
scale (bool) – Whether to scale the data before applying the PCA. By default it is set to False.
**parameters (float | int | str | bool | None) – The optional parameters for the scikit-learn PCA constructor.
- compute_jacobian(data, *args, **kwargs)
Force a NumPy array to be 2D and evaluate the function f with it.
- compute_jacobian_inverse(data, *args, **kwargs)
Force a NumPy array to be 2D and evaluate the function f with it.
- duplicate()
Duplicate the current object.
- Returns:
A deepcopy of the current instance.
- Return type:
- fit(data, *args)
Fit the transformer to the data.
- fit_transform(data, *args)
Fit the transformer to the data and transform the data.
- inverse_transform(data, *args, **kwargs)
Force a NumPy array to be 2D and evaluate the function f with it.
- transform(data, *args, **kwargs)
Force a NumPy array to be 2D and evaluate the function f with it.
- property components: ndarray
The principal components.
- property data_is_scaled: bool
Whether the transformer scales the data before reducing its dimension.
- property is_fitted: bool
Whether the transformer has been fitted from some data.
- property n_components: int
The number of components.
- name: str
The name of the transformer.
Examples
See the examples about: