Transform data to improve the ML algorithm quality
A transformer to apply operations on NumPy arrays.
The abstract Transformer class implements the concept of a data transformer.
Inheriting classes shall implement the Transformer.fit(), Transformer.transform() and possibly Transformer.inverse_transform() methods.
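As a rough illustration of this contract, the sketch below defines a stand-in base class and one subclass implementing the three methods. BaseTransformer and ShiftTransformer are hypothetical names introduced here; they are not part of GEMSEO, whose Transformer class may differ in its details.

```python
from abc import ABC, abstractmethod

import numpy as np


class BaseTransformer(ABC):
    """Hypothetical stand-in for an abstract transformer (not GEMSEO's class)."""

    def __init__(self, name="BaseTransformer"):
        self.name = name

    @abstractmethod
    def fit(self, data):
        """Learn whatever the transformation needs from the data."""

    @abstractmethod
    def transform(self, data):
        """Map the data to the transformed space."""

    def inverse_transform(self, data):
        """Map transformed data back; optional for subclasses."""
        raise NotImplementedError

    def fit_transform(self, data):
        """Fit on the data, then transform the same data."""
        self.fit(data)
        return self.transform(data)


class ShiftTransformer(BaseTransformer):
    """Hypothetical subclass that centers each column on its mean."""

    def fit(self, data):
        self.mean_ = data.mean(axis=0)

    def transform(self, data):
        return data - self.mean_

    def inverse_transform(self, data):
        return data + self.mean_


# Rows are samples, columns are features (an assumption of this sketch).
data = np.array([[1.0, 10.0], [3.0, 30.0]])
shift = ShiftTransformer(name="shift")
centered = shift.fit_transform(data)
restored = shift.inverse_transform(centered)
```

Here fit() stores the column means, so inverse_transform() can only be called after the transformer has been fitted.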
- class gemseo.mlearning.transform.transformer.Transformer(name='Transformer', **parameters)[source]
Transformer baseclass.
- Parameters
- Return type
None
- compute_jacobian(data)[source]
Compute Jacobian of transformer.transform().
- Parameters
data (numpy.ndarray) – The data where the Jacobian is to be computed.
- Returns
The Jacobian matrix.
- Return type
NoReturn
- compute_jacobian_inverse(data)[source]
Compute Jacobian of the transformer.inverse_transform().
- Parameters
data (numpy.ndarray) – The data where the Jacobian is to be computed.
- Returns
The Jacobian matrix.
- Return type
NoReturn
- duplicate()[source]
Duplicate the current object.
- Returns
A deepcopy of the current instance.
- Return type
- fit(data, *args)[source]
Fit the transformer to the data.
- Parameters
data (numpy.ndarray) – The data to be fitted.
- Return type
NoReturn
- fit_transform(data, *args)[source]
Fit the transformer to the data and transform the data.
- Parameters
data (numpy.ndarray) – The data to be transformed.
- Returns
The transformed data.
- Return type
- inverse_transform(data)[source]
Perform an inverse transform on the data.
- Parameters
data (numpy.ndarray) – The data to be inverse transformed.
- Returns
The inverse transformed data.
- Return type
NoReturn
- transform(data)[source]
Transform the data.
- Parameters
data (numpy.ndarray) – The data to be transformed.
- Returns
The transformed data.
- Return type
NoReturn
- name: str
The name of the transformer.
- parameters: str
The parameters of the transformer.
- class gemseo.mlearning.transform.transformer.TransformerFactory(*args, **kwargs)[source]
A factory of Transformer.
- Parameters
base_class – The base class to be considered.
module_names – The fully qualified module names to be searched.
- Return type
None
- create(class_name, **options)
Return an instance of a class.
- get_class(name)
Return a class from its name.
- Parameters
name (str) – The name of the class.
- Returns
The class.
- Raises
ImportError – If the class is not available.
- Return type
- get_default_options_values(name)
Return the constructor kwargs default values of a class.
- get_default_sub_options_values(name, **options)
Return the default values of the sub options of a class.
- Parameters
- Returns
The JSON grammar.
- Return type
- get_library_name(name)
Return the name of the library related to the name of a class.
- get_options_doc(name)
Return the constructor documentation of a class.
- get_options_grammar(name, write_schema=False, schema_path=None)
Return the options JSON grammar for a class.
Attempt to generate a JSONGrammar from the arguments of the __init__ method of the class.
- Parameters
name (str) – The name of the class.
write_schema (bool) –
If True, write the JSON schema to a file.
By default it is set to False.
schema_path (str | None) –
The path to the JSON schema file. If None, the file is saved in the current directory in a file named after the name of the class.
By default it is set to None.
- Returns
The JSON grammar.
- Return type
- get_sub_options_grammar(name, **options)
Return the JSONGrammar of the sub options of a class.
- Parameters
- Returns
The JSON grammar.
- Return type
- is_available(name)
Return whether a class can be instantiated.
- update()
Search for the classes that can be instantiated.
The search is done in the following order:
1. The fully qualified module names
2. The plugin packages
3. The packages from the environment variables
- Return type
None
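The name-based lookup that create(), get_class() and is_available() describe can be pictured with the sketch below. ToyFactory and its registry are hypothetical and much simpler than TransformerFactory: classes are passed in explicitly, and no module, plugin or environment-variable search is done. The Scaler class defined here is also only a placeholder.

```python
class Scaler:
    """Hypothetical transformer used only to populate the registry."""

    def __init__(self, offset=0.0, coefficient=1.0):
        self.offset = offset
        self.coefficient = coefficient


class ToyFactory:
    """Hypothetical name-based factory (not GEMSEO's TransformerFactory)."""

    def __init__(self, classes):
        # Map each class name to the class itself.
        self._classes = {cls.__name__: cls for cls in classes}

    def is_available(self, name):
        """Return whether a class with this name is registered."""
        return name in self._classes

    def get_class(self, name):
        """Return the class, or raise ImportError if it is unknown."""
        try:
            return self._classes[name]
        except KeyError:
            raise ImportError(f"The class {name} is not available.") from None

    def create(self, class_name, **options):
        """Instantiate the class, passing the options to its constructor."""
        return self.get_class(class_name)(**options)


factory = ToyFactory([Scaler])
scaler = factory.create("Scaler", coefficient=2.0)
```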
A pipeline to chain transformers.
The Pipeline class chains a sequence of transformers and provides global fit(), transform(), fit_transform() and inverse_transform() methods.
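The chaining can be sketched as follows. LinearStep and ToyPipeline are hypothetical classes written for this example, not GEMSEO's implementation. The sketch assumes that each step is fitted on the output of the previous one. It applies the steps first to last in transform() and last to first in inverse_transform().

```python
import numpy as np


class LinearStep:
    """Hypothetical invertible step: z -> offset + coefficient * z."""

    def __init__(self, offset, coefficient):
        self.offset = offset
        self.coefficient = coefficient

    def fit(self, data):
        # Nothing to learn in this toy step.
        pass

    def transform(self, data):
        return self.offset + self.coefficient * data

    def inverse_transform(self, data):
        return (data - self.offset) / self.coefficient


class ToyPipeline:
    """Hypothetical chain of steps, applied first to last."""

    def __init__(self, transformers=None):
        self.transformers = list(transformers or [])

    def fit(self, data):
        # Assumption: each step is fitted on the previous step's output.
        for step in self.transformers:
            step.fit(data)
            data = step.transform(data)

    def transform(self, data):
        for step in self.transformers:
            data = step.transform(data)
        return data

    def inverse_transform(self, data):
        # Undo the steps in reverse order.
        for step in reversed(self.transformers):
            data = step.inverse_transform(data)
        return data


data = np.array([[1.0], [2.0]])
pipeline = ToyPipeline([LinearStep(1.0, 2.0), LinearStep(0.0, 10.0)])
pipeline.fit(data)
transformed = pipeline.transform(data)
restored = pipeline.inverse_transform(transformed)

# With no transformers, the pipeline leaves the data unchanged.
identity = ToyPipeline().transform(data)
```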
- class gemseo.mlearning.transform.pipeline.Pipeline(name='Pipeline', transformers=None)[source]
Transformer pipeline.
- Parameters
name (str) –
A name for this pipeline.
By default it is set to Pipeline.
transformers (Sequence[Transformer] | None) –
A sequence of transformers to be chained. The transformers are chained in the order of appearance in the list, i.e. the first transformer is applied first. If transformers is an empty list or None, then the pipeline transformer behaves like an identity transformer.
By default it is set to None.
- Return type
None
- compute_jacobian(data)[source]
Compute the Jacobian of the pipeline.transform().
- Parameters
data (numpy.ndarray) – The data where the Jacobian is to be computed.
- Returns
The Jacobian matrix.
- Return type
- compute_jacobian_inverse(data)[source]
Compute the Jacobian of the pipeline.inverse_transform().
- Parameters
data (numpy.ndarray) – The data where the Jacobian is to be computed.
- Returns
The Jacobian matrix.
- Return type
- duplicate()[source]
Duplicate the current object.
- Returns
A deepcopy of the current instance.
- Return type
- fit(data, *args)
Fit the transformer to the data.
- Parameters
data (numpy.ndarray) – The data to be fitted.
- Return type
NoReturn
- fit_transform(data, *args)
Fit the transformer to the data and transform the data.
- Parameters
data (numpy.ndarray) – The data to be transformed.
- Returns
The transformed data.
- Return type
- inverse_transform(data)[source]
Perform an inverse transform on the data.
The data is inverse transformed sequentially, starting with the last transformer in the list.
- Parameters
data (numpy.ndarray) – The data to be inverse transformed.
- Returns
The inverse transformed data.
- Return type
- transform(data)[source]
Transform the data.
The data is transformed sequentially, where the output of one transformer is the input of the next.
- Parameters
data (numpy.ndarray) – The data to be transformed.
- Returns
The transformed data.
- Return type
- transformers: Sequence[gemseo.mlearning.transform.transformer.Transformer]
The sequence of transformers.
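Because the output of one transformer is the input of the next, the Jacobian of a chained transformation follows from the chain rule: each step's Jacobian is evaluated at that step's input, and the matrices are multiplied in order. The sketch below illustrates this for two hypothetical functions f1 and f2 acting on a single sample. It illustrates the rule only and does not show how Pipeline.compute_jacobian() is implemented.

```python
import numpy as np

# Two hypothetical steps acting on one sample x of size 2:
# f1(x) = A @ x (linear) and f2(y) = y ** 2 (element-wise).
A = np.array([[1.0, 2.0], [0.0, 3.0]])


def f1(x):
    return A @ x


def jac_f1(x):
    # The Jacobian of a linear map is its matrix.
    return A


def f2(y):
    return y**2


def jac_f2(y):
    # The Jacobian of an element-wise function is diagonal.
    return np.diag(2.0 * y)


def chained_jacobian(x):
    """Chain rule: J(x) = J_f2(f1(x)) @ J_f1(x)."""
    return jac_f2(f1(x)) @ jac_f1(x)


x = np.array([1.0, 1.0])
jacobian = chained_jacobian(x)
```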
Scaling a variable with a linear transformation.
The Scaler class implements the default scaling method applying to some parameter \(z\):
\[\bar{z} := \text{offset} + \text{coefficient}\times z\]
where \(\bar{z}\) is the scaled version of \(z\). This scaling method is a linear transformation parameterized by an offset and a coefficient.
In this default scaling method, the offset is equal to 0 and the coefficient is equal to 1. Consequently, the scaling operation is the identity: \(\bar{z}=z\). This method has to be overloaded.
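A minimal NumPy sketch of such a linear scaling, assuming the form offset + coefficient * z. The functions scale and unscale are hypothetical helpers written for this example and are not part of the Scaler API.

```python
import numpy as np


def scale(z, offset=0.0, coefficient=1.0):
    """Return offset + coefficient * z."""
    return offset + coefficient * np.asarray(z, dtype=float)


def unscale(z_scaled, offset=0.0, coefficient=1.0):
    """Invert the scaling: z = (z_scaled - offset) / coefficient."""
    return (np.asarray(z_scaled, dtype=float) - offset) / coefficient


z = np.array([1.0, 2.0, 4.0])

# With the default offset (0) and coefficient (1), the scaling is the identity.
default_scaled = scale(z)

# Any other values give a genuine linear transformation that can be inverted.
custom_scaled = scale(z, offset=-1.0, coefficient=0.5)
recovered = unscale(custom_scaled, offset=-1.0, coefficient=0.5)
```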
- class gemseo.mlearning.transform.scaler.scaler.Scaler(name='Scaler', offset=0.0, coefficient=1.0)[source]
Data scaler.
- Parameters
- Return type
None
- compute_jacobian(data)[source]
Compute Jacobian of transformer.transform().
- Parameters
data (numpy.ndarray) – The data where the Jacobian is to be computed.
- Returns
The Jacobian matrix.
- Return type
- compute_jacobian_inverse(data)[source]
Compute Jacobian of the transformer.inverse_transform().
- Parameters
data (numpy.ndarray) – The data where the Jacobian is to be computed.
- Returns
The Jacobian matrix.
- Return type
- duplicate()
Duplicate the current object.
- Returns
A deepcopy of the current instance.
- Return type
- fit(data, *args)
Fit the transformer to the data.
- Parameters
data (numpy.ndarray) – The data to be fitted.
- Return type
NoReturn
- fit_transform(data, *args)
Fit the transformer to the data and transform the data.
- Parameters
data (numpy.ndarray) – The data to be transformed.
- Returns
The transformed data.
- Return type
- inverse_transform(data)[source]
Perform an inverse transform on the data.
- Parameters
data (numpy.ndarray) – The data to be inverse transformed.
- Returns
The inverse transformed data.
- Return type
- transform(data)[source]
Transform the data.
- Parameters
data (numpy.ndarray) – The data to be transformed.
- Returns
The transformed data.
- Return type
- property coefficient: float
The scaling coefficient.
- property offset: float
The scaling offset.
Scaling a variable with a geometrical linear transformation.
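The MinMaxScaler class implements the MinMax scaling method applying to some parameter \(z\):
\[\bar{z} := \text{offset} + \text{coefficient}\times z = \frac{z-\text{min}(z)}{\text{max}(z)-\text{min}(z)},\]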
where \(\text{offset}=-\text{min}(z)/(\text{max}(z)-\text{min}(z))\) and \(\text{coefficient}=1/(\text{max}(z)-\text{min}(z))\).
In the MinMax scaling method, the scaling operation linearly transforms the original variable \(z\) such that the minimum of the original data corresponds to 0 and the maximum to 1.
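The offset and coefficient above can be checked on a small array. This is a plain NumPy sketch of the formulas, not a call to MinMaxScaler. The layout (rows are samples, columns are features) is an assumption of the example.

```python
import numpy as np

z = np.array([[2.0], [4.0], [10.0]])  # three samples of one feature

z_min = z.min(axis=0)
z_max = z.max(axis=0)

# offset = -min(z) / (max(z) - min(z)), coefficient = 1 / (max(z) - min(z))
coefficient = 1.0 / (z_max - z_min)
offset = -z_min / (z_max - z_min)

# The minimum maps to 0 and the maximum to 1.
z_scaled = offset + coefficient * z
```

A constant feature gives max(z) = min(z) and therefore a division by zero; this sketch does not guard against that case.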
- class gemseo.mlearning.transform.scaler.min_max_scaler.MinMaxScaler(name='MinMaxScaler', offset=0.0, coefficient=1.0)[source]
Min-max scaler.
- Parameters
- Return type
None
- compute_jacobian(data)
Compute Jacobian of transformer.transform().
- Parameters
data (numpy.ndarray) – The data where the Jacobian is to be computed.
- Returns
The Jacobian matrix.
- Return type
- compute_jacobian_inverse(data)
Compute Jacobian of the transformer.inverse_transform().
- Parameters
data (numpy.ndarray) – The data where the Jacobian is to be computed.
- Returns
The Jacobian matrix.
- Return type
- duplicate()
Duplicate the current object.
- Returns
A deepcopy of the current instance.
- Return type
- fit(data, *args)
Fit the transformer to the data.
- Parameters
data (numpy.ndarray) – The data to be fitted.
- Return type
NoReturn
- fit_transform(data, *args)
Fit the transformer to the data and transform the data.
- Parameters
data (numpy.ndarray) – The data to be transformed.
- Returns
The transformed data.
- Return type
- inverse_transform(data)
Perform an inverse transform on the data.
- Parameters
data (numpy.ndarray) – The data to be inverse transformed.
- Returns
The inverse transformed data.
- Return type
- transform(data)
Transform the data.
- Parameters
data (numpy.ndarray) – The data to be transformed.
- Returns
The transformed data.
- Return type
- property coefficient: float
The scaling coefficient.
- property offset: float
The scaling offset.
Scaling a variable with a statistical linear transformation.
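The StandardScaler class implements the Standard scaling method applying to some parameter \(z\):
\[\bar{z} := \text{offset} + \text{coefficient}\times z = \frac{z-\text{mean}(z)}{\text{std}(z)},\]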
where \(\text{offset}=-\text{mean}(z)/\text{std}(z)\) and \(\text{coefficient}=1/\text{std}(z)\).
In this standard scaling method, the scaling operation linearly transforms the original variable \(z\) such that in the scaled space, the original data have zero mean and unit standard deviation.
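The same check can be done for the standard scaling. This is a plain NumPy sketch of the formulas, not a call to StandardScaler. It assumes the population standard deviation (ddof=0), which may or may not match the convention used by the class.

```python
import numpy as np

z = np.array([[1.0], [2.0], [3.0], [6.0]])  # four samples of one feature

mean = z.mean(axis=0)
std = z.std(axis=0)  # assumption: population standard deviation (ddof=0)

# offset = -mean(z) / std(z), coefficient = 1 / std(z)
coefficient = 1.0 / std
offset = -mean / std

# In the scaled space, the data have zero mean and unit standard deviation.
z_scaled = offset + coefficient * z
scaled_mean = z_scaled.mean(axis=0)
scaled_std = z_scaled.std(axis=0)
```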
- class gemseo.mlearning.transform.scaler.standard_scaler.StandardScaler(name='StandardScaler', offset=0.0, coefficient=1.0)[source]
Standard scaler.
- Parameters
- Return type
None
- compute_jacobian(data)
Compute Jacobian of transformer.transform().
- Parameters
data (numpy.ndarray) – The data where the Jacobian is to be computed.
- Returns
The Jacobian matrix.
- Return type
- compute_jacobian_inverse(data)
Compute Jacobian of the transformer.inverse_transform().
- Parameters
data (numpy.ndarray) – The data where the Jacobian is to be computed.
- Returns
The Jacobian matrix.
- Return type
- duplicate()
Duplicate the current object.
- Returns
A deepcopy of the current instance.
- Return type
- fit(data, *args)
Fit the transformer to the data.
- Parameters
data (numpy.ndarray) – The data to be fitted.
- Return type
NoReturn
- fit_transform(data, *args)
Fit the transformer to the data and transform the data.
- Parameters
data (numpy.ndarray) – The data to be transformed.
- Returns
The transformed data.
- Return type
- inverse_transform(data)
Perform an inverse transform on the data.
- Parameters
data (numpy.ndarray) – The data to be inverse transformed.
- Returns
The inverse transformed data.
- Return type
- transform(data)
Transform the data.
- Parameters
data (numpy.ndarray) – The data to be transformed.
- Returns
The transformed data.
- Return type
- property coefficient: float
The scaling coefficient.
- property offset: float
The scaling offset.
Dimension reduction as a generic transformer.
The DimensionReduction class implements the concept of dimension reduction.
- class gemseo.mlearning.transform.dimension_reduction.dimension_reduction.DimensionReduction(name='DimensionReduction', n_components=None, **parameters)[source]
Dimension reduction.
- Parameters
name (str) –
A name for this transformer.
By default it is set to DimensionReduction.
n_components (int | None) –
The number of components of the latent space. If None, use the maximum number allowed by the technique, typically min(n_samples, n_features).
By default it is set to None.
**parameters (bool | int | float | ndarray | str | None) – The parameters of the transformer.
- Return type
None
- compute_jacobian(data)
Compute Jacobian of transformer.transform().
- Parameters
data (numpy.ndarray) – The data where the Jacobian is to be computed.
- Returns
The Jacobian matrix.
- Return type
NoReturn
- compute_jacobian_inverse(data)
Compute Jacobian of the transformer.inverse_transform().
- Parameters
data (numpy.ndarray) – The data where the Jacobian is to be computed.
- Returns
The Jacobian matrix.
- Return type
NoReturn
- duplicate()
Duplicate the current object.
- Returns
A deepcopy of the current instance.
- Return type
- fit(data, *args)
Fit the transformer to the data.
- Parameters
data (numpy.ndarray) – The data to be fitted.
- Return type
NoReturn
- fit_transform(data, *args)
Fit the transformer to the data and transform the data.
- Parameters
data (numpy.ndarray) – The data to be transformed.
- Returns
The transformed data.
- Return type
- inverse_transform(data)
Perform an inverse transform on the data.
- Parameters
data (numpy.ndarray) – The data to be inverse transformed.
- Returns
The inverse transformed data.
- Return type
NoReturn
- transform(data)
Transform the data.
- Parameters
data (numpy.ndarray) – The data to be transformed.
- Returns
The transformed data.
- Return type
NoReturn
- property n_components: int
The number of components.
The Principal Component Analysis (PCA) to reduce the dimension of a variable.
The PCA class wraps the PCA from scikit-learn.
Dependence
This dimension reduction algorithm relies on the PCA class of the scikit-learn library.
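To show what the projection and its inverse compute, here is a small sketch that uses NumPy's SVD as a stand-in. It does not use scikit-learn's PCA class or the GEMSEO wrapper. The helpers pca_fit, pca_transform and pca_inverse_transform are hypothetical names for this example.

```python
import numpy as np


def pca_fit(data, n_components=None):
    """Return the feature means and the leading principal directions."""
    n_samples, n_features = data.shape
    if n_components is None:
        # Largest number of components that this technique allows.
        n_components = min(n_samples, n_features)
    mean = data.mean(axis=0)
    # Rows of vt are the principal directions, by decreasing singular value.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(data, mean, components):
    """Project centered data onto the principal directions."""
    return (data - mean) @ components.T


def pca_inverse_transform(latent, mean, components):
    """Map latent coordinates back to the original feature space."""
    return latent @ components + mean


# Three features, but the samples lie on a straight line through the origin.
t = np.array([[-2.0], [-1.0], [1.0], [2.0]])
data = t @ np.array([[1.0, 2.0, 3.0]])

mean, components = pca_fit(data, n_components=1)
latent = pca_transform(data, mean, components)
reconstructed = pca_inverse_transform(latent, mean, components)
```

Because these samples vary along a single direction, one component is enough to reconstruct them. With data of higher rank, the reconstruction from a reduced latent space is only approximate.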
- class gemseo.mlearning.transform.dimension_reduction.pca.PCA(name='PCA', n_components=None, **parameters)[source]
Principal component dimension reduction algorithm.
- Parameters
name (str) –
A name for this transformer.
By default it is set to PCA.
n_components (int | None) –
The number of components of the latent space. If None, use the maximum number allowed by the technique, typically min(n_samples, n_features).
By default it is set to None.
**parameters (float | int | str | bool | None) – The optional parameters for sklearn PCA constructor.
- Return type
None
- compute_jacobian(data)[source]
Compute Jacobian of transformer.transform().
- Parameters
data (numpy.ndarray) – The data where the Jacobian is to be computed.
- Returns
The Jacobian matrix.
- Return type
- compute_jacobian_inverse(data)[source]
Compute Jacobian of the transformer.inverse_transform().
- Parameters
data (numpy.ndarray) – The data where the Jacobian is to be computed.
- Returns
The Jacobian matrix.
- Return type
- duplicate()
Duplicate the current object.
- Returns
A deepcopy of the current instance.
- Return type
- fit(data, *args)
Fit the transformer to the data.
- Parameters
data (numpy.ndarray) – The data to be fitted.
- Return type
NoReturn
- fit_transform(data, *args)
Fit the transformer to the data and transform the data.
- Parameters
data (numpy.ndarray) – The data to be transformed.
- Returns
The transformed data.
- Return type
- inverse_transform(data)[source]
Perform an inverse transform on the data.
- Parameters
data (numpy.ndarray) – The data to be inverse transformed.
- Returns
The inverse transformed data.
- Return type
- transform(data)[source]
Transform the data.
- Parameters
data (numpy.ndarray) – The data to be transformed.
- Returns
The transformed data.
- Return type
- property components: numpy.ndarray
The principal components.
- property n_components: int
The number of components.