Transform data to improve the ML algorithm quality

A transformer to apply operations on NumPy arrays.

The abstract Transformer class implements the concept of a data transformer. Inheriting classes shall implement the Transformer.fit(), Transformer.transform() and possibly Transformer.inverse_transform() methods.
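The interface described above can be sketched with a small standalone class. This is only an illustration: `CenteringTransformer` is a hypothetical name, it does not inherit from the GEMSEO `Transformer` class, and it assumes samples are stored row-wise in a 2D array.

```python
import numpy as np


class CenteringTransformer:
    """Hypothetical transformer mirroring fit/transform/inverse_transform."""

    def __init__(self) -> None:
        self.mean = None

    def fit(self, data: np.ndarray) -> None:
        # Learn the per-column mean from the data (assumption: rows are samples).
        self.mean = data.mean(axis=0)

    def transform(self, data: np.ndarray) -> np.ndarray:
        # Subtract the mean learned during fit.
        return data - self.mean

    def inverse_transform(self, data: np.ndarray) -> np.ndarray:
        # Undo transform by adding the mean back.
        return data + self.mean

    def fit_transform(self, data: np.ndarray) -> np.ndarray:
        # Fit, then transform the same data.
        self.fit(data)
        return self.transform(data)


data = np.array([[1.0, 10.0], [3.0, 30.0]])
transformer = CenteringTransformer()
transformed = transformer.fit_transform(data)
restored = transformer.inverse_transform(transformed)
```

Here `fit` stores what is learned from the data, `transform` applies it, and `inverse_transform` maps transformed values back to the original space.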

See also

scaler dimension_reduction

Classes:

Transformer([name])

Transformer baseclass.

class gemseo.mlearning.transform.transformer.Transformer(name='Transformer', **parameters)[source]

Transformer baseclass.

name

The name of the transformer.

Type: str

parameters

The parameters of the transformer.

Type: str

Parameters

name (str) –
A name for this transformer.

By default it is set to Transformer.
**parameters (Optional[Union[float,int,str,bool]]) – The parameters of the transformer.

Return type

None

Methods:

`compute_jacobian`(data)	Compute Jacobian of transformer.transform().
`compute_jacobian_inverse`(data)	Compute Jacobian of the transformer.inverse_transform().
`duplicate`()	Duplicate the current object.
`fit`(data, *args)	Fit the transformer to the data.
`fit_transform`(data, *args)	Fit the transformer to the data and transform the data.
`inverse_transform`(data)	Perform an inverse transform on the data.
`transform`(data)	Transform the data.

compute_jacobian(data)[source]

Compute Jacobian of transformer.transform().

Parameters: data (numpy.ndarray) – The data where the Jacobian is to be computed.
Returns: The Jacobian matrix.
Return type: NoReturn

compute_jacobian_inverse(data)[source]

Compute Jacobian of the transformer.inverse_transform().

Parameters: data (numpy.ndarray) – The data where the Jacobian is to be computed.
Returns: The Jacobian matrix.
Return type: NoReturn

duplicate()[source]

Duplicate the current object.

Returns: A deepcopy of the current instance.
Return type: gemseo.mlearning.transform.transformer.Transformer

fit(data, *args)[source]

Fit the transformer to the data.

Parameters

data (numpy.ndarray) – The data to be fitted.
*args (Union[float, int, str]) –

Return type

NoReturn

fit_transform(data, *args)[source]

Fit the transformer to the data and transform the data.

Parameters

data (numpy.ndarray) – The data to be transformed.
*args (Union[float, int, str]) –

Returns

The transformed data.

Return type

numpy.ndarray

inverse_transform(data)[source]

Perform an inverse transform on the data.

Parameters: data (numpy.ndarray) – The data to be inverse transformed.
Returns: The inverse transformed data.
Return type: NoReturn

transform(data)[source]

Transform the data.

Parameters: data (numpy.ndarray) – The data to be transformed.
Returns: The transformed data.
Return type: NoReturn

A pipeline to chain transformers.

The Pipeline class chains a sequence of transformers, and provides global fit(), transform(), fit_transform() and inverse_transform() methods.
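The chaining idea can be sketched with plain functions. This is a minimal illustration rather than the GEMSEO implementation: the `steps` list and the two helper functions are hypothetical, and each step is assumed to be a (forward, inverse) pair.

```python
import numpy as np

# Each hypothetical step is a (forward, inverse) pair of functions.
steps = [
    (lambda x: x - 1.0, lambda y: y + 1.0),  # shift by -1
    (lambda x: 2.0 * x, lambda y: y / 2.0),  # scale by 2
]


def pipeline_transform(data, steps):
    # Apply the steps in order: the output of one is the input of the next.
    for forward, _ in steps:
        data = forward(data)
    return data


def pipeline_inverse_transform(data, steps):
    # Undo the steps in reverse order, starting with the last one.
    for _, inverse in reversed(steps):
        data = inverse(data)
    return data


data = np.array([1.0, 2.0, 3.0])
transformed = pipeline_transform(data, steps)
restored = pipeline_inverse_transform(transformed, steps)
# With no steps, the loop never runs and the data is returned unchanged.
unchanged = pipeline_transform(data, [])
```

The forward pass runs the steps first to last, while the inverse pass must run them last to first so that each inverse sees the values its forward step produced.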

Classes:

Pipeline([name, transformers])

Transformer pipeline.

class gemseo.mlearning.transform.pipeline.Pipeline(name='Pipeline', transformers=None)[source]

Transformer pipeline.

name

The name of the transformer.

Type: str

parameters

The parameters of the transformer.

Type: str

transformers

The sequence of transformers.

Type: Sequence[Transformer]

Parameters

name (str) –
A name for this pipeline.

By default it is set to Pipeline.
transformers (Optional[Sequence[Transformer]]) –
A sequence of transformers to be chained. The transformers are chained in the order of appearance in the list, i.e. the first transformer is applied first. If transformers is an empty list or None, then the pipeline transformer behaves like an identity transformer.

By default it is set to None.

Return type

None

Methods:

`compute_jacobian`(data)	Compute the Jacobian of the `pipeline.transform()`.
`compute_jacobian_inverse`(data)	Compute the Jacobian of the `pipeline.inverse_transform()`.
`duplicate`()	Duplicate the current object.
`fit`(data, **options)	Fit the transformer pipeline to the data.
`fit_transform`(data, *args)	Fit the transformer to the data and transform the data.
`inverse_transform`(data)	Perform an inverse transform on the data.
`transform`(data)	Transform the data.

compute_jacobian(data)[source]

Compute the Jacobian of the pipeline.transform().

Parameters: data (numpy.ndarray) – The data where the Jacobian is to be computed.
Returns: The Jacobian matrix.
Return type: numpy.ndarray
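
As a reminder of the underlying mathematics (the notation \(T_k\), \(J\) and \(x_k\) is introduced here for illustration and is not part of the API): if the pipeline applies differentiable transformers \(T_1,\dots,T_n\) in that order, the chain rule gives the Jacobian of the composition \(T = T_n\circ\dots\circ T_1\) as a product of the individual Jacobians, each evaluated at the corresponding intermediate data:

\[J_T(x) = J_{T_n}(x_{n-1})\,\cdots\,J_{T_2}(x_1)\,J_{T_1}(x_0), \qquad x_0 = x,\quad x_k = T_k(x_{k-1}).\]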

compute_jacobian_inverse(data)[source]

Compute the Jacobian of the pipeline.inverse_transform().

Parameters: data (numpy.ndarray) – The data where the Jacobian is to be computed.
Returns: The Jacobian matrix.
Return type: numpy.ndarray

duplicate()[source]

Duplicate the current object.

Returns: A deepcopy of the current instance.
Return type: gemseo.mlearning.transform.pipeline.Pipeline

fit(data, **options)[source]

Fit the transformer pipeline to the data.

All the transformers are fitted, transforming the data in place.

Parameters

data (numpy.ndarray) – The data to be fitted.
**options (Union[float, int, str]) –

Return type

None

fit_transform(data, *args)

Fit the transformer to the data and transform the data.

Parameters

data (numpy.ndarray) – The data to be transformed.
*args (Union[float, int, str]) –

Returns

The transformed data.

Return type

numpy.ndarray

inverse_transform(data)[source]

Perform an inverse transform on the data.

The data is inverse transformed sequentially, starting with the last transformer in the list.

Parameters: data (numpy.ndarray) – The data to be inverse transformed.
Returns: The inverse transformed data.
Return type: numpy.ndarray

transform(data)[source]

Transform the data.

The data is transformed sequentially, where the output of one transformer is the input of the next.

Parameters: data (numpy.ndarray) – The data to be transformed.
Returns: The transformed data.
Return type: numpy.ndarray

Scaling a variable with a linear transformation.

The Scaler class implements the default scaling method applying to some parameter \(z\):

\[\bar{z} := \text{offset} + \text{coefficient}\times z\]

where \(\bar{z}\) is the scaled version of \(z\). This scaling method is a linear transformation parameterized by an offset and a coefficient.

In this default scaling method, the offset is equal to 0 and the coefficient is equal to 1. Consequently, the scaling operation is the identity: \(\bar{z}=z\). This method has to be overloaded.
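The linear transformation above and its inverse can be written in a few lines of NumPy. This is a sketch of the formula only: the functions `scale` and `unscale` are hypothetical names and do not call the GEMSEO `Scaler` class.

```python
import numpy as np


def scale(z, offset, coefficient):
    # Linear scaling: z_bar = offset + coefficient * z
    return offset + coefficient * z


def unscale(z_bar, offset, coefficient):
    # Inverse of the linear scaling (requires coefficient != 0).
    return (z_bar - offset) / coefficient


z = np.array([[1.0, 2.0], [3.0, 4.0]])
# Default values (offset 0, coefficient 1) give the identity.
identity = scale(z, 0.0, 1.0)
# Non-default values give a genuine rescaling.
scaled = scale(z, -1.0, 0.5)
restored = unscale(scaled, -1.0, 0.5)
```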

See also

min_max_scaler standard_scaler

Classes:

Scaler([name, offset, coefficient])

Data scaler.

class gemseo.mlearning.transform.scaler.scaler.Scaler(name='Scaler', offset=0.0, coefficient=1.0)[source]

Data scaler.

name

The name of the transformer.

Type: str

parameters

The parameters of the transformer.

Type: str

Parameters

name (str) –
A name for this transformer.

By default it is set to Scaler.
offset (float) –
The offset of the linear transformation.

By default it is set to 0.0.
coefficient (float) –
The coefficient of the linear transformation.

By default it is set to 1.0.

Return type

None

Attributes:

`coefficient`	The scaling coefficient.
`offset`	The scaling offset.

Methods:

`compute_jacobian`(data)	Compute Jacobian of transformer.transform().
`compute_jacobian_inverse`(data)	Compute Jacobian of the transformer.inverse_transform().
`duplicate`()	Duplicate the current object.
`fit`(data, *args)	Fit the transformer to the data.
`fit_transform`(data, *args)	Fit the transformer to the data and transform the data.
`inverse_transform`(data)	Perform an inverse transform on the data.
`transform`(data)	Transform the data.

property coefficient: The scaling coefficient.

compute_jacobian(data)[source]

Compute Jacobian of transformer.transform().

Parameters: data (numpy.ndarray) – The data where the Jacobian is to be computed.
Returns: The Jacobian matrix.
Return type: numpy.ndarray

compute_jacobian_inverse(data)[source]

Compute Jacobian of the transformer.inverse_transform().

Parameters: data (numpy.ndarray) – The data where the Jacobian is to be computed.
Returns: The Jacobian matrix.
Return type: numpy.ndarray
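
For the linear scaling \(\bar{z} = \text{offset} + \text{coefficient}\times z\) defined at the top of this section, and assuming a scalar, non-zero coefficient and data of dimension \(d\) (the symbol \(I_d\) for the identity matrix is introduced here for illustration), the two Jacobians follow directly from the formula:

\[\frac{\partial \bar{z}}{\partial z} = \text{coefficient}\times I_d, \qquad \frac{\partial z}{\partial \bar{z}} = \frac{1}{\text{coefficient}}\times I_d.\]

If the coefficient differs per component, the same reasoning gives diagonal matrices holding the coefficients and their reciprocals, respectively.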

duplicate()

Duplicate the current object.

Returns: A deepcopy of the current instance.
Return type: gemseo.mlearning.transform.transformer.Transformer

fit(data, *args)[source]

Fit the transformer to the data.

Parameters

data (numpy.ndarray) – The data to be fitted.
*args (Union[float, int, str]) –

Return type

None

fit_transform(data, *args)

Fit the transformer to the data and transform the data.

Parameters

data (numpy.ndarray) – The data to be transformed.
*args (Union[float, int, str]) –

Returns

The transformed data.

Return type

numpy.ndarray

inverse_transform(data)[source]

Perform an inverse transform on the data.

Parameters: data (numpy.ndarray) – The data to be inverse transformed.
Returns: The inverse transformed data.
Return type: numpy.ndarray

property offset: The scaling offset.

transform(data)[source]

Transform the data.

Parameters: data (numpy.ndarray) – The data to be transformed.
Returns: The transformed data.
Return type: numpy.ndarray


Scaling a variable with a geometrical linear transformation.

The MinMaxScaler class implements the MinMax scaling method applying to some parameter \(z\):

\[\bar{z} := \text{offset} + \text{coefficient}\times z = \frac{z-\text{min}(z)}{(\text{max}(z)-\text{min}(z))},\]

where \(\text{offset}=-\text{min}(z)/(\text{max}(z)-\text{min}(z))\) and \(\text{coefficient}=1/(\text{max}(z)-\text{min}(z))\).

In the MinMax scaling method, the scaling operation linearly transforms the original variable \(z\) such that the minimum of the original data corresponds to 0 and the maximum to 1.
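The offset and coefficient defined above can be checked numerically. This sketch uses NumPy only, not the GEMSEO `MinMaxScaler` class, and assumes the minimum and maximum are taken per column, with samples stored row-wise.

```python
import numpy as np

z = np.array([[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]])

# Assumption: statistics are computed feature-wise (per column).
z_min = z.min(axis=0)
z_max = z.max(axis=0)

coefficient = 1.0 / (z_max - z_min)
offset = -z_min / (z_max - z_min)

# Linear form of the scaling: offset + coefficient * z
z_bar = offset + coefficient * z
```

Note that a constant column makes max(z) - min(z) equal to zero, so the formula is undefined for such a feature.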

Classes:

MinMaxScaler([name, offset, coefficient])

Min-max scaler.

class gemseo.mlearning.transform.scaler.min_max_scaler.MinMaxScaler(name='MinMaxScaler', offset=0.0, coefficient=1.0)[source]

Min-max scaler.

name

The name of the transformer.

Type: str

parameters

The parameters of the transformer.

Type: str

Parameters

name (str) –
A name for this transformer.

By default it is set to MinMaxScaler.
offset (float) –
The offset of the linear transformation.

By default it is set to 0.0.
coefficient (float) –
The coefficient of the linear transformation.

By default it is set to 1.0.

Return type

None

Attributes:

`coefficient`	The scaling coefficient.
`offset`	The scaling offset.

Methods:

`compute_jacobian`(data)	Compute Jacobian of transformer.transform().
`compute_jacobian_inverse`(data)	Compute Jacobian of the transformer.inverse_transform().
`duplicate`()	Duplicate the current object.
`fit`(data, *args)	Fit the transformer to the data.
`fit_transform`(data, *args)	Fit the transformer to the data and transform the data.
`inverse_transform`(data)	Perform an inverse transform on the data.
`transform`(data)	Transform the data.

property coefficient: The scaling coefficient.

compute_jacobian(data)

Compute Jacobian of transformer.transform().

Parameters: data (numpy.ndarray) – The data where the Jacobian is to be computed.
Returns: The Jacobian matrix.
Return type: numpy.ndarray

compute_jacobian_inverse(data)

Compute Jacobian of the transformer.inverse_transform().

Parameters: data (numpy.ndarray) – The data where the Jacobian is to be computed.
Returns: The Jacobian matrix.
Return type: numpy.ndarray

duplicate()

Duplicate the current object.

Returns: A deepcopy of the current instance.
Return type: gemseo.mlearning.transform.transformer.Transformer

fit(data, *args)[source]

Fit the transformer to the data.

Parameters

data (numpy.ndarray) – The data to be fitted.
*args (Union[float, int, str]) –

Return type

None

fit_transform(data, *args)

Fit the transformer to the data and transform the data.

Parameters

data (numpy.ndarray) – The data to be transformed.
*args (Union[float, int, str]) –

Returns

The transformed data.

Return type

numpy.ndarray

inverse_transform(data)

Perform an inverse transform on the data.

Parameters: data (numpy.ndarray) – The data to be inverse transformed.
Returns: The inverse transformed data.
Return type: numpy.ndarray

property offset: The scaling offset.

transform(data)

Transform the data.

Parameters: data (numpy.ndarray) – The data to be transformed.
Returns: The transformed data.
Return type: numpy.ndarray

Scaling a variable with a statistical linear transformation.

The StandardScaler class implements the Standard scaling method applying to some parameter \(z\):

\[\bar{z} := \text{offset} + \text{coefficient}\times z = \frac{z-\text{mean}(z)}{\text{std}(z)}\]

where \(\text{offset}=-\text{mean}(z)/\text{std}(z)\) and \(\text{coefficient}=1/\text{std}(z)\).

In this standard scaling method, the scaling operation linearly transforms the original variable \(z\) such that in the scaled space, the original data have zero mean and unit standard deviation.
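The same check can be done for the standard scaling. This is a NumPy-only sketch, not the GEMSEO `StandardScaler` class. It assumes per-column statistics and the population standard deviation (NumPy's default `ddof=0`); the library may use a different convention.

```python
import numpy as np

z = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])

# Assumption: statistics are computed feature-wise (per column), with ddof=0.
mean = z.mean(axis=0)
std = z.std(axis=0)

coefficient = 1.0 / std
offset = -mean / std

# Linear form of the scaling: offset + coefficient * z
z_bar = offset + coefficient * z
```

After scaling, each column of `z_bar` has zero mean and unit standard deviation, up to floating-point error.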

Classes:

StandardScaler([name, offset, coefficient])

Standard scaler.

class gemseo.mlearning.transform.scaler.standard_scaler.StandardScaler(name='StandardScaler', offset=0.0, coefficient=1.0)[source]

Standard scaler.

name

The name of the transformer.

Type: str

parameters

The parameters of the transformer.

Type: str

Parameters

name (str) –
A name for this transformer.

By default it is set to StandardScaler.
offset (float) –
The offset of the linear transformation.

By default it is set to 0.0.
coefficient (float) –
The coefficient of the linear transformation.

By default it is set to 1.0.

Return type

None

Attributes:

`coefficient`	The scaling coefficient.
`offset`	The scaling offset.

Methods:

`compute_jacobian`(data)	Compute Jacobian of transformer.transform().
`compute_jacobian_inverse`(data)	Compute Jacobian of the transformer.inverse_transform().
`duplicate`()	Duplicate the current object.
`fit`(data, *args)	Fit the transformer to the data.
`fit_transform`(data, *args)	Fit the transformer to the data and transform the data.
`inverse_transform`(data)	Perform an inverse transform on the data.
`transform`(data)	Transform the data.

property coefficient: The scaling coefficient.

compute_jacobian(data)

Compute Jacobian of transformer.transform().

Parameters: data (numpy.ndarray) – The data where the Jacobian is to be computed.
Returns: The Jacobian matrix.
Return type: numpy.ndarray

compute_jacobian_inverse(data)

Compute Jacobian of the transformer.inverse_transform().

Parameters: data (numpy.ndarray) – The data where the Jacobian is to be computed.
Returns: The Jacobian matrix.
Return type: numpy.ndarray

duplicate()

Duplicate the current object.

Returns: A deepcopy of the current instance.
Return type: gemseo.mlearning.transform.transformer.Transformer

fit(data, *args)[source]

Fit the transformer to the data.

Parameters

data (numpy.ndarray) – The data to be fitted.
*args (Union[float, int, str]) –

Return type

None

fit_transform(data, *args)

Fit the transformer to the data and transform the data.

Parameters

data (numpy.ndarray) – The data to be transformed.
*args (Union[float, int, str]) –

Returns

The transformed data.

Return type

numpy.ndarray

inverse_transform(data)

Perform an inverse transform on the data.

Parameters: data (numpy.ndarray) – The data to be inverse transformed.
Returns: The inverse transformed data.
Return type: numpy.ndarray

property offset: The scaling offset.

transform(data)

Transform the data.

Parameters: data (numpy.ndarray) – The data to be transformed.
Returns: The transformed data.
Return type: numpy.ndarray

Dimension reduction as a generic transformer.

The DimensionReduction class implements the concept of dimension reduction.

Dependence

This dimension reduction algorithm relies on the PCA class of the scikit-learn library.
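To illustrate what a principal component reduction does, here is a NumPy-only sketch based on the singular value decomposition. It is a stand-in for illustration, not the scikit-learn PCA class nor the GEMSEO wrapper. The synthetic data and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 50 samples in dimension 5 that vary only along 2 directions.
latent = rng.normal(size=(50, 2))
mixing = rng.normal(size=(2, 5))
data = latent @ mixing + 3.0

n_components = 2
mean = data.mean(axis=0)

# Rows of vt are the principal directions, sorted by decreasing variance.
_, _, vt = np.linalg.svd(data - mean, full_matrices=False)
components = vt[:n_components]  # shape (2, 5)

# Forward transform: project the centered data onto the components.
reduced = (data - mean) @ components.T  # shape (50, 2)

# Inverse transform: map back to the original space.
restored = reduced @ components + mean  # shape (50, 5)
```

Because the synthetic data varies along only two directions, two components reconstruct it up to floating-point error. With real data the reconstruction is generally approximate.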

Classes:

PCA([name, n_components])

Principal component dimension reduction algorithm.

class gemseo.mlearning.transform.dimension_reduction.pca.PCA(name='PCA', n_components=5, **parameters)[source]

Principal component dimension reduction algorithm.

name

The name of the transformer.

Type: str

parameters

The parameters of the transformer.

Type: str

Parameters

**parameters (Optional[Union[float,int,str,bool]]) – The optional parameters for sklearn PCA constructor.
name (str) –
A name for this transformer.

By default it is set to PCA.
n_components (int) –
The number of components.

By default it is set to 5.

Return type

None

Attributes:

`components`	The principal components.
`n_components`	The number of components.

Methods:

`compute_jacobian`(data)	Compute Jacobian of transformer.transform().
`compute_jacobian_inverse`(data)	Compute Jacobian of the transformer.inverse_transform().
`duplicate`()	Duplicate the current object.
`fit`(data, *args)	Fit the transformer to the data.
`fit_transform`(data, *args)	Fit the transformer to the data and transform the data.
`inverse_transform`(data)	Perform an inverse transform on the data.
`transform`(data)	Transform the data.

property components: The principal components.

compute_jacobian(data)[source]

Compute Jacobian of transformer.transform().

Parameters: data (numpy.ndarray) – The data where the Jacobian is to be computed.
Returns: The Jacobian matrix.
Return type: numpy.ndarray

compute_jacobian_inverse(data)[source]

Compute Jacobian of the transformer.inverse_transform().

Parameters: data (numpy.ndarray) – The data where the Jacobian is to be computed.
Returns: The Jacobian matrix.
Return type: numpy.ndarray

duplicate()

Duplicate the current object.

Returns: A deepcopy of the current instance.
Return type: gemseo.mlearning.transform.transformer.Transformer

fit(data, *args)[source]

Fit the transformer to the data.

Parameters

data (numpy.ndarray) – The data to be fitted.
*args (Union[float, int, str]) –

Return type

None

fit_transform(data, *args)

Fit the transformer to the data and transform the data.

Parameters

data (numpy.ndarray) – The data to be transformed.
*args (Union[float, int, str]) –

Returns

The transformed data.

Return type

numpy.ndarray

inverse_transform(data)[source]

Perform an inverse transform on the data.

Parameters: data (numpy.ndarray) – The data to be inverse transformed.
Returns: The inverse transformed data.
Return type: numpy.ndarray

property n_components: The number of components.

transform(data)[source]

Transform the data.

Parameters: data (numpy.ndarray) – The data to be transformed.
Returns: The transformed data.
Return type: numpy.ndarray
