Scalers

from __future__ import annotations

import matplotlib.pyplot as plt
from numpy import linspace

from gemseo.mlearning.transformers.scaler.min_max_scaler import MinMaxScaler
from gemseo.mlearning.transformers.scaler.scaler import Scaler
from gemseo.mlearning.transformers.scaler.standard_scaler import StandardScaler

Scaling data may be important, as discussed in another example. Different scalers are available, and this example illustrates them with the following simple data:

data = linspace(-2, 2, 100)

First, a Scaler transforms a value \(x\) into a new value \(\tilde{x}\) based on the linear function \(\tilde{x}=a+bx\). By default, the offset \(a\) is zero and the coefficient \(b\) is one:

default_scaler = Scaler()
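To make the linear map concrete, here is a minimal sketch in plain Python that does not use GEMSEO. The helper `linear_scale` is hypothetical and only mirrors the formula \(\tilde{x}=a+bx\); the offset of -1 and coefficient of 0.5 are illustrative values.

```python
def linear_scale(values, offset=0.0, coefficient=1.0):
    """Apply x -> offset + coefficient * x (illustrative helper, not GEMSEO API)."""
    return [offset + coefficient * value for value in values]


sample = [-2.0, 0.0, 2.0]

# Default settings (offset 0, coefficient 1) leave the values unchanged.
unchanged = linear_scale(sample)

# Illustrative custom settings: offset -1 and coefficient 0.5.
scaled = linear_scale(sample, offset=-1.0, coefficient=0.5)

print(unchanged)  # [-2.0, 0.0, 2.0]
print(scaled)     # [-2.0, -1.0, 0.0]
```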

We can set the offset and coefficient at instantiation:

custom_scaler = Scaler(offset=-1, coefficient=0.5)

or use a specialized scaler, e.g. a MinMaxScaler:

min_max_scaler = MinMaxScaler()

or a StandardScaler:

standard_scaler = StandardScaler()

In these two cases, the offset and coefficient are computed from the data.
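As a rough sketch of what "computed from the data" could mean, the snippet below derives an offset and a coefficient by hand, in plain Python and without GEMSEO. It assumes that min-max scaling maps the smallest value to 0 and the largest to 1, and that standard scaling uses the population standard deviation; the library may use a different convention, and all variable names here are illustrative.

```python
from statistics import fmean, pstdev

# Same grid as linspace(-2, 2, 100), built without NumPy.
data = [-2 + 4 * i / 99 for i in range(100)]

# Min-max scaling: map min(data) to 0 and max(data) to 1.
min_max_coefficient = 1 / (max(data) - min(data))
min_max_offset = -min(data) * min_max_coefficient

# Standard scaling: subtract the mean and divide by the standard deviation.
# Assumption: population standard deviation; a library may choose otherwise.
standard_coefficient = 1 / pstdev(data)
standard_offset = -fmean(data) * standard_coefficient

print(min_max_offset, min_max_coefficient)  # 0.5 0.25
```

Both transformations then take the same linear form, offset + coefficient * x, as the generic scaler.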

Now, we fit each scaler to the data and transform the same data:

same_data = default_scaler.fit_transform(data)
scaled_data = custom_scaler.fit_transform(data)
min_max_scaled_data = min_max_scaler.fit_transform(data)
standard_scaled_data = standard_scaler.fit_transform(data)

We can plot the transformed data versus the original data:

# Reuse the arrays computed above instead of fitting each scaler again.
plt.plot(data, same_data, label="Default scaler")
plt.plot(data, scaled_data, label="Custom scaler")
plt.plot(data, min_max_scaled_data, label="Min-max scaler")
plt.plot(data, standard_scaled_data, label="Standard scaler")
plt.legend()
plt.grid()
plt.show()

The specific features of the different scalers are clearly visible. In particular, the MinMaxScaler maps the data onto the interval \([0,1]\), provided that the data lie within the interval used for fitting. The StandardScaler guarantees that the transformed fitting data have zero mean and unit variance.
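The caveat about the fitting interval can be illustrated with a small sketch in plain Python. It assumes a min-max map fitted on \([-2,2]\), as in this example; the helper `min_max` is hypothetical and not part of GEMSEO.

```python
# Assumption: the scaler was fitted on data spanning [-2, 2].
lower, upper = -2.0, 2.0


def min_max(value):
    """Map [lower, upper] linearly onto [0, 1] (illustrative helper)."""
    return (value - lower) / (upper - lower)


inside = min_max(1.0)   # within the fitting interval
outside = min_max(3.0)  # beyond the fitting interval

print(inside, outside)  # 0.75 1.25
```

A value beyond the fitting interval is still transformed linearly, so it simply lands outside \([0,1]\).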

Lastly, every scaler can compute the Jacobian of its transformation, e.g.

custom_scaler.compute_jacobian(data)

array([[0.5]])
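This value is consistent with the linear form \(\tilde{x}=a+bx\), whose derivative with respect to \(x\) is the coefficient \(b\). The sketch below checks this with a central finite difference in plain Python; `custom_scale` is a hypothetical stand-in for the custom scaler, not a GEMSEO function.

```python
def custom_scale(value, offset=-1.0, coefficient=0.5):
    """Linear map with the custom offset and coefficient (illustrative helper)."""
    return offset + coefficient * value


step = 1e-6
point = 1.3  # arbitrary point; the derivative of a linear map is constant

# Central finite difference approximating d(custom_scale)/dx at `point`.
derivative = (custom_scale(point + step) - custom_scale(point - step)) / (2 * step)

print(round(derivative, 6))  # 0.5
```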
