
# Scaling


In [None]:
from gemseo.algos.design_space import DesignSpace
from gemseo.algos.doe.lib_openturns import OpenTURNS
from gemseo.algos.opt_problem import OptimizationProblem
from gemseo.core.mdofunctions.mdo_function import MDOFunction
from gemseo.mlearning.quality_measures.r2_measure import R2Measure
from gemseo.mlearning.regression.gpr import GaussianProcessRegressor
from gemseo.problems.analytical.rosenbrock import Rosenbrock

Scaling data around zero helps avoid numerical issues
when fitting a machine learning model.
This matters all the more when
the variables have different ranges
or when the fitting relies on numerical optimization techniques.
This example illustrates the latter point.

First,
we consider the Rosenbrock function $f(x)=(1-x_1)^2+100(x_2-x_1^2)^2$
over the domain $[-2,2]^2$:



In [None]:
problem = Rosenbrock()
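
As a plain-Python reference, independent of GEMSEO, the formula above can be written as the short sketch below. The helper name `rosenbrock` is introduced here for illustration only and is not part of the library. The sketch shows that the output goes from 0 at the minimum to a few thousand at the corners of the domain.

```python
def rosenbrock(x1: float, x2: float) -> float:
    """Evaluate f(x) = (1 - x1)^2 + 100 (x2 - x1^2)^2 (illustrative helper)."""
    return (1 - x1) ** 2 + 100 * (x2 - x1**2) ** 2


# The global minimum is located at (1, 1), where the function vanishes.
value_at_minimum = rosenbrock(1.0, 1.0)

# Values at the four corners of the domain [-2, 2]^2.
corner_values = [rosenbrock(x1, x2) for x1 in (-2.0, 2.0) for x2 in (-2.0, 2.0)]
```

The corner values are much larger than the values near the minimum, so the outputs seen by a regression model span several orders of magnitude.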

In order to approximate this function with a regression model,
we sample it 30 times with an optimized Latin hypercube sampling (LHS) technique



In [None]:
openturns = OpenTURNS()
openturns.execute(problem, openturns.OT_LHSO, n_samples=30)

and save the samples in an :class:`.IODataset`:



In [None]:
dataset_train = problem.to_dataset(opt_naming=False)
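
To give an idea of what a Latin hypercube sample looks like, here is a minimal plain-Python sketch of a basic, non-optimized LHS. It is not the OpenTURNS algorithm used above, which additionally optimizes the design; the function name `latin_hypercube` and its `seed` argument are assumptions made for this illustration.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Draw a basic (non-optimized) Latin hypercube sample (illustrative helper)."""
    rng = random.Random(seed)
    columns = []
    for lower, upper in bounds:
        # Draw one point uniformly in each of the n_samples equal-width strata of [0, 1].
        strata = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so that the pairing of strata across dimensions is random.
        rng.shuffle(strata)
        # Map the unit interval onto [lower, upper].
        columns.append([lower + (upper - lower) * u for u in strata])
    # Transpose the per-dimension columns into a list of points.
    return [tuple(point) for point in zip(*columns)]


samples = latin_hypercube(30, [(-2.0, 2.0), (-2.0, 2.0)])
```

Each of the 30 equal-width slices of each input range contains exactly one sample, which is the defining property of an LHS.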

We do the same with a full-factorial design of experiments (DOE) of size 900 to build the test dataset:



In [None]:
openturns.execute(problem, openturns.OT_FULLFACT, n_samples=30 * 30)
dataset_test = problem.to_dataset(opt_naming=False)
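
A full-factorial DOE of size 900 in dimension 2 corresponds to 30 levels per input. The sketch below builds such a grid in plain Python, assuming equally spaced levels that include the bounds; the exact placement of the levels in the OpenTURNS-based DOE may differ, and `full_factorial` is a hypothetical helper name.

```python
from itertools import product


def full_factorial(n_levels, bounds):
    """Return the grid with n_levels equally spaced levels per input (illustrative helper)."""
    axes = [
        [lower + (upper - lower) * i / (n_levels - 1) for i in range(n_levels)]
        for lower, upper in bounds
    ]
    # The Cartesian product of the per-input levels gives n_levels ** dimension points.
    return list(product(*axes))


grid = full_factorial(30, [(-2.0, 2.0), (-2.0, 2.0)])
```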

Then,
we create a first Gaussian process regressor from the training dataset, without any data scaling:



In [None]:
gpr = GaussianProcessRegressor(dataset_train)
gpr.learn()

and compute its R2 quality from the test dataset:



In [None]:
r2 = R2Measure(gpr)
r2.compute_test_measure(dataset_test)
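
For readers unfamiliar with this measure, the sketch below implements the usual definition of the coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$, in plain Python. It is assumed here that `R2Measure` follows this standard definition; `r2_score` is a hypothetical helper and the data are made up for the illustration.

```python
def r2_score(observations, predictions):
    """Return R2 = 1 - SS_res / SS_tot (illustrative helper)."""
    mean = sum(observations) / len(observations)
    # Sum of squared residuals between observations and predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(observations, predictions))
    # Total sum of squares around the mean of the observations.
    ss_tot = sum((y - mean) ** 2 for y in observations)
    return 1 - ss_res / ss_tot


observations = [1.0, 2.0, 3.0, 4.0]
perfect = r2_score(observations, observations)  # exact predictions
constant = r2_score(observations, [2.5] * 4)  # predicting the mean everywhere
poor = r2_score(observations, [4.0, 3.0, 2.0, 1.0])  # worse than the mean
```

A perfect model reaches 1, a model that always predicts the mean of the observations gets 0, and a model that does worse than the mean gets a negative value.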

Next,
we create a second Gaussian process regressor from the training dataset,
this time with the default input and output transformers, which are both :class:`.MinMaxScaler`:



In [None]:
gpr = GaussianProcessRegressor(
    dataset_train, transformer=GaussianProcessRegressor.DEFAULT_TRANSFORMER
)
gpr.learn()

The scaling improves the R2 quality (recall: the higher, the better, with 1 for a perfect fit):



In [None]:
r2 = R2Measure(gpr)
r2.compute_test_measure(dataset_test)
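
As a reminder of what min-max scaling does, the sketch below maps a list of values linearly onto $[0,1]$ with $(v - v_{min})/(v_{max} - v_{min})$. It is a plain-Python illustration of the principle, not the GEMSEO transformer; `min_max_scale` is a hypothetical helper and the sample values are Rosenbrock values at the minimum and at the four corners of the domain.

```python
def min_max_scale(values):
    """Map values linearly onto [0, 1] (illustrative helper)."""
    lower, upper = min(values), max(values)
    return [(v - lower) / (upper - lower) for v in values]


# Rosenbrock values at (1, 1) and at the corners of [-2, 2]^2.
outputs = [0.0, 401.0, 409.0, 3601.0, 3609.0]
scaled = min_max_scale(outputs)
```

The smallest value becomes 0, the largest becomes 1, and the ordering of the values is preserved.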

In this case, the input scaling does not contribute to this improvement, as shown by scaling the outputs only:



In [None]:
gpr = GaussianProcessRegressor(dataset_train, transformer={"outputs": "MinMaxScaler"})
gpr.learn()
r2 = R2Measure(gpr)
r2.compute_test_measure(dataset_test)

We can also see that using a :class:`.StandardScaler` for the outputs is less effective in this case:



In [None]:
gpr = GaussianProcessRegressor(dataset_train, transformer={"outputs": "StandardScaler"})
gpr.learn()
r2 = R2Measure(gpr)
r2.compute_test_measure(dataset_test)
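
For comparison with min-max scaling, standard scaling centers the data and divides it by its standard deviation, $(v - \mu)/\sigma$. The plain-Python sketch below illustrates the principle only; `standard_scale` is a hypothetical helper, the population standard deviation is an assumption of this sketch, and the sample values are Rosenbrock values at the minimum and at the four corners of the domain.

```python
from statistics import fmean, pstdev


def standard_scale(values):
    """Center values and scale them to unit standard deviation (illustrative helper)."""
    mean = fmean(values)
    # Population standard deviation; an assumption made for this sketch.
    std = pstdev(values)
    return [(v - mean) / std for v in values]


# Rosenbrock values at (1, 1) and at the corners of [-2, 2]^2.
outputs = [0.0, 401.0, 409.0, 3601.0, 3609.0]
scaled = standard_scale(outputs)
```

Unlike min-max scaling, the result is not confined to $[0,1]$: it has zero mean and unit standard deviation, with both negative and positive values.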

Finally,
to obtain inputs with different orders of magnitude,
we rewrite the Rosenbrock function as $f(x)=(1-x_1)^2+100(0.01x_2-x_1^2)^2$
and its domain as $[-2,2]\times[-200,200]$:



In [None]:
design_space = DesignSpace()
design_space.add_variable("x1", l_b=-2, u_b=2)
design_space.add_variable("x2", l_b=-200, u_b=200)

We create the training and test datasets in the same way as before:



In [None]:
problem = OptimizationProblem(design_space)
problem.objective = MDOFunction(
    lambda x: (1 - x[0]) ** 2 + 100 * (0.01 * x[1] - x[0] ** 2) ** 2, "f"
)
openturns.execute(problem, openturns.OT_LHSO, n_samples=30)
dataset_train = problem.to_dataset(opt_naming=False)
openturns.execute(problem, openturns.OT_FULLFACT, n_samples=30 * 30)
dataset_test = problem.to_dataset(opt_naming=False)

and build a first Gaussian process regressor with a min-max scaler for the outputs:



In [None]:
gpr = GaussianProcessRegressor(dataset_train, transformer={"outputs": "MinMaxScaler"})
gpr.learn()
r2 = R2Measure(gpr)
r2.compute_test_measure(dataset_test)

The R2 quality is degraded
because the correlation lengths of the model are harder to estimate
when the inputs have different orders of magnitude.
Setting a :class:`.MinMaxScaler` for the inputs makes this estimation easier:



In [None]:
gpr = GaussianProcessRegressor(
    dataset_train, transformer={"inputs": "MinMaxScaler", "outputs": "MinMaxScaler"}
)
gpr.learn()
r2 = R2Measure(gpr)
r2.compute_test_measure(dataset_test)
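
One way to see why scaling the inputs helps here: once both inputs are mapped onto $[0,1]$, the rewritten function coincides with the original Rosenbrock function, so the regressor is fitted to essentially the same function as in the first part of the example. The plain-Python check below is a sketch under that reading; all helper names are hypothetical and nothing here calls GEMSEO.

```python
def rosenbrock(x1, x2):
    """Original function over [-2, 2]^2."""
    return (1 - x1) ** 2 + 100 * (x2 - x1**2) ** 2


def rewritten_rosenbrock(x1, x2):
    """Rewritten function over [-2, 2] x [-200, 200]."""
    return (1 - x1) ** 2 + 100 * (0.01 * x2 - x1**2) ** 2


def unscale(u, lower, upper):
    """Map a min-max scaled value u in [0, 1] back to [lower, upper]."""
    return lower + (upper - lower) * u


# Compare both functions at the same scaled inputs (u1, u2) on an 11 x 11 grid.
levels = [i / 10 for i in range(11)]
max_gap = max(
    abs(
        rosenbrock(unscale(u1, -2, 2), unscale(u2, -2, 2))
        - rewritten_rosenbrock(unscale(u1, -2, 2), unscale(u2, -200, 200))
    )
    for u1 in levels
    for u2 in levels
)
```

The largest gap between the two functions is at the level of floating-point rounding, which confirms that they are identical in scaled coordinates.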