.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/mlearning/quality_measure/plot_rmse.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_examples_mlearning_quality_measure_plot_rmse.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_mlearning_quality_measure_plot_rmse.py:


RMSE for regression models
==========================

.. GENERATED FROM PYTHON SOURCE LINES 20-32

.. code-block:: Python

    from matplotlib import pyplot as plt
    from numpy import array
    from numpy import linspace
    from numpy import newaxis
    from numpy import sin

    from gemseo.datasets.io_dataset import IODataset
    from gemseo.mlearning.quality_measures.rmse_measure import RMSEMeasure
    from gemseo.mlearning.regression.polyreg import PolynomialRegressor
    from gemseo.mlearning.regression.rbf import RBFRegressor

.. GENERATED FROM PYTHON SOURCE LINES 33-50

Given a dataset :math:`(x_i,y_i,\hat{y}_i)_{1\leq i \leq N}`
where :math:`x_i` is an input point,
:math:`y_i` is an output observation
and :math:`\hat{y}_i=\hat{f}(x_i)` is an output prediction
computed by a regression model :math:`\hat{f}`,
the root mean squared error (RMSE) metric is written

.. math::

   \text{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^N(y_i-\hat{y}_i)^2} \geq 0.

The lower the RMSE, the better.
Quantitatively, its value depends on the order of magnitude of the outputs,
which is why this example also reports the RMSE divided by the output range.

To illustrate this quality measure,
let us consider the function :math:`f(x)=(6x-2)^2\sin(12x-4)` :cite:`forrester2008`:

.. GENERATED FROM PYTHON SOURCE LINES 50-56

.. code-block:: Python

    def f(x):
        return (6 * x - 2) ** 2 * sin(12 * x - 4)

.. GENERATED FROM PYTHON SOURCE LINES 57-61

and try to approximate it with a polynomial of order 3.

For this,
we can take these 7 learning input points

.. GENERATED FROM PYTHON SOURCE LINES 61-63

.. code-block:: Python

    x_train = array([0.1, 0.3, 0.5, 0.6, 0.8, 0.9, 0.95])

.. GENERATED FROM PYTHON SOURCE LINES 64-65

and evaluate the model ``f`` over this design of experiments (DOE):

.. GENERATED FROM PYTHON SOURCE LINES 65-67

.. code-block:: Python

    y_train = f(x_train)

.. GENERATED FROM PYTHON SOURCE LINES 68-70

Then,
we create an :class:`.IODataset` from these 7 learning samples:

.. GENERATED FROM PYTHON SOURCE LINES 70-74

.. code-block:: Python

    dataset_train = IODataset()
    dataset_train.add_input_group(x_train[:, newaxis], ["x"])
    dataset_train.add_output_group(y_train[:, newaxis], ["y"])

.. GENERATED FROM PYTHON SOURCE LINES 75-76

and build a :class:`.PolynomialRegressor` with ``degree=3`` from it:

.. GENERATED FROM PYTHON SOURCE LINES 76-79

.. code-block:: Python

    polynomial = PolynomialRegressor(dataset_train, 3)
    polynomial.learn()

.. GENERATED FROM PYTHON SOURCE LINES 80-82

Before using it,
we are going to measure its quality with the RMSE metric:

.. GENERATED FROM PYTHON SOURCE LINES 82-86

.. code-block:: Python

    rmse = RMSEMeasure(polynomial)
    result = rmse.compute_learning_measure()
    result, result / (y_train.max() - y_train.min())

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (array([2.37578236]), array([0.137707]))

.. GENERATED FROM PYTHON SOURCE LINES 87-92

This learning error is middling (14% of the learning output range),
so we can expect a poor generalization quality.
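As a sanity check,
the value above can be reproduced by applying the RMSE formula directly.
The following sketch is not part of the original example;
it assumes that ``compute_learning_measure`` returns the raw,
unnormalized RMSE over the learning samples,
as the formula above states:

.. code-block:: Python

    from numpy import mean
    from numpy import sqrt

    # Predictions of the polynomial model at the 7 learning points.
    y_pred = polynomial.predict(x_train[:, newaxis]).ravel()
    # RMSE = sqrt(mean((y - y_hat)**2));
    # this should match rmse.compute_learning_measure()
    # up to floating-point error.
    rmse_by_hand = sqrt(mean((y_train - y_pred) ** 2))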
As this academic function is virtually free to evaluate,
we can approximate the generalization quality with a large test dataset,
whereas the usual test size is about 20% of the training size.

.. GENERATED FROM PYTHON SOURCE LINES 92-100

.. code-block:: Python

    x_test = linspace(0.0, 1.0, 100)
    y_test = f(x_test)
    dataset_test = IODataset()
    dataset_test.add_input_group(x_test[:, newaxis], ["x"])
    dataset_test.add_output_group(y_test[:, newaxis], ["y"])
    result = rmse.compute_test_measure(dataset_test)
    result, result / (y_test.max() - y_test.min())

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (array([3.31730517]), array([0.15181886]))

.. GENERATED FROM PYTHON SOURCE LINES 101-104

The test error exceeds 15% of the test output range,
which is pretty mediocre.
This can be explained by a test domain broader than the learning domain,
which highlights the difficulty of extrapolation:

.. GENERATED FROM PYTHON SOURCE LINES 104-112

.. code-block:: Python

    plt.plot(x_test, y_test, "-b", label="Reference")
    plt.plot(x_train, y_train, "ob")
    plt.plot(x_test, polynomial.predict(x_test[:, newaxis]), "-r", label="Prediction")
    plt.plot(x_train, polynomial.predict(x_train[:, newaxis]), "or")
    plt.legend()
    plt.grid()
    plt.show()

.. image-sg:: /examples/mlearning/quality_measure/images/sphx_glr_plot_rmse_001.png
   :alt: plot rmse
   :srcset: /examples/mlearning/quality_measure/images/sphx_glr_plot_rmse_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 113-114

Restricting the test points to the learning domain slightly improves the quality:

.. GENERATED FROM PYTHON SOURCE LINES 114-123

.. code-block:: Python

    x_test = linspace(x_train.min(), x_train.max(), 100)
    y_test_in_large_domain = y_test
    y_test = f(x_test)
    dataset_test_in_learning_domain = IODataset()
    dataset_test_in_learning_domain.add_input_group(x_test[:, newaxis], ["x"])
    dataset_test_in_learning_domain.add_output_group(y_test[:, newaxis], ["y"])
    result = rmse.compute_test_measure(dataset_test_in_learning_domain)
    result, result / (y_test.max() - y_test.min())

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (array([2.39937613]), array([0.13099891]))

.. GENERATED FROM PYTHON SOURCE LINES 124-127

Lastly,
to get better results without new learning points,
we would have to change the regression model:

.. GENERATED FROM PYTHON SOURCE LINES 127-130

.. code-block:: Python

    rbf = RBFRegressor(dataset_train)
    rbf.learn()

.. GENERATED FROM PYTHON SOURCE LINES 131-133

The quality of this :class:`.RBFRegressor` is quite good,
both on the learning side:

.. GENERATED FROM PYTHON SOURCE LINES 133-137

.. code-block:: Python

    rmse_rbf = RMSEMeasure(rbf)
    result = rmse_rbf.compute_learning_measure()
    result, result / (y_train.max() - y_train.min())

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (array([1.22756756e-14]), array([7.11532547e-16]))

.. GENERATED FROM PYTHON SOURCE LINES 138-139

and on the validation side:

.. GENERATED FROM PYTHON SOURCE LINES 139-142

.. code-block:: Python

    result = rmse_rbf.compute_test_measure(dataset_test_in_learning_domain)
    result, result / (y_test.max() - y_test.min())

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (array([0.14923751]), array([0.00814793]))

.. GENERATED FROM PYTHON SOURCE LINES 143-144

including the larger domain:

.. GENERATED FROM PYTHON SOURCE LINES 144-147

.. code-block:: Python

    result = rmse_rbf.compute_test_measure(dataset_test)
    result, result / (y_test_in_large_domain.max() - y_test_in_large_domain.min())

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (array([0.54182176]), array([0.02479686]))
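Building such a large test dataset is only possible
because the reference function is cheap to evaluate.
In a real study,
a resampling technique can estimate the generalization error
from the learning samples alone.
A minimal sketch,
assuming your GEMSEO version provides ``compute_cross_validation_measure``
in the same API family as the learning and test variants used above:

.. code-block:: Python

    # Assumed API: k-fold cross-validation of the RMSE,
    # reusing only the 7 learning samples
    # (no extra call to the reference function).
    result = rmse_rbf.compute_cross_validation_measure()
    result, result / (y_train.max() - y_train.min())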
.. GENERATED FROM PYTHON SOURCE LINES 148-149

A final plot to confirm the good quality of the RBF model:

.. GENERATED FROM PYTHON SOURCE LINES 149-156

.. code-block:: Python

    plt.plot(x_test, y_test, "-b", label="Reference")
    plt.plot(x_train, y_train, "ob")
    plt.plot(x_test, rbf.predict(x_test[:, newaxis]), "-r", label="Prediction")
    plt.plot(x_train, rbf.predict(x_train[:, newaxis]), "or")
    plt.legend()
    plt.grid()
    plt.show()

.. image-sg:: /examples/mlearning/quality_measure/images/sphx_glr_plot_rmse_002.png
   :alt: plot rmse
   :srcset: /examples/mlearning/quality_measure/images/sphx_glr_plot_rmse_002.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.315 seconds)

.. _sphx_glr_download_examples_mlearning_quality_measure_plot_rmse.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_rmse.ipynb <plot_rmse.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_rmse.py <plot_rmse.py>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_