.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/mlearning/quality_measure/plot_r2.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_examples_mlearning_quality_measure_plot_r2.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_mlearning_quality_measure_plot_r2.py:


R2 for regression models
========================

.. GENERATED FROM PYTHON SOURCE LINES 20-32

.. code-block:: Python

    from matplotlib import pyplot as plt
    from numpy import array
    from numpy import linspace
    from numpy import newaxis
    from numpy import sin

    from gemseo.datasets.io_dataset import IODataset
    from gemseo.mlearning.quality_measures.r2_measure import R2Measure
    from gemseo.mlearning.regression.polyreg import PolynomialRegressor
    from gemseo.mlearning.regression.rbf import RBFRegressor

.. GENERATED FROM PYTHON SOURCE LINES 33-51

Given a dataset :math:`(x_i,y_i,\hat{y}_i)_{1\leq i \leq N}`
where :math:`x_i` is an input point,
:math:`y_i` is an output observation
and :math:`\hat{y}_i=\hat{f}(x_i)` is an output prediction
computed by a regression model :math:`\hat{f}`,
the :math:`R^2` metric (also known as :math:`Q^2`) is defined as

.. math::

   R^2 = 1 - \frac{\sum_{i=1}^N(y_i-\hat{y}_i)^2}{\sum_{i=1}^N(y_i-\bar{y})^2} \leq 1

where :math:`\bar{y}=\frac{1}{N}\sum_{i=1}^Ny_i`.
The higher the value, the better the model;
from 0.9 onwards, the quality can be considered (very) good.
A negative value is very bad,
as a constant model predicting the output mean :math:`\bar{y}` would do better.

To illustrate this quality measure,
let us consider the function :math:`f(x)=(6x-2)^2\sin(12x-4)` :cite:`forrester2008`:

.. GENERATED FROM PYTHON SOURCE LINES 51-57

.. code-block:: Python

    def f(x):
        return (6 * x - 2) ** 2 * sin(12 * x - 4)

.. GENERATED FROM PYTHON SOURCE LINES 58-62

and try to approximate it with a polynomial of degree 3.
For this,
we can take these 7 learning input points

.. GENERATED FROM PYTHON SOURCE LINES 62-64

.. code-block:: Python

    x_train = array([0.1, 0.3, 0.5, 0.6, 0.8, 0.9, 0.95])

.. GENERATED FROM PYTHON SOURCE LINES 65-66

and evaluate the function ``f`` over this design of experiments (DOE):

.. GENERATED FROM PYTHON SOURCE LINES 66-68

.. code-block:: Python

    y_train = f(x_train)

.. GENERATED FROM PYTHON SOURCE LINES 69-71

Then,
we create an :class:`.IODataset` from these 7 learning samples:

.. GENERATED FROM PYTHON SOURCE LINES 71-75

.. code-block:: Python

    dataset_train = IODataset()
    dataset_train.add_input_group(x_train[:, newaxis], ["x"])
    dataset_train.add_output_group(y_train[:, newaxis], ["y"])

.. GENERATED FROM PYTHON SOURCE LINES 76-77

and build a :class:`.PolynomialRegressor` with ``degree=3`` from it:

.. GENERATED FROM PYTHON SOURCE LINES 77-80

.. code-block:: Python

    polynomial = PolynomialRegressor(dataset_train, 3)
    polynomial.learn()

.. GENERATED FROM PYTHON SOURCE LINES 81-83

Before using it,
we are going to measure its quality with the :math:`R^2` metric:

.. GENERATED FROM PYTHON SOURCE LINES 83-86

.. code-block:: Python

    r2 = R2Measure(polynomial)
    r2.compute_learning_measure()

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    array([0.78649338])
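This value can be recovered directly from the definition of :math:`R^2` given above,
as the learning measure simply applies the formula to the training samples.
Here is a minimal sketch (not part of the generated example);
it assumes that ``predict`` returns an array of shape ``(7, 1)`` for this array input,
as the plotting code below suggests:

.. code-block:: Python

    # Recompute the learning R2 from its definition:
    # R2 = 1 - sum((y - y_hat)^2) / sum((y - y_bar)^2).
    # NOTE: this sketch assumes predict() returns a (7, 1) array here,
    # which ravel() flattens to match the shape of y_train.
    y_hat = polynomial.predict(x_train[:, newaxis]).ravel()
    residual = ((y_train - y_hat) ** 2).sum()
    total = ((y_train - y_train.mean()) ** 2).sum()
    1.0 - residual / total  # should reproduce the value 0.78649338 above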
.. GENERATED FROM PYTHON SOURCE LINES 87-91

This result is mediocre,
so we can expect a poor generalization quality.
As this academic function costs virtually nothing to evaluate,
we can approximate the generalization quality with a large test dataset,
whereas the usual test size is about 20% of the training size.

.. GENERATED FROM PYTHON SOURCE LINES 91-98

.. code-block:: Python

    x_test = linspace(0.0, 1.0, 100)
    y_test = f(x_test)
    dataset_test = IODataset()
    dataset_test.add_input_group(x_test[:, newaxis], ["x"])
    dataset_test.add_output_group(y_test[:, newaxis], ["y"])
    r2.compute_test_measure(dataset_test)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    array([0.47280012])

.. GENERATED FROM PYTHON SOURCE LINES 99-102

The quality is lower than 0.5, which is poor.
This can be explained by a test domain broader than the learning domain,
which highlights the difficulty of extrapolation:

.. GENERATED FROM PYTHON SOURCE LINES 102-110

.. code-block:: Python

    plt.plot(x_test, y_test, "-b", label="Reference")
    plt.plot(x_train, y_train, "ob")
    plt.plot(x_test, polynomial.predict(x_test[:, newaxis]), "-r", label="Prediction")
    plt.plot(x_train, polynomial.predict(x_train[:, newaxis]), "or")
    plt.legend()
    plt.grid()
    plt.show()

.. image-sg:: /examples/mlearning/quality_measure/images/sphx_glr_plot_r2_001.png
   :alt: plot r2
   :srcset: /examples/mlearning/quality_measure/images/sphx_glr_plot_r2_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 111-112

Restricting the test points to the learning domain slightly improves the quality:

.. GENERATED FROM PYTHON SOURCE LINES 112-119

.. code-block:: Python

    x_test = linspace(x_train.min(), x_train.max(), 100)
    y_test = f(x_test)
    dataset_test_in_learning_domain = IODataset()
    dataset_test_in_learning_domain.add_input_group(x_test[:, newaxis], ["x"])
    dataset_test_in_learning_domain.add_output_group(y_test[:, newaxis], ["y"])
    r2.compute_test_measure(dataset_test_in_learning_domain)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    array([0.50185268])

.. GENERATED FROM PYTHON SOURCE LINES 120-123

Lastly,
to get better results without new learning points,
we would have to change the regression model:

.. GENERATED FROM PYTHON SOURCE LINES 123-126

.. code-block:: Python

    rbf = RBFRegressor(dataset_train)
    rbf.learn()

.. GENERATED FROM PYTHON SOURCE LINES 127-129

The quality of this :class:`.RBFRegressor` is very good,
both on the learning side,
where the :math:`R^2` is exactly 1 since an RBF model interpolates its training points:

.. GENERATED FROM PYTHON SOURCE LINES 129-132

.. code-block:: Python

    r2_rbf = R2Measure(rbf)
    r2_rbf.compute_learning_measure()

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    array([1.])

.. GENERATED FROM PYTHON SOURCE LINES 133-134

and on the validation side:

.. GENERATED FROM PYTHON SOURCE LINES 134-136

.. code-block:: Python

    r2_rbf.compute_test_measure(dataset_test_in_learning_domain)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    array([0.99807284])

.. GENERATED FROM PYTHON SOURCE LINES 137-138

even on the larger domain:

.. GENERATED FROM PYTHON SOURCE LINES 138-140

.. code-block:: Python

    r2_rbf.compute_test_measure(dataset_test)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    array([0.98593573])

.. GENERATED FROM PYTHON SOURCE LINES 141-142

A final plot confirms this:

.. GENERATED FROM PYTHON SOURCE LINES 142-149

.. code-block:: Python

    plt.plot(x_test, y_test, "-b", label="Reference")
    plt.plot(x_train, y_train, "ob")
    plt.plot(x_test, rbf.predict(x_test[:, newaxis]), "-r", label="Prediction")
    plt.plot(x_train, rbf.predict(x_train[:, newaxis]), "or")
    plt.legend()
    plt.grid()
    plt.show()

.. image-sg:: /examples/mlearning/quality_measure/images/sphx_glr_plot_r2_002.png
   :alt: plot r2
   :srcset: /examples/mlearning/quality_measure/images/sphx_glr_plot_r2_002.png
   :class: sphx-glr-single-img
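Note that each test dataset above required 100 extra evaluations of :math:`f`.
When the function is expensive,
cross-validation estimates the generalization quality from the learning samples alone.
A minimal sketch,
assuming the measure exposes a k-fold variant named ``compute_cross_validation_measure``
with an ``n_folds`` argument, as in recent GEMSEO versions:

.. code-block:: Python

    # Estimate the generalization quality by 5-fold cross-validation,
    # reusing only the 7 learning samples (no new evaluations of f).
    # NOTE: compute_cross_validation_measure and its n_folds argument
    # are assumptions; check the R2Measure API of your GEMSEO version.
    r2_rbf.compute_cross_validation_measure(n_folds=5)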
.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.315 seconds)


.. _sphx_glr_download_examples_mlearning_quality_measure_plot_r2.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_r2.ipynb <plot_r2.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_r2.py <plot_r2.py>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_