.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/mlearning/quality_measure/plot_r2.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_examples_mlearning_quality_measure_plot_r2.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_mlearning_quality_measure_plot_r2.py:


R2 for regression models
========================

.. GENERATED FROM PYTHON SOURCE LINES 20-32

.. code-block:: Python

    from matplotlib import pyplot as plt
    from numpy import array
    from numpy import linspace
    from numpy import newaxis
    from numpy import sin

    from gemseo.datasets.io_dataset import IODataset
    from gemseo.mlearning.quality_measures.r2_measure import R2Measure
    from gemseo.mlearning.regression.polyreg import PolynomialRegressor
    from gemseo.mlearning.regression.rbf import RBFRegressor

.. GENERATED FROM PYTHON SOURCE LINES 33-51

Given a dataset :math:`(x_i,y_i,\hat{y}_i)_{1\leq i \leq N}`
where :math:`x_i` is an input point,
:math:`y_i` is an output observation
and :math:`\hat{y}_i=\hat{f}(x_i)` is an output prediction
computed by a regression model :math:`\hat{f}`,
the :math:`R^2` metric (also known as :math:`Q^2`) is defined as

.. math::

   R^2 = 1 - \frac{\sum_{i=1}^N(y_i-\hat{y}_i)^2}{\sum_{i=1}^N(y_i-\bar{y})^2} \leq 1

where :math:`\bar{y}=\frac{1}{N}\sum_{i=1}^Ny_i`.
The higher the value, the better the model;
from 0.9 onwards, the quality can be considered (very) good.
A negative value is very bad,
as a constant model predicting the output mean :math:`\bar{y}` would do better.

To illustrate this quality measure,
let us consider the function :math:`f(x)=(6x-2)^2\sin(12x-4)` :cite:`forrester2008`:

.. GENERATED FROM PYTHON SOURCE LINES 51-57

.. code-block:: Python

    def f(x):
        return (6 * x - 2) ** 2 * sin(12 * x - 4)

.. GENERATED FROM PYTHON SOURCE LINES 58-62

and try to approximate it with a polynomial of degree 3.
For this,
we can take these 7 learning input points

.. GENERATED FROM PYTHON SOURCE LINES 62-64

.. code-block:: Python

    x_train = array([0.1, 0.3, 0.5, 0.6, 0.8, 0.9, 0.95])

.. GENERATED FROM PYTHON SOURCE LINES 65-66

and evaluate the function ``f`` over this design of experiments (DOE):

.. GENERATED FROM PYTHON SOURCE LINES 66-68

.. code-block:: Python

    y_train = f(x_train)

.. GENERATED FROM PYTHON SOURCE LINES 69-71

Then,
we create an :class:`.IODataset` from these 7 learning samples:

.. GENERATED FROM PYTHON SOURCE LINES 71-75

.. code-block:: Python

    dataset_train = IODataset()
    dataset_train.add_input_group(x_train[:, newaxis], ["x"])
    dataset_train.add_output_group(y_train[:, newaxis], ["y"])

.. GENERATED FROM PYTHON SOURCE LINES 76-77

and build a :class:`.PolynomialRegressor` with ``degree=3`` from it:

.. GENERATED FROM PYTHON SOURCE LINES 77-80

.. code-block:: Python

    polynomial = PolynomialRegressor(dataset_train, 3)
    polynomial.learn()

.. GENERATED FROM PYTHON SOURCE LINES 81-83

Before using it,
we are going to measure its quality with the :math:`R^2` metric:

.. GENERATED FROM PYTHON SOURCE LINES 83-86

.. code-block:: Python

    r2 = R2Measure(polynomial)
    r2.compute_learning_measure()

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    array([0.78649338])
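This value can be recovered directly from the definition of :math:`R^2` given above,
as the learning measure simply applies the formula to the training samples.
Here is a minimal sketch (not part of the generated example);
it assumes that ``predict`` returns an array of shape ``(7, 1)`` for this array input,
as the plotting code below suggests:

.. code-block:: Python

    # Recompute the learning R2 from its definition:
    # R2 = 1 - sum((y - y_hat)^2) / sum((y - y_bar)^2).
    # NOTE: this sketch assumes predict() returns a (7, 1) array here,
    # which ravel() flattens to match the shape of y_train.
    y_hat = polynomial.predict(x_train[:, newaxis]).ravel()
    residual = ((y_train - y_hat) ** 2).sum()
    total = ((y_train - y_train.mean()) ** 2).sum()
    1.0 - residual / total  # should reproduce the value 0.78649338 above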
.. GENERATED FROM PYTHON SOURCE LINES 87-91

This result is mediocre,
so we can expect a poor generalization quality.
As this academic function costs virtually nothing to evaluate,
we can approximate the generalization quality with a large test dataset,
whereas the usual test size is about 20% of the training size.

.. GENERATED FROM PYTHON SOURCE LINES 91-98

.. code-block:: Python

    x_test = linspace(0.0, 1.0, 100)
    y_test = f(x_test)
    dataset_test = IODataset()
    dataset_test.add_input_group(x_test[:, newaxis], ["x"])
    dataset_test.add_output_group(y_test[:, newaxis], ["y"])
    r2.compute_test_measure(dataset_test)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    array([0.47280012])

.. GENERATED FROM PYTHON SOURCE LINES 99-102

The quality is lower than 0.5, which is poor.
This can be explained by a test domain broader than the learning domain,
which highlights the difficulty of extrapolation:

.. GENERATED FROM PYTHON SOURCE LINES 102-110

.. code-block:: Python

    plt.plot(x_test, y_test, "-b", label="Reference")
    plt.plot(x_train, y_train, "ob")
    plt.plot(x_test, polynomial.predict(x_test[:, newaxis]), "-r", label="Prediction")
    plt.plot(x_train, polynomial.predict(x_train[:, newaxis]), "or")
    plt.legend()
    plt.grid()
    plt.show()

.. image-sg:: /examples/mlearning/quality_measure/images/sphx_glr_plot_r2_001.png
   :alt: plot r2
   :srcset: /examples/mlearning/quality_measure/images/sphx_glr_plot_r2_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 111-112

Restricting the test points to the learning domain slightly improves the quality:

.. GENERATED FROM PYTHON SOURCE LINES 112-119

.. code-block:: Python

    x_test = linspace(x_train.min(), x_train.max(), 100)
    y_test = f(x_test)
    dataset_test_in_learning_domain = IODataset()
    dataset_test_in_learning_domain.add_input_group(x_test[:, newaxis], ["x"])
    dataset_test_in_learning_domain.add_output_group(y_test[:, newaxis], ["y"])
    r2.compute_test_measure(dataset_test_in_learning_domain)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    array([0.50185268])

.. GENERATED FROM PYTHON SOURCE LINES 120-123

Lastly,
to get better results without new learning points,
we would have to change the regression model:

.. GENERATED FROM PYTHON SOURCE LINES 123-126

.. code-block:: Python

    rbf = RBFRegressor(dataset_train)
    rbf.learn()

.. GENERATED FROM PYTHON SOURCE LINES 127-129

The quality of this :class:`.RBFRegressor` is very good,
both on the learning side,
where the :math:`R^2` is exactly 1 since an RBF model interpolates its training points:

.. GENERATED FROM PYTHON SOURCE LINES 129-132

.. code-block:: Python

    r2_rbf = R2Measure(rbf)
    r2_rbf.compute_learning_measure()

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    array([1.])

.. GENERATED FROM PYTHON SOURCE LINES 133-134

and on the validation side:

.. GENERATED FROM PYTHON SOURCE LINES 134-136

.. code-block:: Python

    r2_rbf.compute_test_measure(dataset_test_in_learning_domain)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    array([0.99807284])

.. GENERATED FROM PYTHON SOURCE LINES 137-138

even on the larger domain:

.. GENERATED FROM PYTHON SOURCE LINES 138-140

.. code-block:: Python

    r2_rbf.compute_test_measure(dataset_test)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    array([0.98593573])

.. GENERATED FROM PYTHON SOURCE LINES 141-142

A final plot confirms this:

.. GENERATED FROM PYTHON SOURCE LINES 142-149

.. code-block:: Python

    plt.plot(x_test, y_test, "-b", label="Reference")
    plt.plot(x_train, y_train, "ob")
    plt.plot(x_test, rbf.predict(x_test[:, newaxis]), "-r", label="Prediction")
    plt.plot(x_train, rbf.predict(x_train[:, newaxis]), "or")
    plt.legend()
    plt.grid()
    plt.show()

.. image-sg:: /examples/mlearning/quality_measure/images/sphx_glr_plot_r2_002.png
   :alt: plot r2
   :srcset: /examples/mlearning/quality_measure/images/sphx_glr_plot_r2_002.png
   :class: sphx-glr-single-img
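Note that each test dataset above required 100 extra evaluations of :math:`f`.
When the function is expensive,
cross-validation estimates the generalization quality from the learning samples alone.
A minimal sketch,
assuming the measure exposes a k-fold variant named ``compute_cross_validation_measure``
with an ``n_folds`` argument, as in recent GEMSEO versions:

.. code-block:: Python

    # Estimate the generalization quality by 5-fold cross-validation,
    # reusing only the 7 learning samples (no new evaluations of f).
    # NOTE: compute_cross_validation_measure and its n_folds argument
    # are assumptions; check the R2Measure API of your GEMSEO version.
    r2_rbf.compute_cross_validation_measure(n_folds=5)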
.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.315 seconds)


.. _sphx_glr_download_examples_mlearning_quality_measure_plot_r2.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_r2.ipynb <plot_r2.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_r2.py <plot_r2.py>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_