.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/mlearning/quality_measure/plot_rmse.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_examples_mlearning_quality_measure_plot_rmse.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_mlearning_quality_measure_plot_rmse.py:


RMSE for regression models
==========================

.. GENERATED FROM PYTHON SOURCE LINES 20-32

.. code-block:: Python

    from matplotlib import pyplot as plt
    from numpy import array
    from numpy import linspace
    from numpy import newaxis
    from numpy import sin

    from gemseo.datasets.io_dataset import IODataset
    from gemseo.mlearning.quality_measures.rmse_measure import RMSEMeasure
    from gemseo.mlearning.regression.polyreg import PolynomialRegressor
    from gemseo.mlearning.regression.rbf import RBFRegressor

.. GENERATED FROM PYTHON SOURCE LINES 33-50

Given a dataset :math:`(x_i,y_i,\hat{y}_i)_{1\leq i \leq N}`
where :math:`x_i` is an input point,
:math:`y_i` is an output observation
and :math:`\hat{y}_i=\hat{f}(x_i)` is an output prediction
computed by a regression model :math:`\hat{f}`,
the root mean squared error (RMSE) metric is written

.. math::

   \text{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^N(y_i-\hat{y}_i)^2} \geq 0.

The lower the RMSE, the better.
Quantitatively, its value depends on the order of magnitude of the outputs,
which is why this example also reports the RMSE divided by the output range.

To illustrate this quality measure,
let us consider the function :math:`f(x)=(6x-2)^2\sin(12x-4)` :cite:`forrester2008`:

.. GENERATED FROM PYTHON SOURCE LINES 50-56

.. code-block:: Python

    def f(x):
        return (6 * x - 2) ** 2 * sin(12 * x - 4)

.. GENERATED FROM PYTHON SOURCE LINES 57-61

and try to approximate it with a polynomial of order 3.

For this,
we can take these 7 learning input points

.. GENERATED FROM PYTHON SOURCE LINES 61-63

.. code-block:: Python

    x_train = array([0.1, 0.3, 0.5, 0.6, 0.8, 0.9, 0.95])

.. GENERATED FROM PYTHON SOURCE LINES 64-65

and evaluate the model ``f`` over this design of experiments (DOE):

.. GENERATED FROM PYTHON SOURCE LINES 65-67

.. code-block:: Python

    y_train = f(x_train)

.. GENERATED FROM PYTHON SOURCE LINES 68-70

Then,
we create an :class:`.IODataset` from these 7 learning samples:

.. GENERATED FROM PYTHON SOURCE LINES 70-74

.. code-block:: Python

    dataset_train = IODataset()
    dataset_train.add_input_group(x_train[:, newaxis], ["x"])
    dataset_train.add_output_group(y_train[:, newaxis], ["y"])

.. GENERATED FROM PYTHON SOURCE LINES 75-76

and build a :class:`.PolynomialRegressor` with ``degree=3`` from it:

.. GENERATED FROM PYTHON SOURCE LINES 76-79

.. code-block:: Python

    polynomial = PolynomialRegressor(dataset_train, 3)
    polynomial.learn()

.. GENERATED FROM PYTHON SOURCE LINES 80-82

Before using it,
we are going to measure its quality with the RMSE metric:

.. GENERATED FROM PYTHON SOURCE LINES 82-86

.. code-block:: Python

    rmse = RMSEMeasure(polynomial)
    result = rmse.compute_learning_measure()
    result, result / (y_train.max() - y_train.min())

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (array([2.37578236]), array([0.137707]))

.. GENERATED FROM PYTHON SOURCE LINES 87-92

This learning error is middling (14% of the learning output range),
so we can expect a poor generalization quality.
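As a sanity check,
the value above can be reproduced by applying the RMSE formula directly.
The following sketch is not part of the original example;
it assumes that ``compute_learning_measure`` returns the raw,
unnormalized RMSE over the learning samples,
as the formula above states:

.. code-block:: Python

    from numpy import mean
    from numpy import sqrt

    # Predictions of the polynomial model at the 7 learning points.
    y_pred = polynomial.predict(x_train[:, newaxis]).ravel()
    # RMSE = sqrt(mean((y - y_hat)**2));
    # this should match rmse.compute_learning_measure()
    # up to floating-point error.
    rmse_by_hand = sqrt(mean((y_train - y_pred) ** 2))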
As this academic function is virtually free to evaluate,
we can approximate the generalization quality with a large test dataset,
whereas the usual test size is about 20% of the training size.

.. GENERATED FROM PYTHON SOURCE LINES 92-100

.. code-block:: Python

    x_test = linspace(0.0, 1.0, 100)
    y_test = f(x_test)
    dataset_test = IODataset()
    dataset_test.add_input_group(x_test[:, newaxis], ["x"])
    dataset_test.add_output_group(y_test[:, newaxis], ["y"])
    result = rmse.compute_test_measure(dataset_test)
    result, result / (y_test.max() - y_test.min())

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (array([3.31730517]), array([0.15181886]))

.. GENERATED FROM PYTHON SOURCE LINES 101-104

The test error exceeds 15% of the test output range,
which is pretty mediocre.
This can be explained by a test domain broader than the learning domain,
which highlights the difficulty of extrapolation:

.. GENERATED FROM PYTHON SOURCE LINES 104-112

.. code-block:: Python

    plt.plot(x_test, y_test, "-b", label="Reference")
    plt.plot(x_train, y_train, "ob")
    plt.plot(x_test, polynomial.predict(x_test[:, newaxis]), "-r", label="Prediction")
    plt.plot(x_train, polynomial.predict(x_train[:, newaxis]), "or")
    plt.legend()
    plt.grid()
    plt.show()

.. image-sg:: /examples/mlearning/quality_measure/images/sphx_glr_plot_rmse_001.png
   :alt: plot rmse
   :srcset: /examples/mlearning/quality_measure/images/sphx_glr_plot_rmse_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 113-114

Restricting the test points to the learning domain slightly improves the quality:

.. GENERATED FROM PYTHON SOURCE LINES 114-123

.. code-block:: Python

    x_test = linspace(x_train.min(), x_train.max(), 100)
    y_test_in_large_domain = y_test
    y_test = f(x_test)
    dataset_test_in_learning_domain = IODataset()
    dataset_test_in_learning_domain.add_input_group(x_test[:, newaxis], ["x"])
    dataset_test_in_learning_domain.add_output_group(y_test[:, newaxis], ["y"])
    result = rmse.compute_test_measure(dataset_test_in_learning_domain)
    result, result / (y_test.max() - y_test.min())

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (array([2.39937613]), array([0.13099891]))

.. GENERATED FROM PYTHON SOURCE LINES 124-127

Lastly,
to get better results without new learning points,
we would have to change the regression model:

.. GENERATED FROM PYTHON SOURCE LINES 127-130

.. code-block:: Python

    rbf = RBFRegressor(dataset_train)
    rbf.learn()

.. GENERATED FROM PYTHON SOURCE LINES 131-133

The quality of this :class:`.RBFRegressor` is quite good,
both on the learning side:

.. GENERATED FROM PYTHON SOURCE LINES 133-137

.. code-block:: Python

    rmse_rbf = RMSEMeasure(rbf)
    result = rmse_rbf.compute_learning_measure()
    result, result / (y_train.max() - y_train.min())

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (array([1.22756756e-14]), array([7.11532547e-16]))

.. GENERATED FROM PYTHON SOURCE LINES 138-139

and on the validation side:

.. GENERATED FROM PYTHON SOURCE LINES 139-142

.. code-block:: Python

    result = rmse_rbf.compute_test_measure(dataset_test_in_learning_domain)
    result, result / (y_test.max() - y_test.min())

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (array([0.14923751]), array([0.00814793]))

.. GENERATED FROM PYTHON SOURCE LINES 143-144

including the larger domain:

.. GENERATED FROM PYTHON SOURCE LINES 144-147

.. code-block:: Python

    result = rmse_rbf.compute_test_measure(dataset_test)
    result, result / (y_test_in_large_domain.max() - y_test_in_large_domain.min())

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (array([0.54182176]), array([0.02479686]))
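Building such a large test dataset is only possible
because the reference function is cheap to evaluate.
In a real study,
a resampling technique can estimate the generalization error
from the learning samples alone.
A minimal sketch,
assuming your GEMSEO version provides ``compute_cross_validation_measure``
in the same API family as the learning and test variants used above:

.. code-block:: Python

    # Assumed API: k-fold cross-validation of the RMSE,
    # reusing only the 7 learning samples
    # (no extra call to the reference function).
    result = rmse_rbf.compute_cross_validation_measure()
    result, result / (y_train.max() - y_train.min())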
.. GENERATED FROM PYTHON SOURCE LINES 148-149

A final plot to confirm the good quality of the RBF model:

.. GENERATED FROM PYTHON SOURCE LINES 149-156

.. code-block:: Python

    plt.plot(x_test, y_test, "-b", label="Reference")
    plt.plot(x_train, y_train, "ob")
    plt.plot(x_test, rbf.predict(x_test[:, newaxis]), "-r", label="Prediction")
    plt.plot(x_train, rbf.predict(x_train[:, newaxis]), "or")
    plt.legend()
    plt.grid()
    plt.show()

.. image-sg:: /examples/mlearning/quality_measure/images/sphx_glr_plot_rmse_002.png
   :alt: plot rmse
   :srcset: /examples/mlearning/quality_measure/images/sphx_glr_plot_rmse_002.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.315 seconds)

.. _sphx_glr_download_examples_mlearning_quality_measure_plot_rmse.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_rmse.ipynb <plot_rmse.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_rmse.py <plot_rmse.py>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_