.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/mlearning/quality_measure/plot_mse.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_examples_mlearning_quality_measure_plot_mse.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_mlearning_quality_measure_plot_mse.py:


MSE for regression models
=========================

.. GENERATED FROM PYTHON SOURCE LINES 20-34

.. code-block:: Python

    from __future__ import annotations

    from matplotlib import pyplot as plt
    from numpy import array
    from numpy import linspace
    from numpy import newaxis
    from numpy import sin

    from gemseo.datasets.io_dataset import IODataset
    from gemseo.mlearning.regression.algos.polyreg import PolynomialRegressor
    from gemseo.mlearning.regression.algos.rbf import RBFRegressor
    from gemseo.mlearning.regression.quality.mse_measure import MSEMeasure

.. GENERATED FROM PYTHON SOURCE LINES 35-54

Given a dataset :math:`(x_i,y_i,\hat{y}_i)_{1\leq i \leq N}`
where :math:`x_i` is an input point,
:math:`y_i` is an output observation
and :math:`\hat{y}_i=\hat{f}(x_i)` is an output prediction
computed by a regression model :math:`\hat{f}`,
the mean squared error (MSE) metric is written

.. math::

    \text{MSE} = \frac{1}{N}\sum_{i=1}^N(y_i-\hat{y}_i)^2 \geq 0.

The lower, the better.
From a quantitative point of view,
the interpretation of this value depends on the order of magnitude of the outputs.
The square root of this average is often easier to interpret,
as it is expressed in the units of the output (see :class:`.RMSEMeasure`).

To illustrate this quality measure,
let us consider the function :math:`f(x)=(6x-2)^2\sin(12x-4)` :cite:`forrester2008`:

.. GENERATED FROM PYTHON SOURCE LINES 54-60

.. code-block:: Python

    def f(x):
        return (6 * x - 2) ** 2 * sin(12 * x - 4)

.. GENERATED FROM PYTHON SOURCE LINES 61-65

and try to approximate it with a polynomial of order 3.
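As an aside, the MSE formula above and the range-normalized square root reported
throughout this example can be sketched by hand with plain NumPy
(the observations and predictions below are made up for illustration only;
this is not part of the GEMSEO API):

.. code-block:: Python

    from numpy import array

    # Hypothetical observations y_i and predictions y_hat_i.
    y = array([1.0, 2.0, 4.0])
    y_hat = array([1.5, 2.0, 3.0])

    # MSE = (1/N) * sum_i (y_i - y_hat_i)^2
    mse = ((y - y_hat) ** 2).mean()  # (0.25 + 0.0 + 1.0) / 3

    # Square root of the MSE, normalized by the output range,
    # i.e. the ratio shown next to each MSE value below.
    normalized_rmse = mse**0.5 / (y.max() - y.min())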
For this, we can take these 7 learning input points

.. GENERATED FROM PYTHON SOURCE LINES 65-67

.. code-block:: Python

    x_train = array([0.1, 0.3, 0.5, 0.6, 0.8, 0.9, 0.95])

.. GENERATED FROM PYTHON SOURCE LINES 68-69

and evaluate the model ``f`` over this design of experiments (DOE):

.. GENERATED FROM PYTHON SOURCE LINES 69-71

.. code-block:: Python

    y_train = f(x_train)

.. GENERATED FROM PYTHON SOURCE LINES 72-74

Then, we create an :class:`.IODataset` from these 7 learning samples:

.. GENERATED FROM PYTHON SOURCE LINES 74-78

.. code-block:: Python

    dataset_train = IODataset()
    dataset_train.add_input_group(x_train[:, newaxis], ["x"])
    dataset_train.add_output_group(y_train[:, newaxis], ["y"])

.. GENERATED FROM PYTHON SOURCE LINES 79-80

and build a :class:`.PolynomialRegressor` with ``degree=3`` from it:

.. GENERATED FROM PYTHON SOURCE LINES 80-83

.. code-block:: Python

    polynomial = PolynomialRegressor(dataset_train, degree=3)
    polynomial.learn()

.. GENERATED FROM PYTHON SOURCE LINES 84-86

Before using it, we are going to measure its quality with the MSE metric:

.. GENERATED FROM PYTHON SOURCE LINES 86-90

.. code-block:: Python

    mse = MSEMeasure(polynomial)
    result = mse.compute_learning_measure()
    result, result**0.5 / (y_train.max() - y_train.min())

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    (array([5.6443418]), array([0.137707]))

.. GENERATED FROM PYTHON SOURCE LINES 91-96

This result is mediocre (the RMSE amounts to 14% of the learning output range),
so we can expect poor generalization quality.
As this academic function is virtually free to evaluate,
we can estimate this generalization quality with a large test dataset,
whereas the usual test size is about 20% of the training size.

.. GENERATED FROM PYTHON SOURCE LINES 96-104

.. code-block:: Python

    x_test = linspace(0.0, 1.0, 100)
    y_test = f(x_test)
    dataset_test = IODataset()
    dataset_test.add_input_group(x_test[:, newaxis], ["x"])
    dataset_test.add_output_group(y_test[:, newaxis], ["y"])
    result = mse.compute_test_measure(dataset_test)
    result, result**0.5 / (y_test.max() - y_test.min())

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    (array([11.00451361]), array([0.15181886]))

.. GENERATED FROM PYTHON SOURCE LINES 105-108

The error exceeds 15% of the test output range, which is rather poor.
This can be explained by the test domain being broader than the learning domain,
which highlights the difficulty of extrapolation:

.. GENERATED FROM PYTHON SOURCE LINES 108-116

.. code-block:: Python

    plt.plot(x_test, y_test, "-b", label="Reference")
    plt.plot(x_train, y_train, "ob")
    plt.plot(x_test, polynomial.predict(x_test[:, newaxis]), "-r", label="Prediction")
    plt.plot(x_train, polynomial.predict(x_train[:, newaxis]), "or")
    plt.legend()
    plt.grid()
    plt.show()

.. image-sg:: /examples/mlearning/quality_measure/images/sphx_glr_plot_mse_001.png
    :alt: plot mse
    :srcset: /examples/mlearning/quality_measure/images/sphx_glr_plot_mse_001.png
    :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 117-118

Restricting the test dataset to the learning domain would slightly improve the quality:

.. GENERATED FROM PYTHON SOURCE LINES 118-127

.. code-block:: Python

    x_test = linspace(x_train.min(), x_train.max(), 100)
    y_test_in_large_domain = y_test
    y_test = f(x_test)
    dataset_test_in_learning_domain = IODataset()
    dataset_test_in_learning_domain.add_input_group(x_test[:, newaxis], ["x"])
    dataset_test_in_learning_domain.add_output_group(y_test[:, newaxis], ["y"])
    result = mse.compute_test_measure(dataset_test_in_learning_domain)
    result, result**0.5 / (y_test.max() - y_test.min())

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    (array([11.00451361]), array([0.18111514]))

.. GENERATED FROM PYTHON SOURCE LINES 128-131

Lastly, to get better results without new learning points,
we would have to change the regression model:

.. GENERATED FROM PYTHON SOURCE LINES 131-134

.. code-block:: Python

    rbf = RBFRegressor(dataset_train)
    rbf.learn()

.. GENERATED FROM PYTHON SOURCE LINES 135-137

The quality of this :class:`.RBFRegressor` is quite good,
both on the learning side:

.. GENERATED FROM PYTHON SOURCE LINES 137-141

.. code-block:: Python

    mse_rbf = MSEMeasure(rbf)
    result = mse_rbf.compute_learning_measure()
    result, result**0.5 / (y_train.max() - y_train.min())

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    (array([1.50692212e-28]), array([7.11532547e-16]))

.. GENERATED FROM PYTHON SOURCE LINES 142-143

and on the validation side:

.. GENERATED FROM PYTHON SOURCE LINES 143-146

.. code-block:: Python

    result = mse_rbf.compute_test_measure(dataset_test_in_learning_domain)
    result, result**0.5 / (y_test.max() - y_test.min())

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    (array([0.02227183]), array([0.00814793]))

.. GENERATED FROM PYTHON SOURCE LINES 147-148

including on the larger domain:

.. GENERATED FROM PYTHON SOURCE LINES 148-151

.. code-block:: Python

    result = mse_rbf.compute_test_measure(dataset_test)
    result, result**0.5 / (y_test_in_large_domain.max() - y_test_in_large_domain.min())

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    (array([0.29357082]), array([0.02479686]))

.. GENERATED FROM PYTHON SOURCE LINES 152-153

A final plot to convince us:

.. GENERATED FROM PYTHON SOURCE LINES 153-160

.. code-block:: Python

    plt.plot(x_test, y_test, "-b", label="Reference")
    plt.plot(x_train, y_train, "ob")
    plt.plot(x_test, rbf.predict(x_test[:, newaxis]), "-r", label="Prediction")
    plt.plot(x_train, rbf.predict(x_train[:, newaxis]), "or")
    plt.legend()
    plt.grid()
    plt.show()

.. image-sg:: /examples/mlearning/quality_measure/images/sphx_glr_plot_mse_002.png
    :alt: plot mse
    :srcset: /examples/mlearning/quality_measure/images/sphx_glr_plot_mse_002.png
    :class: sphx-glr-single-img

.. rst-class:: sphx-glr-timing

**Total running time of the script:** (0 minutes 0.197 seconds)


.. _sphx_glr_download_examples_mlearning_quality_measure_plot_mse.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_mse.ipynb <plot_mse.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_mse.py <plot_mse.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_mse.zip <plot_mse.zip>`

.. only:: html

  .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_