.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/mlearning/quality_measure/plot_leave_one_out.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_examples_mlearning_quality_measure_plot_leave_one_out.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_mlearning_quality_measure_plot_leave_one_out.py:


Leave-one-out
=============

.. GENERATED FROM PYTHON SOURCE LINES 20-33

.. code-block:: Python


    from __future__ import annotations

    from matplotlib import pyplot as plt
    from numpy import array
    from numpy import linspace
    from numpy import newaxis
    from numpy import sin

    from gemseo.datasets.io_dataset import IODataset
    from gemseo.mlearning.regression.algos.polyreg import PolynomialRegressor
    from gemseo.mlearning.regression.quality.rmse_measure import RMSEMeasure


.. GENERATED FROM PYTHON SOURCE LINES 34-51

Every quality measure can be computed from a training dataset or a test dataset.
A test dataset is used to approximate the quality of the machine learning model
over the whole variable space,
so that the estimate depends less on the training dataset
and reveals over-fitting
(a model that is accurate near the learning points and poor elsewhere).

When data are expensive,
such a test dataset may be unaffordable,
and we have to estimate this quality
with techniques that resample the training dataset,
such as leave-one-out.
The idea is simple:
we repeat :math:`N` times the two-step task
"1) learn from :math:`N-1` samples, 2) predict the remaining sample"
and finally approximate the measure from the :math:`N` predictions.

To illustrate this point,
let us consider the function :math:`f(x)=(6x-2)^2\sin(12x-4)` :cite:`forrester2008`:

.. GENERATED FROM PYTHON SOURCE LINES 51-57

.. code-block:: Python



    def f(x):
        return (6 * x - 2) ** 2 * sin(12 * x - 4)


.. GENERATED FROM PYTHON SOURCE LINES 58-62

and try to approximate it with a polynomial of degree 3.

For this,
we can take these :math:`N=7` learning input points

.. GENERATED FROM PYTHON SOURCE LINES 62-64

.. code-block:: Python

    x_train = array([0.1, 0.3, 0.5, 0.6, 0.8, 0.9, 0.95])

.. GENERATED FROM PYTHON SOURCE LINES 65-66

and evaluate the function ``f`` over this design of experiments (DOE):

.. GENERATED FROM PYTHON SOURCE LINES 66-68

.. code-block:: Python

    y_train = f(x_train)

.. GENERATED FROM PYTHON SOURCE LINES 69-71

Then,
we create an :class:`.IODataset` from these 7 learning samples:

.. GENERATED FROM PYTHON SOURCE LINES 71-75

.. code-block:: Python

    dataset_train = IODataset()
    dataset_train.add_input_group(x_train[:, newaxis], ["x"])
    dataset_train.add_output_group(y_train[:, newaxis], ["y"])

.. GENERATED FROM PYTHON SOURCE LINES 76-77

and build a :class:`.PolynomialRegressor` with ``degree=3`` from it:

.. GENERATED FROM PYTHON SOURCE LINES 77-80

.. code-block:: Python

    polynomial = PolynomialRegressor(dataset_train, degree=3)
    polynomial.learn()

.. GENERATED FROM PYTHON SOURCE LINES 81-83

Finally,
we compute the quality of this model on the training dataset with the RMSE metric:

.. GENERATED FROM PYTHON SOURCE LINES 83-86

.. code-block:: Python

    rmse = RMSEMeasure(polynomial)
    rmse.compute_learning_measure()

.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    array([2.37578236])



.. GENERATED FROM PYTHON SOURCE LINES 87-90

As evaluating this academic function costs almost nothing,
we can approximate the generalization quality with a large test dataset,
whereas the usual test size is about 20% of the training size.

.. GENERATED FROM PYTHON SOURCE LINES 90-97

.. code-block:: Python

    x_test = linspace(0.0, 1.0, 100)
    y_test = f(x_test)
    dataset_test = IODataset()
    dataset_test.add_input_group(x_test[:, newaxis], ["x"])
    dataset_test.add_output_group(y_test[:, newaxis], ["y"])
    rmse.compute_test_measure(dataset_test)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    array([3.31730517])


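
The two-step resampling task described at the top of this example
can also be written out by hand.
The following is only a minimal sketch using plain NumPy rather than GEMSEO:
the helper ``leave_one_out_rmse`` is a hypothetical name introduced for illustration,
and ``numpy.polyfit`` stands in for :class:`.PolynomialRegressor`,
so the printed values are not guaranteed to match those of :class:`.RMSEMeasure`.

.. code-block:: Python

    import numpy as np


    def f(x):
        # Same test function as in this example.
        return (6 * x - 2) ** 2 * np.sin(12 * x - 4)


    def leave_one_out_rmse(x, y, degree):
        """Hypothetical helper: RMSE of the N held-out predictions."""
        errors = np.empty(x.size)
        for i in range(x.size):
            # 1) Learn from the N-1 samples other than the i-th one.
            keep = np.arange(x.size) != i
            coefficients = np.polyfit(x[keep], y[keep], degree)
            # 2) Predict the held-out sample and store the prediction error.
            errors[i] = np.polyval(coefficients, x[i]) - y[i]
        return float(np.sqrt(np.mean(errors**2))), errors


    x_train = np.array([0.1, 0.3, 0.5, 0.6, 0.8, 0.9, 0.95])
    y_train = f(x_train)

    # RMSE of the model fitted on all N samples, evaluated on those samples.
    coefficients = np.polyfit(x_train, y_train, 3)
    residuals = np.polyval(coefficients, x_train) - y_train
    train_rmse = float(np.sqrt(np.mean(residuals**2)))

    loo_rmse, loo_errors = leave_one_out_rmse(x_train, y_train, 3)
    print(f"Training RMSE: {train_rmse:.3f}")
    print(f"Leave-one-out RMSE: {loo_rmse:.3f}")
    print("Held-out errors:", np.round(loo_errors, 3))

Inspecting the individual held-out errors shows which removed point
contributes most to the leave-one-out estimate.
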
.. GENERATED FROM PYTHON SOURCE LINES 98-99

We can estimate the same quality by leave-one-out cross-validation:

.. GENERATED FROM PYTHON SOURCE LINES 99-101

.. code-block:: Python

    rmse.compute_leave_one_out_measure(store_resampling_result=True)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    array([14.76062648])



.. GENERATED FROM PYTHON SOURCE LINES 102-105

In this case,
the leave-one-out error is very pessimistic.
We can take a closer look by storing the :math:`N` sub-models:

.. GENERATED FROM PYTHON SOURCE LINES 105-107

.. code-block:: Python

    rmse.compute_leave_one_out_measure(store_resampling_result=True)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    array([14.76062648])



.. GENERATED FROM PYTHON SOURCE LINES 108-109

and plotting their outputs:

.. GENERATED FROM PYTHON SOURCE LINES 109-118

.. code-block:: Python

    plot = plt.plot(x_test, y_test, label="Reference")
    plt.plot(x_train, y_train, "o", color=plot[0].get_color(), label="Training dataset")
    plt.plot(x_test, polynomial.predict(x_test[:, newaxis]), label="Model")
    for i, algo in enumerate(polynomial.resampling_results["LeaveOneOut"][1], 1):
        plt.plot(x_test, algo.predict(x_test[:, newaxis]), label=f"Sub-model {i}")
    plt.legend()
    plt.grid()
    plt.show()



.. image-sg:: /examples/mlearning/quality_measure/images/sphx_glr_plot_leave_one_out_001.png
   :alt: plot leave one out
   :srcset: /examples/mlearning/quality_measure/images/sphx_glr_plot_leave_one_out_001.png
   :class: sphx-glr-single-img



.. GENERATED FROM PYTHON SOURCE LINES 119-123

We can see that this pessimistic error is mainly due to the second sub-model,
which did not learn the first training point
and therefore has a very high extrapolation error.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.102 seconds)


.. _sphx_glr_download_examples_mlearning_quality_measure_plot_leave_one_out.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_leave_one_out.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_leave_one_out.py `

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_leave_one_out.zip `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_