.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/mlearning/quality_measure/plot_mse.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_examples_mlearning_quality_measure_plot_mse.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_mlearning_quality_measure_plot_mse.py:


MSE example - test-train split
==============================

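In this example, we consider a polynomial linear regression and split the data
into a training set and a test set. We measure the quality of the regression
with the mean squared error (MSE) between the model predictions and the
outputs, on the training set and on the test set, and we use the test error to
compare polynomial degrees.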

.. GENERATED FROM PYTHON SOURCE LINES 32-45

.. code-block:: default

    import matplotlib.pyplot as plt
    from numpy import arange, argmin, hstack, linspace, sort
    from numpy.random import choice, normal, seed

    from gemseo.api import configure_logger, create_dataset
    from gemseo.core.dataset import Dataset
    from gemseo.mlearning.api import create_regression_model
    from gemseo.mlearning.qual_measure.mse_measure import MSEMeasure

    configure_logger()


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none


    <RootLogger root (INFO)>

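The ``MSEMeasure`` class imported above quantifies the quality of a regression
model with the mean squared error
:math:`\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2`, where
:math:`y_i` are the outputs and :math:`\hat{y}_i` the predictions over
:math:`n` samples. The following NumPy-only sketch illustrates the formula; the
function ``mean_squared_error`` is hypothetical and is not part of the gemseo
API. It returns one value per output column, which is consistent with the
one-element arrays printed later in this example for the single output ``y``.

.. code-block:: python

    import numpy as np


    def mean_squared_error(outputs, predictions):
        """Hypothetical helper: MSE of each output column (not the gemseo API).

        Both arguments are 2D arrays of shape (n_samples, n_outputs).
        """
        outputs = np.asarray(outputs, dtype=float)
        predictions = np.asarray(predictions, dtype=float)
        # Average the squared residuals over the samples (rows), column by column.
        return np.mean((outputs - predictions) ** 2, axis=0)


    # Three samples of a single output, stored as a column.
    outputs = np.array([[1.0], [2.0], [3.0]])
    predictions = np.array([[1.0], [2.5], [2.0]])
    print(mean_squared_error(outputs, predictions))  # (0 + 0.25 + 1) / 3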

.. GENERATED FROM PYTHON SOURCE LINES 46-48

Define parameters
-----------------

.. GENERATED FROM PYTHON SOURCE LINES 48-54

.. code-block:: default

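    seed(12345)  # fix the random generator for reproducibility
    n_samples = 10  # total number of samples
    noise = 0.3 ** 2  # standard deviation of the Gaussian noise (0.09)
    max_pow = 5  # polynomial degree of the first regression model
    amount_train = 0.8  # fraction of the samples used for training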

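In the data construction below, ``noise`` is passed as the second argument of
``numpy.random.normal``, which is the standard deviation (``scale``) of the
distribution, not its variance. Hence the noise added to the outputs has
standard deviation :math:`0.3^2 = 0.09`, i.e. variance :math:`0.0081`. The
following sketch checks this empirically; it uses NumPy's ``default_rng``
generator rather than the legacy ``seed`` and ``normal`` functions of this
example, so the draws differ from those of the example.

.. code-block:: python

    import numpy as np

    noise = 0.3 ** 2  # 0.09, used as the standard deviation (scale)
    rng = np.random.default_rng(12345)
    draws = rng.normal(0.0, noise, 100_000)

    # The empirical standard deviation is close to 0.09, not to 0.3.
    print(draws.std())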

.. GENERATED FROM PYTHON SOURCE LINES 55-58

Construct data
--------------
We construct noisy samples of the parabola :math:`f(x) = -4(x - 0.5)^2 + 3`
at ``n_samples`` equally spaced points of the interval :math:`[0, 1]`.

.. GENERATED FROM PYTHON SOURCE LINES 58-67

.. code-block:: default


    def f(x):
        return -4 * (x - 0.5) ** 2 + 3


    x = linspace(0, 1, n_samples)
    y = f(x) + normal(0, noise, n_samples)

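Expanding the square gives :math:`f(x) = -4x^2 + 4x + 2`: the exact function
reaches its maximum :math:`f(0.5) = 3` and takes the value 2 at both ends of
the interval. A quick NumPy check of this expansion on the same grid of 10
points:

.. code-block:: python

    import numpy as np


    def f(x):
        return -4 * (x - 0.5) ** 2 + 3


    x = np.linspace(0, 1, 10)
    # Coefficients of the expanded form, from the highest degree to the constant.
    expanded = np.polyval([-4.0, 4.0, 2.0], x)
    print(np.allclose(f(x), expanded))  # True
    print(f(0.0), f(0.5), f(1.0))  # 2.0 3.0 2.0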

.. GENERATED FROM PYTHON SOURCE LINES 68-70

Indices for test-train split
----------------------------

.. GENERATED FROM PYTHON SOURCE LINES 70-78

.. code-block:: default

    samples = arange(n_samples)
    n_train = int(amount_train * n_samples)
    n_test = n_samples - n_train
    train = sort(choice(samples, n_train, replace=False))  # draw without replacement
    test = sort([sample for sample in samples if sample not in train])
    print("Train:", train)
    print("Test:", test)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Train: [1 3 4 5 6 7 8 9]
    Test: [0 2]

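The same split can be written with ``numpy.setdiff1d``, which returns the
sorted indices of ``samples`` that are not in ``train``. The sketch below wraps
it in a hypothetical helper ``train_test_indices`` and uses NumPy's
``default_rng`` generator, so the indices it draws differ from the ones printed
above.

.. code-block:: python

    import numpy as np


    def train_test_indices(n_samples, train_fraction, rng):
        """Hypothetical helper: split range(n_samples) into sorted train/test indices."""
        samples = np.arange(n_samples)
        n_train = int(train_fraction * n_samples)
        train = np.sort(rng.choice(samples, n_train, replace=False))
        test = np.setdiff1d(samples, train)  # returned already sorted
        return train, test


    train, test = train_test_indices(10, 0.8, np.random.default_rng(12345))
    print("Train:", train)
    print("Test:", test)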

.. GENERATED FROM PYTHON SOURCE LINES 79-81

Build datasets
--------------

.. GENERATED FROM PYTHON SOURCE LINES 81-87

.. code-block:: default

    data = hstack([x[:, None], y[:, None]])
    variables = ["x", "y"]
    groups = {"x": Dataset.INPUT_GROUP, "y": Dataset.OUTPUT_GROUP}
    dataset = create_dataset("synthetic_data", data[train], variables, groups=groups)
    dataset_test = create_dataset("synthetic_data", data[test], variables, groups=groups)

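Here ``hstack`` places ``x`` and ``y`` side by side as the two columns of an
array of shape ``(n_samples, 2)``, and ``data[train]`` and ``data[test]``
select whole rows, i.e. complete ``(x, y)`` samples. The ``groups`` dictionary
then declares the first column as an input and the second as an output. A
NumPy-only illustration of the array manipulation, using the split indices
printed above and noiseless outputs for simplicity:

.. code-block:: python

    import numpy as np

    x = np.linspace(0, 1, 10)
    y = -4 * (x - 0.5) ** 2 + 3  # noiseless outputs, for illustration only

    # x[:, None] turns the 1D array of shape (10,) into a column of shape (10, 1).
    data = np.hstack([x[:, None], y[:, None]])
    print(data.shape)  # (10, 2)

    train = np.array([1, 3, 4, 5, 6, 7, 8, 9])
    test = np.array([0, 2])
    print(data[train].shape, data[test].shape)  # (8, 2) (2, 2)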

.. GENERATED FROM PYTHON SOURCE LINES 88-90

Build regression model
----------------------

.. GENERATED FROM PYTHON SOURCE LINES 90-93

.. code-block:: default

    model = create_regression_model("PolynomialRegression", dataset, degree=max_pow)
    print(model)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    PolynomialRegression(degree=5, fit_intercept=True, l2_penalty_ratio=1.0, penalty_level=0.0)
       based on the scikit-learn library

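As its printed representation indicates, ``PolynomialRegression`` relies on
scikit-learn. Conceptually, with ``penalty_level=0.0``, i.e. without
penalization of the coefficients, fitting a polynomial of degree :math:`d`
amounts to an ordinary least-squares problem on the monomials
:math:`1, x, \dots, x^d`. The following NumPy sketch illustrates this idea with
a Vandermonde matrix; the helpers ``fit_polynomial`` and
``predict_polynomial`` are hypothetical and do not reproduce the gemseo or
scikit-learn implementation.

.. code-block:: python

    import numpy as np


    def fit_polynomial(x, y, degree):
        """Hypothetical helper: least-squares coefficients, constant term first."""
        # Columns 1, x, x**2, ..., x**degree (increasing powers).
        features = np.vander(x, degree + 1, increasing=True)
        coefficients, *_ = np.linalg.lstsq(features, y, rcond=None)
        return coefficients


    def predict_polynomial(coefficients, x):
        """Hypothetical helper: evaluate the fitted polynomial at x."""
        features = np.vander(x, len(coefficients), increasing=True)
        return features @ coefficients


    x = np.linspace(0, 1, 10)
    y = -4 * (x - 0.5) ** 2 + 3  # noiseless parabola -4x^2 + 4x + 2
    coefficients = fit_polynomial(x, y, degree=2)
    # Close to [2, 4, -4] (constant, x, x**2 coefficients).
    print(np.round(coefficients, 6))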

.. GENERATED FROM PYTHON SOURCE LINES 94-96

Prediction errors
-----------------

.. GENERATED FROM PYTHON SOURCE LINES 96-104

.. code-block:: default

    measure = MSEMeasure(model)

    mse_train = measure.evaluate("learn")
    mse_test = measure.evaluate("test", test_data=dataset_test)

    print("Training error:", mse_train)
    print("Test error:", mse_test)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Training error: [0.0003947]
    Test error: [2.29565983]

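The training error is almost zero while the test error is more than three
orders of magnitude larger: the degree-5 polynomial has 6 coefficients for only
8 training points, so it nearly interpolates the noisy training data instead of
capturing the underlying parabola. In the limiting case, a polynomial of degree
:math:`n-1` passes exactly through :math:`n` points with distinct abscissas,
whatever the noise, as this sketch shows (it uses ``numpy.polyfit``, not
gemseo):

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 6)
    y = -4 * (x - 0.5) ** 2 + 3 + rng.normal(0, 0.09, x.size)

    # 6 points, degree 5: as many coefficients as points.
    coefficients = np.polyfit(x, y, 5)
    training_mse = np.mean((np.polyval(coefficients, x) - y) ** 2)
    print(training_mse)  # zero up to rounding errors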

.. GENERATED FROM PYTHON SOURCE LINES 105-107

Compute predictions
-------------------

.. GENERATED FROM PYTHON SOURCE LINES 107-114

.. code-block:: default

    # Train the model on its learning dataset, i.e. the training samples.
    model.learn()

    n_refined = 1000
    x_refined = linspace(0, 1, n_refined)
    y_refined = model.predict({"x": x_refined[:, None]})["y"].flatten()

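The ``predict`` call above passes a dictionary that maps the input name
``"x"`` to a 2D array with one row per point, and, as the ``flatten`` call
suggests, it reads the predicted ``"y"`` back as a 2D array, which ``flatten``
turns into a 1D array for plotting. The following sketch mimics this
dictionary-in, dictionary-out convention with a hypothetical
``predict_parabola`` function that returns the exact parabola instead of the
prediction of a trained model:

.. code-block:: python

    import numpy as np


    def predict_parabola(input_data):
        """Hypothetical stand-in for a model: maps {"x": (n, 1)} to {"y": (n, 1)}."""
        x = input_data["x"]
        return {"y": -4 * (x - 0.5) ** 2 + 3}


    x_refined = np.linspace(0, 1, 1000)
    y_refined = predict_parabola({"x": x_refined[:, None]})["y"].flatten()
    print(x_refined[:, None].shape, y_refined.shape)  # (1000, 1) (1000,)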


.. GENERATED FROM PYTHON SOURCE LINES 115-117

Plot data points
----------------

.. GENERATED FROM PYTHON SOURCE LINES 117-122

.. code-block:: default

    plt.plot(x_refined, f(x_refined), label="Exact function")
    plt.scatter(x, y, label="Data points")
    plt.legend()
    plt.show()


.. image:: /examples/mlearning/quality_measure/images/sphx_glr_plot_mse_001.png
    :alt: plot mse
    :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 123-125

Plot predictions
----------------

.. GENERATED FROM PYTHON SOURCE LINES 125-131

.. code-block:: default

    plt.plot(x_refined, y_refined, label="Prediction (x^{})".format(max_pow))
    plt.scatter(x[train], y[train], label="Train")
    plt.scatter(x[test], y[test], color="r", label="Test")
    plt.legend()
    plt.show()


.. image:: /examples/mlearning/quality_measure/images/sphx_glr_plot_mse_002.png
    :alt: plot mse
    :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 132-134

Compare different parameters
----------------------------

.. GENERATED FROM PYTHON SOURCE LINES 134-152

.. code-block:: default

    powers = [1, 2, 3, 4, 5, 7]
    test_errors = []
    for power in powers:
        model = create_regression_model("PolynomialRegression", dataset, degree=power)
        measure = MSEMeasure(model)

        test_mse = measure.evaluate("test", test_data=dataset_test)
        test_errors.append(test_mse)

        y_refined = model.predict({"x": x_refined[:, None]})["y"].flatten()

        plt.plot(x_refined, y_refined, label="x^{}".format(power))

    plt.scatter(x[train], y[train], label="Train")
    plt.scatter(x[test], y[test], color="r", label="Test")
    plt.legend()
    plt.show()


.. image:: /examples/mlearning/quality_measure/images/sphx_glr_plot_mse_003.png
    :alt: plot mse
    :class: sphx-glr-single-img



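Each iteration of the loop above creates a new model of the given degree and
evaluates its test error. The NumPy sketch below runs the same degree sweep
with ``numpy.polyfit`` on freshly generated noisy data and also records the
training error. Since the polynomial models are nested, the training error can
only decrease (up to rounding) as the degree grows, whereas the test error
typically does not. The random generator, the data and the split differ from
those of the example, so the numbers differ too.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(12345)
    x = np.linspace(0, 1, 10)
    y = -4 * (x - 0.5) ** 2 + 3 + rng.normal(0, 0.09, x.size)

    samples = np.arange(x.size)
    train = np.sort(rng.choice(samples, 8, replace=False))
    test = np.setdiff1d(samples, train)

    powers = [1, 2, 3, 4, 5, 7]
    train_errors, test_errors = [], []
    for power in powers:
        # Least-squares polynomial fit on the training points only.
        coefficients = np.polyfit(x[train], y[train], power)
        train_errors.append(np.mean((np.polyval(coefficients, x[train]) - y[train]) ** 2))
        test_errors.append(np.mean((np.polyval(coefficients, x[test]) - y[test]) ** 2))

    for power, train_mse, test_mse in zip(powers, train_errors, test_errors):
        print(f"degree {power}: train MSE {train_mse:.2e}, test MSE {test_mse:.2e}")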

.. GENERATED FROM PYTHON SOURCE LINES 153-154

Grid search
-----------

.. GENERATED FROM PYTHON SOURCE LINES 154-156

.. code-block:: default

    print(test_errors)
    # argmin returns a position in test_errors; map it back to the degree.
    print("Power for minimal test error:", powers[argmin(test_errors)])


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    [array([0.54513687]), array([0.00518409]), array([0.00584647]), array([0.06387849]), array([2.29565983]), array([1.16961302])]
    Power for minimal test error: 2

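``argmin`` returns the position of the smallest error in ``test_errors``, not
the degree itself, so it must be used as an index into ``powers`` to recover
the degree. With the test errors printed above, the smallest value is at
position 1, which corresponds to degree 2, the degree of the exact function:

.. code-block:: python

    import numpy as np

    powers = [1, 2, 3, 4, 5, 7]
    # Test errors printed above, one per degree in `powers`.
    test_errors = [0.54513687, 0.00518409, 0.00584647, 0.06387849, 2.29565983, 1.16961302]

    best_index = int(np.argmin(test_errors))
    print(best_index)  # 1
    print(powers[best_index])  # 2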

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  0.493 seconds)


.. _sphx_glr_download_examples_mlearning_quality_measure_plot_mse.py:


.. only:: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_mse.py <plot_mse.py>`


  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_mse.ipynb <plot_mse.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_