.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/mlearning/calibration/plot_selection.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_examples_mlearning_calibration_plot_selection.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_mlearning_calibration_plot_selection.py:


Machine learning algorithm selection example
============================================

In this example, we use the :class:`.MLAlgoSelection` class
to perform a grid search over different algorithms and hyperparameter values.

.. GENERATED FROM PYTHON SOURCE LINES 23-38

.. code-block:: Python

    from __future__ import annotations

    import matplotlib.pyplot as plt
    from numpy import linspace
    from numpy import sort
    from numpy.random import default_rng

    from gemseo.algos.design_space import DesignSpace
    from gemseo.datasets.io_dataset import IODataset
    from gemseo.mlearning.core.selection import MLAlgoSelection
    from gemseo.mlearning.quality_measures.mse_measure import MSEMeasure

    rng = default_rng(54321)

.. GENERATED FROM PYTHON SOURCE LINES 39-48

Build dataset
-------------

The data are generated from the function :math:`f(x)=x^2`.
The input data :math:`\{x_i\}_{i=1,\cdots,20}` are chosen at random
over the interval :math:`[0,1]`.
The output value :math:`y_i = f(x_i) + \varepsilon_i` corresponds to
the evaluation of :math:`f` at :math:`x_i`,
corrupted by a Gaussian noise :math:`\varepsilon_i`
with zero mean and standard deviation :math:`\sigma=0.05`.

.. GENERATED FROM PYTHON SOURCE LINES 48-56

.. code-block:: Python

    n = 20
    x = sort(rng.random(n))
    y = x**2 + rng.normal(0, 0.05, n)

    dataset = IODataset()
    dataset.add_variable("x", x[:, None], dataset.INPUT_GROUP)
    dataset.add_variable("y", y[:, None], dataset.OUTPUT_GROUP)

.. GENERATED FROM PYTHON SOURCE LINES 57-62

Build selector
--------------

We consider three regression models,
each with several candidate hyperparameter values.
Model quality is measured by the mean squared error,
estimated with a k-fold cross-validation scheme (5 folds).

.. GENERATED FROM PYTHON SOURCE LINES 62-87

.. code-block:: Python

    selector = MLAlgoSelection(
        dataset, MSEMeasure, measure_evaluation_method_name="KFOLDS", n_folds=5
    )
    selector.add_candidate(
        "LinearRegressor",
        penalty_level=[0, 0.1, 1, 10, 20],
        l2_penalty_ratio=[0, 0.5, 1],
        fit_intercept=[True],
    )
    selector.add_candidate(
        "PolynomialRegressor",
        degree=[2, 3, 4, 10],
        penalty_level=[0, 0.1, 1, 10],
        l2_penalty_ratio=[1],
        fit_intercept=[True, False],
    )
    rbf_space = DesignSpace()
    rbf_space.add_variable("epsilon", 1, "float", 0.01, 0.1, 0.05)
    selector.add_candidate(
        "RBFRegressor",
        calib_space=rbf_space,
        calib_algo={"algo": "fullfact", "n_samples": 16},
        smooth=[0, 0.01, 0.1, 1, 10, 100],
    )

.. GENERATED FROM PYTHON SOURCE LINES 88-90

Select best candidate
---------------------

.. GENERATED FROM PYTHON SOURCE LINES 90-93

.. code-block:: Python

    best_algo = selector.select()
    best_algo

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    PolynomialRegressor(degree=4, fit_intercept=True, l2_penalty_ratio=1, penalty_level=0, random_state=0)
       based on the scikit-learn library
       built from 20 learning samples
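The grid search above ranks every candidate by its cross-validated mean squared
error and returns the one with the lowest score. The criterion itself can be
illustrated with plain NumPy, independently of GEMSEO: the ``kfold_mse`` helper
below is a hypothetical sketch, and a NumPy polynomial fit stands in for a
regression candidate.

.. code-block:: Python

    import numpy as np


    def kfold_mse(x, y, degree, n_folds=5):
        """Estimate the MSE of a polynomial model by k-fold cross-validation."""
        indices = np.arange(len(x))
        errors = []
        for fold in np.array_split(indices, n_folds):
            train = np.setdiff1d(indices, fold)
            # Fit on the training folds, evaluate on the held-out fold.
            coeffs = np.polyfit(x[train], y[train], degree)
            residuals = np.polyval(coeffs, x[fold]) - y[fold]
            errors.append(np.mean(residuals**2))
        return float(np.mean(errors))


    rng = np.random.default_rng(54321)
    x = np.sort(rng.random(20))
    y = x**2 + rng.normal(0, 0.05, 20)

    # Lower cross-validated MSE means a better candidate.
    scores = {degree: kfold_mse(x, y, degree) for degree in (2, 3, 4, 10)}
    best_degree = min(scores, key=scores.get)

This mirrors the role of :class:`.MSEMeasure` with ``"KFOLDS"`` in the selector:
each hyperparameter combination gets a score from data it was not trained on,
which penalizes overfitting configurations such as a very high polynomial degree.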
.. GENERATED FROM PYTHON SOURCE LINES 94-97

Plot results
------------

Plot the best model from each candidate algorithm.

.. GENERATED FROM PYTHON SOURCE LINES 97-106

.. code-block:: Python

    finex = linspace(0, 1, 1000)
    for candidate in selector.candidates:
        algo = candidate[0]
        print(algo)
        predy = algo.predict(finex[:, None])[:, 0]
        plt.plot(finex, predy, label=algo.SHORT_ALGO_NAME)
    plt.scatter(x, y, label="Training points")
    plt.legend()
    plt.show()

.. image-sg:: /examples/mlearning/calibration/images/sphx_glr_plot_selection_001.png
   :alt: plot selection
   :srcset: /examples/mlearning/calibration/images/sphx_glr_plot_selection_001.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    LinearRegressor(fit_intercept=True, l2_penalty_ratio=0, penalty_level=0, random_state=0)
       based on the scikit-learn library
       built from 20 learning samples
    PolynomialRegressor(degree=4, fit_intercept=True, l2_penalty_ratio=1, penalty_level=0, random_state=0)
       based on the scikit-learn library
       built from 20 learning samples
    RBFRegressor(epsilon=0.1, function=multiquadric, norm=euclidean, smooth=0.1)
       based on the SciPy library
       built from 20 learning samples

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 3.821 seconds)


.. _sphx_glr_download_examples_mlearning_calibration_plot_selection.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_selection.ipynb <plot_selection.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_selection.py <plot_selection.py>`

.. only:: html

  .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_