{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Machine learning algorithm selection example\n\nIn this example we use the :class:`.MLAlgoSelection` class to perform a grid\nsearch over different algorithms and hyperparameter values.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from __future__ import division, unicode_literals\n\nimport matplotlib.pyplot as plt\nimport numpy as np\n\nfrom gemseo.algos.design_space import DesignSpace\nfrom gemseo.core.dataset import Dataset\nfrom gemseo.mlearning.core.selection import MLAlgoSelection\nfrom gemseo.mlearning.qual_measure.mse_measure import MSEMeasure\n\nnp.random.seed(54321)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Build dataset\nThe data consists of a 1D-function $f:[0,1]\\to[0,1]$, where\n$f(x)=x^2$. The inputs $(x_i)_{i=1,\\cdots,n}$ are chosen randomly\nfrom the interval $[0,1]$. The outputs\n$y_i = f(x_i) + \\epsilon_i$contain added noise, where\n$\\epsilon_i\\tilde \\mathcal{N}(0,\\sigma^2)$.\nWe choose $n=20$ and $\\sigma=0.05$.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "n = 20\nx = np.sort(np.random.random(n))\ny = x ** 2 + np.random.normal(0, 0.05, n)\n\ndataset = Dataset()\ndataset.add_variable(\"x\", x[:, None], Dataset.INPUT_GROUP)\ndataset.add_variable(\"y\", y[:, None], Dataset.OUTPUT_GROUP, cache_as_input=False)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Build selector\nWe consider three regression models, with different possible hyperparameters.\nA mean squared error quality measure is used with a k-folds cross validation\nscheme (5 folds).\n\n"
      ]
    },
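    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The cross-validated error described above can be written out explicitly. The formula below is the textbook definition of the k-fold mean squared error, given as a sketch: the symbols $K$, $I_k$ and $\\hat{f}^{(-k)}$ are notation introduced here, and the exact averaging convention used by the library is an assumption.\n\n$$\\mathrm{MSE}_{\\mathrm{CV}} = \\frac{1}{n}\\sum_{k=1}^{K}\\sum_{i\\in I_k}\\left(y_i-\\hat{f}^{(-k)}(x_i)\\right)^2$$\n\nHere $I_1,\\dots,I_K$ partition the indices $\\{1,\\dots,n\\}$ into $K=5$ folds, and $\\hat{f}^{(-k)}$ is the model trained on all samples outside $I_k$. With $n=20$ and folds of equal size, each fold holds out 4 samples.\n"
      ]
    },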
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "selector = MLAlgoSelection(dataset, MSEMeasure, eval_method=\"kfolds\", n_folds=5)\nselector.add_candidate(\n    \"LinearRegression\",\n    penalty_level=[0, 0.1, 1, 10, 20],\n    l2_penalty_ratio=[0, 0.5, 1],\n    fit_intercept=[True],\n)\nselector.add_candidate(\n    \"PolynomialRegression\",\n    degree=[2, 3, 4, 10],\n    penalty_level=[0, 0.1, 1, 10],\n    l2_penalty_ratio=[1],\n    fit_intercept=[True, False],\n)\nrbf_space = DesignSpace()\nrbf_space.add_variable(\"epsilon\", 1, \"float\", 0.01, 0.1, 0.05)\nselector.add_candidate(\n    \"RBFRegression\",\n    calib_space=rbf_space,\n    calib_algo={\"algo\": \"fullfact\", \"n_samples\": 16},\n    smooth=[0, 0.01, 0.1, 1, 10, 100],\n)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Select best candidate\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "best_algo = selector.select()\nprint(best_algo)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Plot results\nPlot the best models from each candidate algorithm\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "finex = np.linspace(0, 1, 1000)\nfor candidate in selector.candidates:\n    algo = candidate[0]\n    print(algo)\n    predy = algo.predict(finex[:, None])[:, 0]\n    plt.plot(finex, predy, label=algo.ABBR)\nplt.scatter(x, y, label=\"Training points\")\nplt.legend()\nplt.show()"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.12"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}