{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Plug a surrogate discipline in a Scenario\n\nIn this section we describe the use of surrogate models in |g|,\nwhich is implemented in the :class:`.SurrogateDiscipline` class.\n\nA :class:`.SurrogateDiscipline` can be used to substitute a\n:class:`.MDODiscipline` within a :class:`.Scenario`. This\n:class:`.SurrogateDiscipline` is an approximation of the :class:`.MDODiscipline`\nand is faster to compute than the original discipline. It relies on a\n:class:`.MLRegressionAlgo`. This comes at the price of computing a :term:`DOE`\non the original :class:`.MDODiscipline`, and validating the approximation. The\ncomputations from which the approximation is built may already be available, or can be\ngenerated using |g|'s :term:`DOE` capabilities. See `sobieski_doe` and\n`sellar_mdo`.\n\nIn |g|, the data used to build the surrogate model are taken from a\n:class:`.Dataset` containing both inputs and outputs of the :term:`DOE`. This\n:class:`.Dataset` may have been generated by |g| from a cache, using the\n:meth:`.AbstractFullCache.export_to_dataset` method, or from a numpy array or\na text file, using the :meth:`.Dataset.set_from_array` and\n:meth:`.Dataset.set_from_file` methods.\n\nThen, the surrogate discipline can be used as any other discipline in a\n:class:`.MDOScenario`, a :class:`.DOEScenario`, or a :class:`.MDA`.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from __future__ import division, unicode_literals\n\nfrom numpy import array, hstack, vstack\n\nfrom gemseo.api import (\n    configure_logger,\n    create_discipline,\n    create_scenario,\n    create_surrogate,\n)\nfrom gemseo.core.dataset import Dataset\nfrom gemseo.problems.sobieski.core import SobieskiProblem\n\nconfigure_logger()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Create a surrogate discipline\n\n### Create the learning dataset\n\nIf you already have data from a :term:`DOE` produced externally,\nyou can store them in a :class:`.Dataset`, and Step 1 (creating the learning dataset) ends here.\nFor example, let us consider a synthetic dataset, with $x$\nas input and $y$ as output, described as a numpy\narray. Then, we store these data in a :class:`.Dataset`:\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "variables = [\"x\", \"y\"]\nsizes = {\"x\": 1, \"y\": 1}\ngroups = {\"x\": \"inputs\", \"y\": \"outputs\"}\ndata = vstack(\n    (\n        hstack((array([1.0]), array([1.0]))),\n        hstack((array([2.0]), array([2.0]))),\n    )\n)\nsynthetic_dataset = Dataset()\nsynthetic_dataset.set_from_array(data, variables, sizes, groups)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "If you do not have available data, the following paragraphs of Step 1 concern you.\n\nHere, we illustrate the generation of the training data using a :class:`.DOEScenario`,\nsimilarly to `sobieski_doe`, where more details are given.\n\nIn this basic example, an :class:`.MDODiscipline` computing the mission\nperformance (range) in the `SSBJ test case <sobieski_problem>` is\nsampled with a :class:`.DOEScenario`. Then, the generated database is used to\nbuild a :class:`.SurrogateDiscipline`.\n\nMore complex scenarios can be used in the same way: complete optimization\nprocesses or MDAs can be replaced by their surrogate counterparts. The right\n:term:`HDF` cache shall then be used to build the\n:class:`.SurrogateDiscipline`, but the main logic won't differ from this\nexample.\n\nFirst, we create the :class:`.MDODiscipline` by means of the API function\n:meth:`~gemseo.api.create_discipline` and cache the evaluations in memory,\nusing the :meth:`.MDODiscipline.set_cache_policy` method:\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "discipline = create_discipline(\"SobieskiMission\")\ndiscipline.set_cache_policy(cache_type=discipline.MEMORY_FULL_CACHE)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\nThen, we read the :class:`.DesignSpace` of the `Sobieski problem\n<sobieski_problem>` and keep only the inputs of the Sobieski Mission discipline,\n\"x_shared\", \"y_24\" and \"y_34\",\nas inputs of the DOE:\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "design_space = SobieskiProblem().read_design_space()\ndesign_space = design_space.filter([\"x_shared\", \"y_24\", \"y_34\"])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "From this :class:`.MDODiscipline` and this :class:`.DesignSpace`,\nwe build a :class:`.DOEScenario`\nby means of the API function :meth:`~gemseo.api.create_scenario`:\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "scenario = create_scenario(\n    [discipline],\n    \"DisciplinaryOpt\",\n    objective_name=\"y_4\",\n    design_space=design_space,\n    scenario_type=\"DOE\",\n)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Lastly, we execute the process with the :term:`LHS` algorithm and 30 samples.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "scenario.execute({\"n_samples\": 30, \"algo\": \"lhs\"})\nmission_dataset = discipline.cache.export_to_dataset(\n    inputs_names=[\"x_shared\", \"y_24\", \"y_34\"]\n)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        ".. seealso::\n\n   In this tutorial, the :term:`DOE` is based on `pyDOE\n   <https://pythonhosted.org/pyDOE/>`_. However, several other designs are\n   available, based on this package or on `OpenTURNS\n   <http://www.openturns.org/>`_. Some examples of these designs are plotted\n   in `doe_algos`. To list the available :term:`DOE` algorithms in the\n   current |g| configuration, use\n   :meth:`gemseo.api.get_available_doe_algorithms`.\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Create the :class:`.SurrogateDiscipline`\n\nFrom this :class:`.Dataset`, we can build a :class:`.SurrogateDiscipline`\nof the :class:`.MDODiscipline`.\n\nIndeed, by means of the API function :meth:`~gemseo.api.create_surrogate`,\nwe create the :class:`.SurrogateDiscipline` from the dataset,\nand it can then be executed as any other :term:`discipline`.\n\nPrecisely, we create a :class:`.SurrogateDiscipline` relying on a\n:class:`.LinearRegression` and inheriting from :class:`.MDODiscipline`:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "synthetic_surrogate = create_surrogate(\"LinearRegression\", synthetic_dataset)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        ".. seealso::\n\n   Note that a subset of the inputs and outputs to be used to build the\n   :class:`.SurrogateDiscipline` may be specified by the user if needed,\n   mainly to avoid unnecessary computations.\n\nThen, we execute it as any :class:`.MDODiscipline`:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "input_data = {\"x\": array([2.0])}\nout = synthetic_surrogate.execute(input_data)\nprint(out[\"y\"])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "In our case study, from the :term:`DOE` built at Step 1,\nwe build a :class:`.RBFRegression` of $y_4$\nrepresenting the range as a function of L/D:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "range_surrogate = create_surrogate(\"RBFRegression\", mission_dataset)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Use the :class:`.SurrogateDiscipline` in MDO\n\nThe obtained :class:`.SurrogateDiscipline` can be used in any\n:class:`.Scenario`, such as a :class:`.DOEScenario` or :class:`.MDOScenario`.\nWe see here that the :meth:`.MDODiscipline.execute` method can be used as in\nany other discipline to compute the outputs for given inputs:\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "for i in range(5):\n    lod = i * 2.0\n    y_4_pred = range_surrogate.execute({\"y_24\": array([lod])})[\"y_4\"]\n    print(\"Surrogate range (L/D = {}) = {}\".format(lod, y_4_pred))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We can also build and execute an optimization scenario from it.\nThe design variable is \"y_24\". The Jacobian matrix is computed by finite\ndifferences by default for surrogates, except for the\n:class:`.SurrogateDiscipline` relying on :class:`.LinearRegression`, which has\nan analytical (and constant) Jacobian.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "design_space = design_space.filter([\"y_24\"])\nscenario = create_scenario(\n    range_surrogate,\n    formulation=\"DisciplinaryOpt\",\n    objective_name=\"y_4\",\n    design_space=design_space,\n    scenario_type=\"MDO\",\n    maximize_objective=True,\n)\nscenario.execute({\"max_iter\": 30, \"algo\": \"L-BFGS-B\"})"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Available surrogate models\n\nCurrently, the following surrogate models are available:\n\n- Linear regression,\n  based on the `Scikit-learn <http://scikit-learn.org/stable/>`_ library,\n  through the :class:`.LinearRegression` class,\n- Polynomial regression,\n  based on the `Scikit-learn <http://scikit-learn.org/stable/>`_ library,\n  through the :class:`.PolynomialRegression` class,\n- Gaussian processes (also known as Kriging),\n  based on the `Scikit-learn <http://scikit-learn.org/stable/>`_ library,\n  through the :class:`.GaussianProcessRegression` class,\n- Mixture of experts, through the :class:`.MixtureOfExperts` class,\n- Random forest models,\n  based on the `Scikit-learn <http://scikit-learn.org/stable/>`_ library,\n  through the :class:`.RandomForestRegressor` class,\n- RBF models (Radial Basis Functions),\n  based on the `SciPy <http://www.scipy.org/>`_ library,\n  through the :class:`.RBFRegression` class,\n- PCE models (Polynomial Chaos Expansion),\n  based on the `OpenTURNS <http://www.openturns.org/>`_ library,\n  through the :class:`.PCERegression` class.\n\nTo understand the detailed behavior of the models, please refer to the\ndocumentation of the underlying packages.\n\n## Extending surrogate models\n\nAll surrogate models work the same way: the :class:`.MLRegressionAlgo` base\nclass shall be extended. See `extending-gemseo` to learn how to run\n|g|\nwith external Python modules. Then, the :class:`.RegressionModelFactory` can\nbuild the new :class:`.MLRegressionAlgo` automatically from its regression\nalgorithm name and options. This factory is called by the constructor of\n:class:`.SurrogateDiscipline`.\n\n.. seealso::\n\n   More generally, |g| provides extension mechanisms to integrate external :term:`DOE`\n   and optimization algorithms, disciplines, MDAs and surrogate models.\n\n"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.12"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}