.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/mlearning/quality_measure/plot_mse.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_mlearning_quality_measure_plot_mse.py:


MSE example - test-train split
==============================

In this example we consider a polynomial regression and split the data into
two sets: a training set and a test set. We measure the quality of the
regression by comparing its predictions with the outputs on the test set.

.. GENERATED FROM PYTHON SOURCE LINES 30-47

.. code-block:: default

    import matplotlib.pyplot as plt
    from gemseo.api import configure_logger
    from gemseo.api import create_dataset
    from gemseo.core.dataset import Dataset
    from gemseo.mlearning.api import create_regression_model
    from gemseo.mlearning.qual_measure.mse_measure import MSEMeasure
    from numpy import arange
    from numpy import argmin
    from numpy import hstack
    from numpy import linspace
    from numpy import sort
    from numpy.random import choice
    from numpy.random import normal
    from numpy.random import seed

    configure_logger()

.. GENERATED FROM PYTHON SOURCE LINES 48-50

Define parameters
-----------------

.. GENERATED FROM PYTHON SOURCE LINES 50-56

.. code-block:: default

    seed(12345)
    n_samples = 10
    noise = 0.3**2
    max_pow = 5
    amount_train = 0.8

.. GENERATED FROM PYTHON SOURCE LINES 57-60

Construct data
--------------
We construct a parabola with added noise on the interval [0, 1].

.. GENERATED FROM PYTHON SOURCE LINES 60-69

.. code-block:: default

    def f(x):
        return -4 * (x - 0.5) ** 2 + 3


    x = linspace(0, 1, n_samples)
    y = f(x) + normal(0, noise, n_samples)

.. GENERATED FROM PYTHON SOURCE LINES 70-72

Indices for test-train split
----------------------------

.. GENERATED FROM PYTHON SOURCE LINES 72-80

.. code-block:: default

    samples = arange(n_samples)
    n_train = int(amount_train * n_samples)
    n_test = n_samples - n_train
    # Draw the training indices without replacement; the rest form the test set.
    train = sort(choice(samples, n_train, False))
    test = sort([sample for sample in samples if sample not in train])
    print("Train:", train)
    print("Test:", test)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Train: [1 3 4 5 6 7 8 9]
    Test: [0 2]

.. GENERATED FROM PYTHON SOURCE LINES 81-83

Build datasets
--------------

.. GENERATED FROM PYTHON SOURCE LINES 83-89

.. code-block:: default

    data = hstack([x[:, None], y[:, None]])
    variables = ["x", "y"]
    groups = {"x": Dataset.INPUT_GROUP, "y": Dataset.OUTPUT_GROUP}
    dataset = create_dataset("synthetic_data", data[train], variables, groups=groups)
    dataset_test = create_dataset("synthetic_data", data[test], variables, groups=groups)

.. GENERATED FROM PYTHON SOURCE LINES 90-92

Build regression model
----------------------

.. GENERATED FROM PYTHON SOURCE LINES 92-95

.. code-block:: default

    model = create_regression_model("PolynomialRegressor", dataset, degree=max_pow)
    print(model)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    PolynomialRegressor(degree=5, fit_intercept=True, l2_penalty_ratio=1.0, penalty_level=0.0)
       based on the scikit-learn library

.. GENERATED FROM PYTHON SOURCE LINES 96-98

Prediction errors
-----------------

.. GENERATED FROM PYTHON SOURCE LINES 98-106

.. code-block:: default

    measure = MSEMeasure(model)

    mse_train = measure.evaluate("learn")
    mse_test = measure.evaluate("test", test_data=dataset_test)

    print("Training error:", mse_train)
    print("Test error:", mse_test)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Training error: [0.0003947]
    Test error: [2.29565983]

.. GENERATED FROM PYTHON SOURCE LINES 107-109

Compute predictions
-------------------

.. GENERATED FROM PYTHON SOURCE LINES 109-116

.. code-block:: default

    measure = MSEMeasure(model)
    model.learn()

    n_refined = 1000
    x_refined = linspace(0, 1, n_refined)
    y_refined = model.predict({"x": x_refined[:, None]})["y"].flatten()

.. GENERATED FROM PYTHON SOURCE LINES 117-119

Plot data points
----------------

.. GENERATED FROM PYTHON SOURCE LINES 119-124

.. code-block:: default

    plt.plot(x_refined, f(x_refined), label="Exact function")
    plt.scatter(x, y, label="Data points")
    plt.legend()
    plt.show()

.. image-sg:: /examples/mlearning/quality_measure/images/sphx_glr_plot_mse_001.png
   :alt: plot mse
   :srcset: /examples/mlearning/quality_measure/images/sphx_glr_plot_mse_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 125-127

Plot predictions
----------------

.. GENERATED FROM PYTHON SOURCE LINES 127-133

.. code-block:: default

    plt.plot(x_refined, y_refined, label=f"Prediction (x^{max_pow})")
    plt.scatter(x[train], y[train], label="Train")
    plt.scatter(x[test], y[test], color="r", label="Test")
    plt.legend()
    plt.show()

.. image-sg:: /examples/mlearning/quality_measure/images/sphx_glr_plot_mse_002.png
   :alt: plot mse
   :srcset: /examples/mlearning/quality_measure/images/sphx_glr_plot_mse_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 134-136

Compare different parameters
----------------------------

.. GENERATED FROM PYTHON SOURCE LINES 136-154

.. code-block:: default

    powers = [1, 2, 3, 4, 5, 7]
    test_errors = []
    for power in powers:
        # Train one model per polynomial degree and record its test error.
        model = create_regression_model("PolynomialRegressor", dataset, degree=power)
        measure = MSEMeasure(model)

        test_mse = measure.evaluate("test", test_data=dataset_test)
        test_errors.append(test_mse)

        y_refined = model.predict({"x": x_refined[:, None]})["y"].flatten()

        plt.plot(x_refined, y_refined, label=f"x^{power}")

    plt.scatter(x[train], y[train], label="Train")
    plt.scatter(x[test], y[test], color="r", label="Test")
    plt.legend()
    plt.show()

.. image-sg:: /examples/mlearning/quality_measure/images/sphx_glr_plot_mse_003.png
   :alt: plot mse
   :srcset: /examples/mlearning/quality_measure/images/sphx_glr_plot_mse_003.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 155-156

Grid search: select the power with the smallest test error.

.. GENERATED FROM PYTHON SOURCE LINES 156-158

.. code-block:: default

    print(test_errors)
    # argmin returns a position in test_errors; map it back to the power.
    print("Power for minimal test error:", powers[argmin(test_errors)])

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    [array([0.54513687]), array([0.00518409]), array([0.00584647]), array([0.06387849]), array([2.29565983]), array([1.16961302])]
    Power for minimal test error: 2

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  0.459 seconds)
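
The grid search over polynomial powers can also be sketched without GEMSEO,
which may help clarify what the test MSE measures. The block below is a
minimal illustration under stated assumptions, not the GEMSEO implementation:
``numpy.polyfit`` stands in for ``PolynomialRegressor``, the helper names
``mse``, ``split_indices`` and ``best_degree`` are hypothetical, and the
random draws differ from those of the example above, so the numerical values
will not match its output.

.. code-block:: python

    import numpy as np


    def mse(y_true, y_pred):
        """Return the mean squared error between two 1-D arrays."""
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return float(np.mean((y_true - y_pred) ** 2))


    def split_indices(n_samples, train_fraction, rng):
        """Draw sorted train indices without replacement; the rest are test."""
        n_train = int(train_fraction * n_samples)
        train = np.sort(rng.choice(n_samples, n_train, replace=False))
        test = np.setdiff1d(np.arange(n_samples), train)
        return train, test


    def best_degree(x, y, train, test, degrees):
        """Fit one polynomial per degree on the train set; score on the test set."""
        errors = []
        for degree in degrees:
            coefficients = np.polyfit(x[train], y[train], degree)
            errors.append(mse(y[test], np.polyval(coefficients, x[test])))
        # argmin gives a position in `errors`; map it back to the degree.
        return degrees[int(np.argmin(errors))], errors


    def f(x):
        """Parabola used in the example above."""
        return -4 * (x - 0.5) ** 2 + 3


    rng = np.random.default_rng(12345)
    x = np.linspace(0, 1, 10)
    y = f(x) + rng.normal(0, 0.3**2, x.size)
    train, test = split_indices(x.size, 0.8, rng)
    degree, errors = best_degree(x, y, train, test, [1, 2, 3, 4, 5, 7])
    print("Test errors:", errors)
    print("Degree with minimal test error:", degree)

With only two test points, the selected degree is sensitive to which samples
end up in the test set; repeating the split with different seeds gives a
sense of that variability.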