.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/dataset/plot_iris.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_examples_dataset_plot_iris.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_dataset_plot_iris.py:


Iris dataset
============

Presentation
------------

This is one of the best-known datasets in the machine learning literature.

It was introduced by the statistician Ronald Fisher
in his 1936 paper "The use of multiple measurements in taxonomic problems",
Annals of Eugenics, 7 (2): 179–188.

It contains 150 instances of iris plants:

- 50 Iris Setosa,
- 50 Iris Versicolour,
- 50 Iris Virginica.

Each instance is characterized by:

- its sepal length in cm,
- its sepal width in cm,
- its petal length in cm,
- its petal width in cm.

This dataset can be used for either clustering or classification.

.. GENERATED FROM PYTHON SOURCE LINES 54-65

.. code-block:: default

    from __future__ import absolute_import, division, print_function, unicode_literals

    from future import standard_library
    from numpy.random import choice

    from gemseo.api import configure_logger, load_dataset

    configure_logger()

    standard_library.install_aliases()


.. GENERATED FROM PYTHON SOURCE LINES 66-70

Load Iris dataset
-----------------

We can easily load this dataset
by means of the :func:`~gemseo.api.load_dataset` function of the API:

.. GENERATED FROM PYTHON SOURCE LINES 70-73

.. code-block:: default

    iris = load_dataset("IrisDataset")


.. GENERATED FROM PYTHON SOURCE LINES 74-75

and get some information about it:

.. GENERATED FROM PYTHON SOURCE LINES 75-77

.. code-block:: default

    print(iris)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Iris
    | Number of samples: 150
    | Number of variables: 5
    | Variables names and sizes by group:
    | - parameters: sepal_length (1), sepal_width (1), petal_length (1), petal_width (1)
    | - labels: specy (1)
    | Number of dimensions (total = 5) by group:
    | - parameters: 4
    | - labels: 1


.. GENERATED FROM PYTHON SOURCE LINES 78-81

Manipulate the dataset
----------------------

We randomly select 10 samples to display.

.. GENERATED FROM PYTHON SOURCE LINES 81-84

.. code-block:: default

    shown_samples = choice(iris.length, size=10, replace=False)


.. GENERATED FROM PYTHON SOURCE LINES 85-87

If the pandas library is installed,
we can export the iris dataset to a dataframe and print it.

.. GENERATED FROM PYTHON SOURCE LINES 87-90

.. code-block:: default

    dataframe = iris.export_to_dataframe()
    print(dataframe)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

          parameters                                      labels
        sepal_length sepal_width petal_length petal_width  specy
                   0           0            0           0      0
    0            5.1         3.5          1.4         0.2    0.0
    1            4.9         3.0          1.4         0.2    0.0
    2            4.7         3.2          1.3         0.2    0.0
    3            4.6         3.1          1.5         0.2    0.0
    4            5.0         3.6          1.4         0.2    0.0
    ..           ...         ...          ...         ...    ...
    145          6.7         3.0          5.2         2.3    2.0
    146          6.3         2.5          5.0         1.9    2.0
    147          6.5         3.0          5.2         2.0    2.0
    148          6.2         3.4          5.4         2.3    2.0
    149          5.9         3.0          5.1         1.8    2.0

    [150 rows x 5 columns]


.. GENERATED FROM PYTHON SOURCE LINES 91-93

We can also easily access the 10 samples previously selected,
either globally:

.. GENERATED FROM PYTHON SOURCE LINES 93-96

.. code-block:: default

    data = iris.get_all_data(False)
    print(data[0][shown_samples, :])


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    [[5.8 4.  1.2 0.2 0. ]
     [5.1 2.5 3.  1.1 1. ]
     [6.6 3.  4.4 1.4 1. ]
     [5.4 3.9 1.3 0.4 0. ]
     [7.9 3.8 6.4 2.  2. ]
     [6.3 3.3 4.7 1.6 1. ]
     [6.9 3.1 5.1 2.3 2. ]
     [5.1 3.8 1.9 0.4 0. ]
     [4.7 3.2 1.6 0.2 0. ]
     [6.9 3.2 5.7 2.3 2. ]]


.. GENERATED FROM PYTHON SOURCE LINES 97-98

or only the parameters:

.. GENERATED FROM PYTHON SOURCE LINES 98-101

.. code-block:: default

    parameters = iris.get_data_by_group("parameters")
    print(parameters[shown_samples, :])


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    [[5.8 4.  1.2 0.2]
     [5.1 2.5 3.  1.1]
     [6.6 3.  4.4 1.4]
     [5.4 3.9 1.3 0.4]
     [7.9 3.8 6.4 2. ]
     [6.3 3.3 4.7 1.6]
     [6.9 3.1 5.1 2.3]
     [5.1 3.8 1.9 0.4]
     [4.7 3.2 1.6 0.2]
     [6.9 3.2 5.7 2.3]]


.. GENERATED FROM PYTHON SOURCE LINES 102-103

or only the labels:

.. GENERATED FROM PYTHON SOURCE LINES 103-106

.. code-block:: default

    labels = iris.get_data_by_group("labels")
    print(labels[shown_samples, :])


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    [[0.]
     [1.]
     [1.]
     [0.]
     [2.]
     [1.]
     [2.]
     [0.]
     [0.]
     [2.]]


.. GENERATED FROM PYTHON SOURCE LINES 107-111

Plot the dataset
----------------

Lastly, we can plot the dataset in various ways.
Note that the samples are colored according to their labels.

.. GENERATED FROM PYTHON SOURCE LINES 113-119

Plot scatter matrix
~~~~~~~~~~~~~~~~~~~

We can use the :class:`.ScatterMatrix` plot,
in which each off-diagonal block plots the samples
against the pair of variables named on its x- and y-axes,
while each diagonal block approximates the probability distribution of one variable,
using either a histogram or a kernel density estimator.

.. GENERATED FROM PYTHON SOURCE LINES 119-121

.. code-block:: default

    iris.plot("ScatterMatrix", classifier="specy", kde=True)


.. image:: /examples/dataset/images/sphx_glr_plot_iris_001.png
    :alt: plot iris
    :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 122-129

Plot parallel coordinates
~~~~~~~~~~~~~~~~~~~~~~~~~

We can use the
:class:`~gemseo.post.dataset.parallel_coordinates.ParallelCoordinates` plot,
also known as a cobweb plot,
in which each sample is represented by a continuous piecewise-linear line
whose nodes are indexed by the variable names and give the variable values.

.. GENERATED FROM PYTHON SOURCE LINES 129-131

.. code-block:: default

    iris.plot("ParallelCoordinates", classifier="specy")


.. image:: /examples/dataset/images/sphx_glr_plot_iris_002.png
    :alt: plot iris
    :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 132-138

Plot Andrews curves
~~~~~~~~~~~~~~~~~~~

We can use the :class:`.AndrewsCurves` plot,
which can be viewed as a smooth version of the parallel coordinates.
Each sample is represented by a curve
and, if there is structure in the data, it may be visible in the plot.

.. GENERATED FROM PYTHON SOURCE LINES 138-140

.. code-block:: default

    iris.plot("AndrewsCurves", classifier="specy")


.. image:: /examples/dataset/images/sphx_glr_plot_iris_003.png
    :alt: plot iris
    :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 141-144

Plot Radar
~~~~~~~~~~

We can use the :class:`.Radar` plot:

.. GENERATED FROM PYTHON SOURCE LINES 144-145

.. code-block:: default

    iris.plot("Radar", classifier="specy")


.. image:: /examples/dataset/images/sphx_glr_plot_iris_004.png
    :alt: plot iris
    :class: sphx-glr-single-img


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 1.807 seconds)


.. _sphx_glr_download_examples_dataset_plot_iris.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_iris.py <plot_iris.py>`


  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_iris.ipynb <plot_iris.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_