.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/dataset/use_cases/plot_iris.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_dataset_use_cases_plot_iris.py:


Iris dataset
============

Presentation
------------

This is one of the best-known datasets
in the machine learning literature.

It was introduced by the statistician Ronald Fisher
in his 1936 paper "The use of multiple measurements in taxonomic problems",
Annals of Eugenics. 7 (2): 179-188.

It contains 150 instances of iris plants:

- 50 Iris Setosa,
- 50 Iris Versicolour,
- 50 Iris Virginica.

Each instance is characterized by:

- its sepal length in cm,
- its sepal width in cm,
- its petal length in cm,
- its petal width in cm.

This dataset can be used for either clustering or classification.

.. GENERATED FROM PYTHON SOURCE LINES 52-68

.. code-block:: Python


    from __future__ import annotations

    from numpy.random import default_rng

    from gemseo import configure_logger
    from gemseo import create_benchmark_dataset
    from gemseo.post.dataset.andrews_curves import AndrewsCurves
    from gemseo.post.dataset.parallel_coordinates import ParallelCoordinates
    from gemseo.post.dataset.radviz import Radar
    from gemseo.post.dataset.scatter_plot_matrix import ScatterMatrix

    configure_logger()

    rng = default_rng(1)

.. GENERATED FROM PYTHON SOURCE LINES 69-73

Load Iris dataset
-----------------

We can load this dataset
with the high-level function :func:`~gemseo.create_benchmark_dataset`:

.. GENERATED FROM PYTHON SOURCE LINES 73-76

.. code-block:: Python


    iris = create_benchmark_dataset("IrisDataset")

.. GENERATED FROM PYTHON SOURCE LINES 77-78

and get some information about it:

.. GENERATED FROM PYTHON SOURCE LINES 78-80

.. code-block:: Python

    iris

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    GROUP      parameters                                        labels
    VARIABLE   sepal_length sepal_width petal_length petal_width  specy
    COMPONENT             0           0            0           0      0
    0                   5.1         3.5          1.4         0.2      0
    1                   4.9         3.0          1.4         0.2      0
    2                   4.7         3.2          1.3         0.2      0
    3                   4.6         3.1          1.5         0.2      0
    4                   5.0         3.6          1.4         0.2      0
    ...                 ...         ...          ...         ...    ...
    145                 6.7         3.0          5.2         2.3      2
    146                 6.3         2.5          5.0         1.9      2
    147                 6.5         3.0          5.2         2.0      2
    148                 6.2         3.4          5.4         2.3      2
    149                 5.9         3.0          5.1         1.8      2

    150 rows × 5 columns



.. GENERATED FROM PYTHON SOURCE LINES 81-84

Manipulate the dataset
----------------------

We randomly select 10 samples to display.

.. GENERATED FROM PYTHON SOURCE LINES 84-86

.. code-block:: Python

    samples = rng.choice(len(iris), size=10, replace=False)

.. GENERATED FROM PYTHON SOURCE LINES 87-89

We can easily access the 10 samples previously selected,
either globally

.. GENERATED FROM PYTHON SOURCE LINES 89-91

.. code-block:: Python

    data = iris.get_view(indices=samples)

.. GENERATED FROM PYTHON SOURCE LINES 92-93

or only the parameters:

.. GENERATED FROM PYTHON SOURCE LINES 93-95

.. code-block:: Python

    iris.get_view(group_names=iris.PARAMETER_GROUP, indices=samples)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    GROUP      parameters
    VARIABLE   sepal_length sepal_width petal_length petal_width
    COMPONENT             0           0            0           0
    46                  5.1         3.8          1.6         0.2
    66                  5.6         3.0          4.5         1.5
    120                 6.9         3.2          5.7         2.3
    5                   5.4         3.9          1.7         0.4
    140                 6.7         3.1          5.6         2.4
    72                  6.3         2.5          4.9         1.5
    21                  5.1         3.7          1.5         0.4
    107                 7.3         2.9          6.3         1.8
    136                 6.3         3.4          5.6         2.4
    37                  4.9         3.6          1.4         0.1


.. GENERATED FROM PYTHON SOURCE LINES 96-97

or only the labels:

.. GENERATED FROM PYTHON SOURCE LINES 97-99

.. code-block:: Python

    iris.get_view(group_names="labels", indices=samples)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    GROUP      labels
    VARIABLE    specy
    COMPONENT       0
    46              0
    66              1
    120             2
    5               0
    140             2
    72              1
    21              0
    107             2
    136             2
    37              0
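
As a rough illustration of the sampling step used in this section,
here is a minimal NumPy-only sketch that does not depend on GEMSEO.
It assumes, consistently with the tables shown above,
that the 150 samples are stored class by class with 50 samples each;
the ``labels`` array is a hypothetical stand-in for the ``specy`` variable
and is not part of the GEMSEO API.

.. code-block:: Python

    import numpy as np

    # Hypothetical stand-in for the "specy" labels: 50 samples per class,
    # stored class by class (an assumption consistent with the tables above).
    labels = np.repeat([0, 1, 2], 50)

    rng = np.random.default_rng(1)

    # replace=False draws without replacement, so the 10 indices are distinct.
    samples = rng.choice(len(labels), size=10, replace=False)

    # Integer-array indexing keeps the order in which the indices were drawn.
    sampled_labels = labels[samples]

With ``replace=True``, the same index could be drawn more than once,
so the view could contain duplicated rows.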


.. GENERATED FROM PYTHON SOURCE LINES 100-104

Plot the dataset
----------------

Lastly, we can plot the dataset in various ways.
Note that the samples are colored according to their labels.

.. GENERATED FROM PYTHON SOURCE LINES 106-112

Plot scatter matrix
~~~~~~~~~~~~~~~~~~~

We can use the :class:`.ScatterMatrix` plot
where each off-diagonal block plots the samples
against the variables named on its x- and y-axes,
while the diagonal blocks approximate the probability distributions of the variables,
using either a histogram or a kernel density estimator.

.. GENERATED FROM PYTHON SOURCE LINES 112-114

.. code-block:: Python

    ScatterMatrix(iris, classifier="specy", kde=True).execute(save=False, show=True)

.. image-sg:: /examples/dataset/use_cases/images/sphx_glr_plot_iris_001.png
   :alt: plot iris
   :srcset: /examples/dataset/use_cases/images/sphx_glr_plot_iris_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 115-122

Plot parallel coordinates
~~~~~~~~~~~~~~~~~~~~~~~~~

We can use the
:class:`~gemseo.post.dataset.parallel_coordinates.ParallelCoordinates` plot,
a.k.a. cobweb plot,
where each sample is represented by a continuous piecewise-linear line
whose nodes are indexed by the variable names
and positioned at the variable values.

.. GENERATED FROM PYTHON SOURCE LINES 122-124

.. code-block:: Python

    ParallelCoordinates(iris, "specy").execute(save=False, show=True)

.. image-sg:: /examples/dataset/use_cases/images/sphx_glr_plot_iris_002.png
   :alt: plot iris
   :srcset: /examples/dataset/use_cases/images/sphx_glr_plot_iris_002.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 125-131

Plot Andrews curves
~~~~~~~~~~~~~~~~~~~

We can use the :class:`.AndrewsCurves` plot,
which can be viewed as a smooth version of the parallel coordinates.
Each sample is represented by a curve and,
if there is structure in the data,
it may be visible in the plot.

.. GENERATED FROM PYTHON SOURCE LINES 131-133

.. code-block:: Python

    AndrewsCurves(iris, "specy").execute(save=False, show=True)

.. image-sg:: /examples/dataset/use_cases/images/sphx_glr_plot_iris_003.png
   :alt: plot iris
   :srcset: /examples/dataset/use_cases/images/sphx_glr_plot_iris_003.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 134-137

Plot Radar
~~~~~~~~~~

We can use the :class:`.Radar` plot:

.. GENERATED FROM PYTHON SOURCE LINES 137-138

.. code-block:: Python

    Radar(iris, "specy").execute(save=False, show=True)

.. image-sg:: /examples/dataset/use_cases/images/sphx_glr_plot_iris_004.png
   :alt: plot iris
   :srcset: /examples/dataset/use_cases/images/sphx_glr_plot_iris_004.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.797 seconds)
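
To make the link between the Andrews curves plotted above and the samples more concrete,
here is a minimal pure-Python sketch of the classical Andrews mapping,
which sends a sample :math:`x=(x_1,x_2,x_3,x_4,\ldots)` to the function
:math:`f_x(t)=x_1/\sqrt{2}+x_2\sin(t)+x_3\cos(t)+x_4\sin(2t)+\ldots`
for :math:`t\in[-\pi,\pi]`.
The helper ``andrews_curve`` is hypothetical and only illustrates this formula;
that :class:`.AndrewsCurves` uses this exact parametrization is an assumption,
not something stated in this example.

.. code-block:: Python

    import math


    def andrews_curve(sample, t):
        """Evaluate the classical Andrews function of a sample at angle t."""
        value = sample[0] / math.sqrt(2.0)
        for k, x in enumerate(sample[1:], start=1):
            harmonic = (k + 1) // 2  # 1, 1, 2, 2, 3, 3, ...
            trig = math.sin if k % 2 == 1 else math.cos  # sin, cos, sin, cos, ...
            value += x * trig(harmonic * t)
        return value


    # First sample of the dataset as displayed in the table above:
    # sepal length, sepal width, petal length, petal width (in cm).
    first_sample = (5.1, 3.5, 1.4, 0.2)

    # Evaluate the curve on a regular grid of [-pi, pi].
    n_points = 101
    grid = [-math.pi + 2.0 * math.pi * i / (n_points - 1) for i in range(n_points)]
    curve = [andrews_curve(first_sample, t) for t in grid]

Samples with similar measurements yield similar curves,
which is why groups of samples may show up as bundles of curves.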