.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/dataset/creation/plot_dataset.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_examples_dataset_creation_plot_dataset.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_dataset_creation_plot_dataset.py:


Dataset
=======

In this example, we will see how to build and manipulate a :class:`.Dataset`.

From a conceptual point of view, a :class:`.Dataset` is a tabular data structure
whose rows are the entries, a.k.a. observations or indices,
and whose columns are the features, a.k.a. quantities of interest.
These features can be grouped by variable identifier,
which is a tuple ``(group_name, variable_name)``;
the number of features sharing the same variable identifier
is the dimension of the variable, i.e. its number of components.
A feature is thus identified by a tuple ``(group_name, variable_name, component)``.

From a software point of view, a :class:`.Dataset` is a specialized
`pandas DataFrame `__.

.. GENERATED FROM PYTHON SOURCE LINES 41-47

.. code-block:: default

    from __future__ import annotations

    from gemseo.datasets.dataset import Dataset
    from numpy import array
    from pandas import DataFrame

.. GENERATED FROM PYTHON SOURCE LINES 48-52

Instantiation
-------------

At instantiation,

.. GENERATED FROM PYTHON SOURCE LINES 52-54

.. code-block:: default

    dataset = Dataset()

.. GENERATED FROM PYTHON SOURCE LINES 55-56

a dataset has the same name as its class:

.. GENERATED FROM PYTHON SOURCE LINES 56-58

.. code-block:: default

    dataset.name

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    'Dataset'

.. GENERATED FROM PYTHON SOURCE LINES 59-60

We can use a more appropriate name at instantiation:

.. GENERATED FROM PYTHON SOURCE LINES 60-63

.. code-block:: default

    dataset_with_custom_name = Dataset(dataset_name="Measurements")
    dataset_with_custom_name.name

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    'Measurements'

.. GENERATED FROM PYTHON SOURCE LINES 64-65

or change it after instantiation:

.. GENERATED FROM PYTHON SOURCE LINES 65-68

.. code-block:: default

    dataset_with_custom_name.name = "simulations"
    dataset_with_custom_name.name

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    'simulations'

.. GENERATED FROM PYTHON SOURCE LINES 69-70

Let us check that the class :class:`.Dataset` derives from ``pandas.DataFrame``:

.. GENERATED FROM PYTHON SOURCE LINES 70-72

.. code-block:: default

    isinstance(dataset, DataFrame)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    True

.. GENERATED FROM PYTHON SOURCE LINES 73-78

Add a variable
--------------

Then, we can add data by variable name:

.. GENERATED FROM PYTHON SOURCE LINES 78-81

.. code-block:: default

    dataset.add_variable("a", array([[1, 2], [3, 4]]))
    dataset

.. raw:: html
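
    <div>
    <table border="1" class="dataframe">
      <thead>
        <tr><th>GROUP</th><th colspan="2">parameters</th></tr>
        <tr><th>VARIABLE</th><th colspan="2">a</th></tr>
        <tr><th>COMPONENT</th><th>0</th><th>1</th></tr>
      </thead>
      <tbody>
        <tr><th>0</th><td>1</td><td>2</td></tr>
        <tr><th>1</th><td>3</td><td>4</td></tr>
      </tbody>
    </table>
    </div>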


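Since a :class:`.Dataset` is a ``DataFrame``, the table above can be mimicked with plain pandas.
The following minimal sketch builds the same column layout directly from a ``pandas.MultiIndex``.
It only illustrates the structure of the columns. It makes no claim about how ``Dataset.add_variable`` is implemented.

.. code-block:: python

    import numpy as np
    import pandas as pd

    # Two entries (rows) of the two-dimensional variable "a".
    data = np.array([[1, 2], [3, 4]])

    # One column per component, labelled by (group, variable, component).
    columns = pd.MultiIndex.from_tuples(
        [("parameters", "a", 0), ("parameters", "a", 1)],
        names=["GROUP", "VARIABLE", "COMPONENT"],
    )
    frame = pd.DataFrame(data, columns=columns)
    print(frame)
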
.. GENERATED FROM PYTHON SOURCE LINES 82-87

Note that the columns of the dataset use the multi-level index
``(GROUP, VARIABLE, COMPONENT)``.

By default, the variable is placed in the group

.. GENERATED FROM PYTHON SOURCE LINES 87-89

.. code-block:: default

    dataset.DEFAULT_GROUP

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    'parameters'

.. GENERATED FROM PYTHON SOURCE LINES 90-91

The argument ``group_name`` allows using another group:

.. GENERATED FROM PYTHON SOURCE LINES 91-94

.. code-block:: default

    dataset.add_variable("b", array([[-1, -2, -3], [-4, -5, -6]]), "inputs")
    dataset

.. raw:: html
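
    <div>
    <table border="1" class="dataframe">
      <thead>
        <tr><th>GROUP</th><th colspan="2">parameters</th><th colspan="3">inputs</th></tr>
        <tr><th>VARIABLE</th><th colspan="2">a</th><th colspan="3">b</th></tr>
        <tr><th>COMPONENT</th><th>0</th><th>1</th><th>0</th><th>1</th><th>2</th></tr>
      </thead>
      <tbody>
        <tr><th>0</th><td>1</td><td>2</td><td>-1</td><td>-2</td><td>-3</td></tr>
        <tr><th>1</th><td>3</td><td>4</td><td>-4</td><td>-5</td><td>-6</td></tr>
      </tbody>
    </table>
    </div>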


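In plain pandas terms, adding a variable to another group amounts to appending columns whose first level is the new group name.
The sketch below uses ``pandas.concat`` along the columns together with a hypothetical helper, ``variable_frame``, which is not part of GEMSEO.

.. code-block:: python

    import numpy as np
    import pandas as pd


    def variable_frame(group, name, data):
        """Hypothetical helper: wrap a 2D array as the columns of one variable."""
        columns = pd.MultiIndex.from_tuples(
            [(group, name, component) for component in range(data.shape[1])],
            names=["GROUP", "VARIABLE", "COMPONENT"],
        )
        return pd.DataFrame(data, columns=columns)


    frame = variable_frame("parameters", "a", np.array([[1, 2], [3, 4]]))
    # Append the variable "b" of the group "inputs" as three new columns.
    frame = pd.concat(
        [frame, variable_frame("inputs", "b", np.array([[-1, -2, -3], [-4, -5, -6]]))],
        axis=1,
    )
    print(frame)
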
.. GENERATED FROM PYTHON SOURCE LINES 95-99

Similarly, the components of a variable are numbered from 0 by default,
e.g. 0 and 1 for a variable of dimension 2.
The argument ``components`` allows using other values:

.. GENERATED FROM PYTHON SOURCE LINES 99-102

.. code-block:: default

    dataset.add_variable("c", array([[1.5], [3.5]]), components=[3])
    dataset

.. raw:: html
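
    <div>
    <table border="1" class="dataframe">
      <thead>
        <tr><th>GROUP</th><th colspan="2">parameters</th><th colspan="3">inputs</th><th>parameters</th></tr>
        <tr><th>VARIABLE</th><th colspan="2">a</th><th colspan="3">b</th><th>c</th></tr>
        <tr><th>COMPONENT</th><th>0</th><th>1</th><th>0</th><th>1</th><th>2</th><th>3</th></tr>
      </thead>
      <tbody>
        <tr><th>0</th><td>1</td><td>2</td><td>-1</td><td>-2</td><td>-3</td><td>1.5</td></tr>
        <tr><th>1</th><td>3</td><td>4</td><td>-4</td><td>-5</td><td>-6</td><td>3.5</td></tr>
      </tbody>
    </table>
    </div>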


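The component is only a label of the third column level, not a position.
With ``components=[3]``, the variable ``"c"`` still occupies a single column.
Here is a minimal plain-pandas illustration of this labelling, independent of GEMSEO:

.. code-block:: python

    import numpy as np
    import pandas as pd

    columns = pd.MultiIndex.from_tuples(
        [("parameters", "c", 3)], names=["GROUP", "VARIABLE", "COMPONENT"]
    )
    frame = pd.DataFrame(np.array([[1.5], [3.5]]), columns=columns)

    # Access by label: the component 3 is looked up in the COMPONENT level.
    values = frame[("parameters", "c", 3)].to_numpy()
    # Positional access still uses 0 for the first (and only) column.
    first_column = frame.iloc[:, 0].to_numpy()
    print(values, first_column)
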
.. GENERATED FROM PYTHON SOURCE LINES 103-107

Add a group of variables
------------------------

The data can also be added by group.
Here, the three columns of the data array are split into
the variable ``"p"`` of dimension 2 and the variable ``"q"`` of dimension 1:

.. GENERATED FROM PYTHON SOURCE LINES 107-112

.. code-block:: default

    dataset.add_group(
        "G1", array([[-1.1, -2.1, -3.1], [-4.1, -5.1, -6.1]]), ["p", "q"], {"p": 2, "q": 1}
    )
    dataset

.. raw:: html
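
    <div>
    <table border="1" class="dataframe">
      <thead>
        <tr><th>GROUP</th><th colspan="2">parameters</th><th colspan="3">inputs</th><th>parameters</th><th colspan="3">G1</th></tr>
        <tr><th>VARIABLE</th><th colspan="2">a</th><th colspan="3">b</th><th>c</th><th colspan="2">p</th><th>q</th></tr>
        <tr><th>COMPONENT</th><th>0</th><th>1</th><th>0</th><th>1</th><th>2</th><th>3</th><th>0</th><th>1</th><th>0</th></tr>
      </thead>
      <tbody>
        <tr><th>0</th><td>1</td><td>2</td><td>-1</td><td>-2</td><td>-3</td><td>1.5</td><td>-1.1</td><td>-2.1</td><td>-3.1</td></tr>
        <tr><th>1</th><td>3</td><td>4</td><td>-4</td><td>-5</td><td>-6</td><td>3.5</td><td>-4.1</td><td>-5.1</td><td>-6.1</td></tr>
      </tbody>
    </table>
    </div>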


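The dimensions passed to ``add_group`` indicate how the columns of the data array are split between the variables, following the order of the variable names.
The following sketch reproduces this splitting with a hypothetical helper, ``group_columns``.
It is for illustration only and is not GEMSEO code.

.. code-block:: python

    import numpy as np
    import pandas as pd


    def group_columns(group, names, sizes):
        """Hypothetical helper: one (group, variable, component) label per column."""
        return [
            (group, name, component)
            for name in names
            for component in range(sizes[name])
        ]


    data = np.array([[-1.1, -2.1, -3.1], [-4.1, -5.1, -6.1]])
    labels = group_columns("G1", ["p", "q"], {"p": 2, "q": 1})
    # The sizes must add up to the number of columns of the data array.
    assert len(labels) == data.shape[1]
    frame = pd.DataFrame(
        data,
        columns=pd.MultiIndex.from_tuples(
            labels, names=["GROUP", "VARIABLE", "COMPONENT"]
        ),
    )
    print(frame)
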
.. GENERATED FROM PYTHON SOURCE LINES 113-115

The variable dimensions, such as ``{"p": 2, "q": 1}`` above, are optional
when the number of variable names equals the number of columns of the data array;
each variable then has a dimension equal to 1:

.. GENERATED FROM PYTHON SOURCE LINES 115-118

.. code-block:: default

    dataset.add_group("G2", array([[1.1, 2.1, 3.1], [4.1, 5.1, 6.1]]), ["x", "y", "z"])
    dataset

.. raw:: html
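
    <div>
    <table border="1" class="dataframe">
      <thead>
        <tr><th>GROUP</th><th colspan="2">parameters</th><th colspan="3">inputs</th><th>parameters</th><th colspan="3">G1</th><th colspan="3">G2</th></tr>
        <tr><th>VARIABLE</th><th colspan="2">a</th><th colspan="3">b</th><th>c</th><th colspan="2">p</th><th>q</th><th>x</th><th>y</th><th>z</th></tr>
        <tr><th>COMPONENT</th><th>0</th><th>1</th><th>0</th><th>1</th><th>2</th><th>3</th><th>0</th><th>1</th><th>0</th><th>0</th><th>0</th><th>0</th></tr>
      </thead>
      <tbody>
        <tr><th>0</th><td>1</td><td>2</td><td>-1</td><td>-2</td><td>-3</td><td>1.5</td><td>-1.1</td><td>-2.1</td><td>-3.1</td><td>1.1</td><td>2.1</td><td>3.1</td></tr>
        <tr><th>1</th><td>3</td><td>4</td><td>-4</td><td>-5</td><td>-6</td><td>3.5</td><td>-4.1</td><td>-5.1</td><td>-6.1</td><td>4.1</td><td>5.1</td><td>6.1</td></tr>
      </tbody>
    </table>
    </div>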


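The default rule described above can be expressed as a small function.
The sketch below encodes it in a hypothetical function, ``infer_sizes``.
Raising an error when the counts differ and no sizes are given is an assumption of this sketch, not documented GEMSEO behavior.

.. code-block:: python

    def infer_sizes(names, n_columns, sizes=None):
        """Hypothetical helper: return the dimension of each variable of a group."""
        if sizes is not None:
            # Explicit dimensions, e.g. {"p": 2, "q": 1}.
            return dict(sizes)
        if len(names) != n_columns:
            # Assumption of this sketch: without sizes, the counts must match.
            raise ValueError("The variable dimensions are required.")
        # One column per variable, hence a dimension of 1 for each of them.
        return {name: 1 for name in names}


    print(infer_sizes(["x", "y", "z"], 3))  # {'x': 1, 'y': 1, 'z': 1}
    print(infer_sizes(["p", "q"], 3, {"p": 2, "q": 1}))  # {'p': 2, 'q': 1}
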
.. GENERATED FROM PYTHON SOURCE LINES 119-124

Similarly, the variable names are optional. When they are omitted,
the group contains a single variable named ``"x"``
whose dimension equals the number of columns of the data array:

.. GENERATED FROM PYTHON SOURCE LINES 124-127

.. code-block:: default

    dataset.add_group("G3", array([[1.2, 2.2], [3.2, 4.2]]))
    dataset

.. raw:: html
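
    <div>
    <table border="1" class="dataframe">
      <thead>
        <tr><th>GROUP</th><th colspan="2">parameters</th><th colspan="3">inputs</th><th>parameters</th><th colspan="3">G1</th><th colspan="3">G2</th><th colspan="2">G3</th></tr>
        <tr><th>VARIABLE</th><th colspan="2">a</th><th colspan="3">b</th><th>c</th><th colspan="2">p</th><th>q</th><th>x</th><th>y</th><th>z</th><th colspan="2">x</th></tr>
        <tr><th>COMPONENT</th><th>0</th><th>1</th><th>0</th><th>1</th><th>2</th><th>3</th><th>0</th><th>1</th><th>0</th><th>0</th><th>0</th><th>0</th><th>0</th><th>1</th></tr>
      </thead>
      <tbody>
        <tr><th>0</th><td>1</td><td>2</td><td>-1</td><td>-2</td><td>-3</td><td>1.5</td><td>-1.1</td><td>-2.1</td><td>-3.1</td><td>1.1</td><td>2.1</td><td>3.1</td><td>1.2</td><td>2.2</td></tr>
        <tr><th>1</th><td>3</td><td>4</td><td>-4</td><td>-5</td><td>-6</td><td>3.5</td><td>-4.1</td><td>-5.1</td><td>-6.1</td><td>4.1</td><td>5.1</td><td>6.1</td><td>3.2</td><td>4.2</td></tr>
      </tbody>
    </table>
    </div>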


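The variable ``"x"`` now exists in both ``"G2"`` and ``"G3"``, and only the group level distinguishes the two.
In plain pandas, ``DataFrame.xs`` with ``level="VARIABLE"`` selects every column of ``"x"``, whatever its group.
This sketch uses a subset of the columns above and is independent of GEMSEO:

.. code-block:: python

    import numpy as np
    import pandas as pd

    columns = pd.MultiIndex.from_tuples(
        [("G2", "x", 0), ("G2", "y", 0), ("G3", "x", 0), ("G3", "x", 1)],
        names=["GROUP", "VARIABLE", "COMPONENT"],
    )
    frame = pd.DataFrame(
        np.array([[1.1, 2.1, 1.2, 2.2], [4.1, 5.1, 3.2, 4.2]]), columns=columns
    )

    # Keep all the levels so that the group of each selected column remains visible.
    x_frame = frame.xs("x", level="VARIABLE", axis=1, drop_level=False)
    groups_of_x = sorted(set(x_frame.columns.get_level_values("GROUP")))
    print(groups_of_x)  # ['G2', 'G3']
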
.. GENERATED FROM PYTHON SOURCE LINES 128-133

Convert to a dictionary of arrays
---------------------------------

It can be useful to have a dictionary view of the dataset,
indexed by group names and then by variable names, with NumPy arrays as values:

.. GENERATED FROM PYTHON SOURCE LINES 133-135

.. code-block:: default

    dataset.to_dict_of_arrays()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    {'G1': {'p': array([[-1.1, -2.1], [-4.1, -5.1]]), 'q': array([[-3.1], [-6.1]])},
     'G2': {'x': array([[1.1], [4.1]]), 'y': array([[2.1], [5.1]]), 'z': array([[3.1], [6.1]])},
     'G3': {'x': array([[1.2, 2.2], [3.2, 4.2]])},
     'inputs': {'b': array([[-1, -2, -3], [-4, -5, -6]])},
     'parameters': {'a': array([[1, 2], [3, 4]]), 'c': array([[1.5], [3.5]])}}

.. GENERATED FROM PYTHON SOURCE LINES 136-137

We can also flatten this dictionary. A variable name used in several groups
is then prefixed by its group name, e.g. ``"G2:x"`` and ``"G3:x"``:

.. GENERATED FROM PYTHON SOURCE LINES 137-140

.. code-block:: default

    dataset.to_dict_of_arrays(False)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    {'p': array([[-1.1, -2.1], [-4.1, -5.1]]),
     'q': array([[-3.1], [-6.1]]),
     'G2:x': array([[1.1], [4.1]]),
     'y': array([[2.1], [5.1]]),
     'z': array([[3.1], [6.1]]),
     'G3:x': array([[1.2, 2.2], [3.2, 4.2]]),
     'b': array([[-1, -2, -3], [-4, -5, -6]]),
     'a': array([[1, 2], [3, 4]]),
     'c': array([[1.5], [3.5]])}

.. GENERATED FROM PYTHON SOURCE LINES 141-149

Get information
---------------

Some properties
~~~~~~~~~~~~~~~

At any time, we can access the names of the groups of variables:

.. GENERATED FROM PYTHON SOURCE LINES 149-151

.. code-block:: default

    dataset.group_names

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    ['G1', 'G2', 'G3', 'inputs', 'parameters']

.. GENERATED FROM PYTHON SOURCE LINES 152-153

and the total number of components per group:

.. GENERATED FROM PYTHON SOURCE LINES 153-155

.. code-block:: default

    dataset.group_names_to_n_components

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    {'G1': 3, 'G2': 3, 'G3': 2, 'inputs': 3, 'parameters': 3}

.. GENERATED FROM PYTHON SOURCE LINES 156-159

Concerning the variables, note that the same variable name can be used
in different groups, e.g. ``"x"`` in ``"G2"`` and ``"G3"``.
The unique variable names can be accessed with

.. GENERATED FROM PYTHON SOURCE LINES 159-161

.. code-block:: default

    dataset.variable_names

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    ['a', 'b', 'c', 'p', 'q', 'x', 'y', 'z']

.. GENERATED FROM PYTHON SOURCE LINES 162-163

while the total number of components per variable name,
summed over the groups (hence 3 for ``"x"``), can be accessed with

.. GENERATED FROM PYTHON SOURCE LINES 163-165

.. code-block:: default

    dataset.variable_names_to_n_components

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    {'a': 2, 'b': 3, 'c': 1, 'p': 2, 'q': 1, 'x': 3, 'y': 1, 'z': 1}

.. GENERATED FROM PYTHON SOURCE LINES 166-168

Lastly, the variable identifiers ``(group_name, variable_name)`` can be accessed with

.. GENERATED FROM PYTHON SOURCE LINES 168-170

.. code-block:: default

    dataset.variable_identifiers

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    [('G1', 'p'),
     ('G1', 'q'),
     ('G2', 'x'),
     ('G2', 'y'),
     ('G2', 'z'),
     ('G3', 'x'),
     ('inputs', 'b'),
     ('parameters', 'a'),
     ('parameters', 'c')]

.. GENERATED FROM PYTHON SOURCE LINES 171-175

Some getters
~~~~~~~~~~~~

We can also get the names of the groups containing a given variable:

.. GENERATED FROM PYTHON SOURCE LINES 175-177

.. code-block:: default

    dataset.get_group_names("x")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    ['G2', 'G3']

.. GENERATED FROM PYTHON SOURCE LINES 178-179

and the names of the variables included in a group:

.. GENERATED FROM PYTHON SOURCE LINES 179-181

.. code-block:: default

    dataset.get_variable_names("G1")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    ['p', 'q']

.. GENERATED FROM PYTHON SOURCE LINES 182-183

The components of a variable located in a group can be accessed with

.. GENERATED FROM PYTHON SOURCE LINES 183-185

.. code-block:: default

    dataset.get_variable_components("G2", "y")

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    [0]

.. GENERATED FROM PYTHON SOURCE LINES 186-188

Lastly, the columns of the dataset can be represented as strings,
where the component is omitted for variables of dimension 1:

.. GENERATED FROM PYTHON SOURCE LINES 188-190

.. code-block:: default

    dataset.get_columns()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    ['a[0]', 'a[1]', 'b[0]', 'b[1]', 'b[2]', 'c', 'p[0]', 'p[1]', 'q', 'x', 'y', 'z', 'x[0]', 'x[1]']

.. GENERATED FROM PYTHON SOURCE LINES 191-192

or as tuples ``(group_name, variable_name, component)``:

.. GENERATED FROM PYTHON SOURCE LINES 192-194

.. code-block:: default

    dataset.get_columns(as_tuple=True)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    [('parameters', 'a', 0),
     ('parameters', 'a', 1),
     ('inputs', 'b', 0),
     ('inputs', 'b', 1),
     ('inputs', 'b', 2),
     ('parameters', 'c', 3),
     ('G1', 'p', 0),
     ('G1', 'p', 1),
     ('G1', 'q', 0),
     ('G2', 'x', 0),
     ('G2', 'y', 0),
     ('G2', 'z', 0),
     ('G3', 'x', 0),
     ('G3', 'x', 1)]

.. GENERATED FROM PYTHON SOURCE LINES 195-196

We can also restrict the columns to some variables:

.. GENERATED FROM PYTHON SOURCE LINES 196-198

.. code-block:: default

    dataset.get_columns(["c", "y"])

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    ['c', 'y']

.. GENERATED FROM PYTHON SOURCE LINES 199-202

Renaming
--------

It is easy to rename a group:

.. GENERATED FROM PYTHON SOURCE LINES 202-205

.. code-block:: default

    dataset.rename_group("G1", "foo")
    dataset.group_names

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    ['G2', 'G3', 'foo', 'inputs', 'parameters']

.. GENERATED FROM PYTHON SOURCE LINES 206-207

or a variable:

.. GENERATED FROM PYTHON SOURCE LINES 207-211

.. code-block:: default

    dataset.rename_variable("x", "bar", "G2")
    dataset.rename_variable("y", "baz")
    dataset.variable_names

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    ['a', 'b', 'bar', 'baz', 'c', 'p', 'q', 'x', 'z']

.. GENERATED FROM PYTHON SOURCE LINES 212-215

Note that passing the group name ``"G2"`` renames ``"x"`` only in this group.
Without it, ``"x"`` would have been renamed in both ``"G2"`` and ``"G3"``.

.. GENERATED FROM PYTHON SOURCE LINES 217-222

Transform the data of a variable
--------------------------------

The data associated with a variable can be transformed by a function
applied to a NumPy array, for instance to double the values of the variable ``"bar"``:

.. GENERATED FROM PYTHON SOURCE LINES 222-224

.. code-block:: default

    dataset.transform_data(lambda x: 2 * x, variable_names="bar")

.. GENERATED FROM PYTHON SOURCE LINES 225-231

Get a view of the dataset
-------------------------

The method :meth:`~.Dataset.get_view` returns a view of the dataset
by using masks built from variable names, group names, components and row indices.
For instance, we can get a view of the variables ``"b"`` and ``"x"``:

.. GENERATED FROM PYTHON SOURCE LINES 231-233

.. code-block:: default

    dataset.get_view(variable_names=["b", "x"])

.. raw:: html
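
    <div>
    <table border="1" class="dataframe">
      <thead>
        <tr><th>GROUP</th><th colspan="3">inputs</th><th colspan="2">G3</th></tr>
        <tr><th>VARIABLE</th><th colspan="3">b</th><th colspan="2">x</th></tr>
        <tr><th>COMPONENT</th><th>0</th><th>1</th><th>2</th><th>0</th><th>1</th></tr>
      </thead>
      <tbody>
        <tr><th>0</th><td>-1</td><td>-2</td><td>-3</td><td>1.2</td><td>2.2</td></tr>
        <tr><th>1</th><td>-4</td><td>-5</td><td>-6</td><td>3.2</td><td>4.2</td></tr>
      </tbody>
    </table>
    </div>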


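Conceptually, such a selection is a boolean mask on the ``VARIABLE`` level of the columns.
The following plain-pandas sketch reproduces it on a few columns of the dataset.
Note that indexing with a boolean mask in pandas returns a new ``DataFrame``.
This example makes no claim about how ``get_view`` is implemented.

.. code-block:: python

    import pandas as pd

    columns = pd.MultiIndex.from_tuples(
        [
            ("inputs", "b", 0),
            ("inputs", "b", 1),
            ("inputs", "b", 2),
            ("G2", "baz", 0),
            ("G3", "x", 0),
            ("G3", "x", 1),
        ],
        names=["GROUP", "VARIABLE", "COMPONENT"],
    )
    frame = pd.DataFrame(
        [[-1, -2, -3, 2.1, 1.2, 2.2], [-4, -5, -6, 5.1, 3.2, 4.2]], columns=columns
    )

    # True for the columns of the variables "b" and "x", whatever their group.
    mask = frame.columns.get_level_values("VARIABLE").isin(["b", "x"])
    view = frame.loc[:, mask]
    print(view)
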
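.. GENERATED FROM PYTHON SOURCE LINES 234-235

Since ``"x"`` was renamed ``"bar"`` in the group ``"G2"``,
only the variable ``"x"`` of the group ``"G3"`` is selected.
We can also get a view of the group ``"inputs"``:

.. GENERATED FROM PYTHON SOURCE LINES 235-237

.. code-block:: default

    dataset.get_view("inputs")

.. raw:: html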
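
    <div>
    <table border="1" class="dataframe">
      <thead>
        <tr><th>GROUP</th><th colspan="3">inputs</th></tr>
        <tr><th>VARIABLE</th><th colspan="3">b</th></tr>
        <tr><th>COMPONENT</th><th>0</th><th>1</th><th>2</th></tr>
      </thead>
      <tbody>
        <tr><th>0</th><td>-1</td><td>-2</td><td>-3</td></tr>
        <tr><th>1</th><td>-4</td><td>-5</td><td>-6</td></tr>
      </tbody>
    </table>
    </div>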


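With plain pandas, a group can be selected in two ways.
Indexing the first column level drops that level.
A boolean mask on the ``GROUP`` level keeps all three levels, as in the table above.
Here is a sketch on a subset of the columns, independent of GEMSEO:

.. code-block:: python

    import pandas as pd

    columns = pd.MultiIndex.from_tuples(
        [
            ("G3", "x", 0),
            ("G3", "x", 1),
            ("inputs", "b", 0),
            ("inputs", "b", 1),
            ("inputs", "b", 2),
        ],
        names=["GROUP", "VARIABLE", "COMPONENT"],
    )
    frame = pd.DataFrame(
        [[1.2, 2.2, -1, -2, -3], [3.2, 4.2, -4, -5, -6]], columns=columns
    )

    # Indexing the first level only keeps the levels (VARIABLE, COMPONENT).
    without_group = frame["inputs"]
    # A boolean mask on the GROUP level keeps the three levels.
    with_group = frame.loc[:, frame.columns.get_level_values("GROUP") == "inputs"]
    print(without_group.columns.nlevels, with_group.columns.nlevels)
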
.. GENERATED FROM PYTHON SOURCE LINES 238-239

We can also combine these keys, for instance to keep only the component 0
of the variables ``"b"`` and ``"x"``:

.. GENERATED FROM PYTHON SOURCE LINES 239-241

.. code-block:: default

    dataset.get_view(variable_names=["b", "x"], components=[0])

.. raw:: html
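
    <div>
    <table border="1" class="dataframe">
      <thead>
        <tr><th>GROUP</th><th>inputs</th><th>G3</th></tr>
        <tr><th>VARIABLE</th><th>b</th><th>x</th></tr>
        <tr><th>COMPONENT</th><th>0</th><th>0</th></tr>
      </thead>
      <tbody>
        <tr><th>0</th><td>-1</td><td>1.2</td></tr>
        <tr><th>1</th><td>-4</td><td>3.2</td></tr>
      </tbody>
    </table>
    </div>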


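In plain pandas terms, combining keys amounts to combining boolean masks with ``&``.
Here is a sketch on a subset of the columns. It makes no claim about the internals of ``get_view``.

.. code-block:: python

    import pandas as pd

    columns = pd.MultiIndex.from_tuples(
        [
            ("inputs", "b", 0),
            ("inputs", "b", 1),
            ("inputs", "b", 2),
            ("G3", "x", 0),
            ("G3", "x", 1),
        ],
        names=["GROUP", "VARIABLE", "COMPONENT"],
    )
    frame = pd.DataFrame(
        [[-1, -2, -3, 1.2, 2.2], [-4, -5, -6, 3.2, 4.2]], columns=columns
    )

    variables = frame.columns.get_level_values("VARIABLE")
    components = frame.columns.get_level_values("COMPONENT")
    # Keep the component 0 of the variables "b" and "x".
    mask = variables.isin(["b", "x"]) & (components == 0)
    view = frame.loc[:, mask]
    print(view)
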
.. GENERATED FROM PYTHON SOURCE LINES 242-247

Update some data
----------------

To complete this example, we can update the data using masks built from
variable names, group names, components and row indices,
here the row of index 1 of the group ``"inputs"``:

.. GENERATED FROM PYTHON SOURCE LINES 247-249

.. code-block:: default

    dataset.update_data([[10, 10, 10]], "inputs", indices=[1])
    dataset

.. raw:: html
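
    <div>
    <table border="1" class="dataframe">
      <thead>
        <tr><th>GROUP</th><th colspan="2">parameters</th><th colspan="3">inputs</th><th>parameters</th><th colspan="3">foo</th><th colspan="3">G2</th><th colspan="2">G3</th></tr>
        <tr><th>VARIABLE</th><th colspan="2">a</th><th colspan="3">b</th><th>c</th><th colspan="2">p</th><th>q</th><th>bar</th><th>baz</th><th>z</th><th colspan="2">x</th></tr>
        <tr><th>COMPONENT</th><th>0</th><th>1</th><th>0</th><th>1</th><th>2</th><th>3</th><th>0</th><th>1</th><th>0</th><th>0</th><th>0</th><th>0</th><th>0</th><th>1</th></tr>
      </thead>
      <tbody>
        <tr><th>0</th><td>1</td><td>2</td><td>-1</td><td>-2</td><td>-3</td><td>1.5</td><td>-1.1</td><td>-2.1</td><td>-3.1</td><td>2.2</td><td>2.1</td><td>3.1</td><td>1.2</td><td>2.2</td></tr>
        <tr><th>1</th><td>3</td><td>4</td><td>10</td><td>10</td><td>10</td><td>3.5</td><td>-4.1</td><td>-5.1</td><td>-6.1</td><td>8.2</td><td>5.1</td><td>6.1</td><td>3.2</td><td>4.2</td></tr>
      </tbody>
    </table>
    </div>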


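In plain pandas terms, such an update assigns new values to the cells selected by a row label and a column mask.
The following minimal sketch works on a subset of the columns and is independent of GEMSEO.
Here, ``1`` is a row label of the ``DataFrame``.

.. code-block:: python

    import pandas as pd

    columns = pd.MultiIndex.from_tuples(
        [
            ("parameters", "a", 0),
            ("parameters", "a", 1),
            ("inputs", "b", 0),
            ("inputs", "b", 1),
            ("inputs", "b", 2),
        ],
        names=["GROUP", "VARIABLE", "COMPONENT"],
    )
    frame = pd.DataFrame([[1, 2, -1, -2, -3], [3, 4, -4, -5, -6]], columns=columns)

    # Select the columns of the group "inputs" and overwrite the row labelled 1.
    mask = frame.columns.get_level_values("GROUP") == "inputs"
    frame.loc[1, mask] = [10, 10, 10]
    print(frame)
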
.. rst-class:: sphx-glr-timing

**Total running time of the script:** ( 0 minutes 0.106 seconds)


.. _sphx_glr_download_examples_dataset_creation_plot_dataset.py:

.. only:: html

    .. container:: sphx-glr-footer sphx-glr-footer-example

        .. container:: sphx-glr-download sphx-glr-download-python

            :download:`Download Python source code: plot_dataset.py <plot_dataset.py>`

        .. container:: sphx-glr-download sphx-glr-download-jupyter

            :download:`Download Jupyter notebook: plot_dataset.ipynb <plot_dataset.ipynb>`


.. only:: html

    .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_