
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/dataset/plot_dataset_from_array.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_examples_dataset_plot_dataset_from_array.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_dataset_plot_dataset_from_array.py:


Dataset from a NumPy array
==========================

In this example, we will see how to build a :class:`.Dataset` from a NumPy array.
To do so, we first import the :class:`.Dataset` class:

.. GENERATED FROM PYTHON SOURCE LINES 28-36

.. code-block:: default

    from gemseo.api import configure_logger
    from gemseo.core.dataset import Dataset
    from numpy import concatenate
    from numpy.random import rand

    configure_logger()

.. GENERATED FROM PYTHON SOURCE LINES 37-44

Synthetic data
--------------
Let us consider three parameters:

- x_1 with dimension 1,
- x_2 with dimension 2,
- y_1 with dimension 3.

.. GENERATED FROM PYTHON SOURCE LINES 44-50

.. code-block:: default

    dim_x1 = 1
    dim_x2 = 2
    dim_y1 = 3
    sizes = {"x_1": dim_x1, "x_2": dim_x2, "y_1": dim_y1}
    groups = {"x_1": "inputs", "x_2": "inputs", "y_1": "outputs"}

.. GENERATED FROM PYTHON SOURCE LINES 51-57

We generate 5 random samples of the inputs, where:

- x_1 is stored in the first column,
- x_2 is stored in the second and third columns,

and 5 random samples of the outputs.

.. GENERATED FROM PYTHON SOURCE LINES 57-66

.. code-block:: default

    n_samples = 5
    inputs = rand(n_samples, dim_x1 + dim_x2)
    inputs_names = ["x_1", "x_2"]
    outputs = rand(n_samples, dim_y1)
    outputs_names = ["y_1"]
    data = concatenate((inputs, outputs), 1)
    data_names = inputs_names + outputs_names

.. GENERATED FROM PYTHON SOURCE LINES 67-72

Create a dataset
----------------

Using default names
~~~~~~~~~~~~~~~~~~~
We build a :class:`.Dataset` and initialize it from the whole data array:

.. GENERATED FROM PYTHON SOURCE LINES 72-77

.. code-block:: default

    dataset = Dataset(name="random_dataset")
    dataset.set_from_array(data)
    print(dataset)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    random_dataset
       Number of samples: 5
       Number of variables: 6
       Variables names and sizes by group:
          parameters: x_0 (1), x_1 (1), x_2 (1), x_3 (1), x_4 (1), x_5 (1)
       Number of dimensions (total = 6) by group:
          parameters: 6

.. GENERATED FROM PYTHON SOURCE LINES 78-82

Using particular names
~~~~~~~~~~~~~~~~~~~~~~
We can also pass the names of the variables,
rather than relying on the default ones set by the class:

.. GENERATED FROM PYTHON SOURCE LINES 82-87

.. code-block:: default

    dataset = Dataset(name="random_dataset")
    dataset.set_from_array(data, data_names, sizes)
    print(dataset)
    print(dataset.data)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    random_dataset
       Number of samples: 5
       Number of variables: 3
       Variables names and sizes by group:
          parameters: x_1 (1), x_2 (2), y_1 (3)
       Number of dimensions (total = 6) by group:
          parameters: 6
    {'parameters': array([[4.41451961e-01, 6.30061490e-01, 8.43647520e-01, 4.24697706e-01,
            6.69638526e-02, 4.57962045e-01],
           [9.33783172e-02, 1.19158168e-01, 8.17784653e-01, 4.11008784e-01,
            5.49457252e-01, 6.39829154e-01],
           [4.41969474e-01, 8.02869388e-01, 1.50381081e-04, 5.89044704e-02,
            6.17529433e-01, 1.32273670e-01],
           [1.15028080e-01, 3.68343264e-01, 3.97657201e-01, 4.31813184e-01,
            8.92330173e-01, 1.33836528e-01],
           [8.33757881e-01, 5.09782498e-01, 9.17964534e-01, 2.08759507e-01,
            4.23018763e-01, 6.03938712e-02]])}

.. GENERATED FROM PYTHON SOURCE LINES 88-94

.. warning::

   The number of variable names must equal the number of columns
   of the data array. Otherwise, the user must specify the sizes of
   the variables in a dictionary and make sure that the total size
   equals this number of columns.

.. GENERATED FROM PYTHON SOURCE LINES 96-99

Using particular groups
~~~~~~~~~~~~~~~~~~~~~~~
We can also assign the variables to groups:

.. GENERATED FROM PYTHON SOURCE LINES 99-104

.. code-block:: default

    dataset = Dataset(name="random_dataset")
    dataset.set_from_array(data, data_names, sizes, groups)
    print(dataset)
    print(dataset.data)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    random_dataset
       Number of samples: 5
       Number of variables: 3
       Variables names and sizes by group:
          inputs: x_1 (1), x_2 (2)
          outputs: y_1 (3)
       Number of dimensions (total = 6) by group:
          inputs: 3
          outputs: 3
    {'inputs': array([[4.41451961e-01, 6.30061490e-01, 8.43647520e-01],
           [9.33783172e-02, 1.19158168e-01, 8.17784653e-01],
           [4.41969474e-01, 8.02869388e-01, 1.50381081e-04],
           [1.15028080e-01, 3.68343264e-01, 3.97657201e-01],
           [8.33757881e-01, 5.09782498e-01, 9.17964534e-01]]), 'outputs': array([[0.42469771, 0.06696385, 0.45796204],
           [0.41100878, 0.54945725, 0.63982915],
           [0.05890447, 0.61752943, 0.13227367],
           [0.43181318, 0.89233017, 0.13383653],
           [0.20875951, 0.42301876, 0.06039387]])}

.. GENERATED FROM PYTHON SOURCE LINES 105-110

.. note::

   The groups are specified by a dictionary whose keys are the
   variable names and whose values are the group names. If a variable
   is missing from this dictionary, it is assigned to the default
   group 'parameters'.

.. GENERATED FROM PYTHON SOURCE LINES 112-115

Storing by names
~~~~~~~~~~~~~~~~
We can also store the data by variable name rather than by group.

.. GENERATED FROM PYTHON SOURCE LINES 115-120

.. code-block:: default

    dataset = Dataset(name="random_dataset", by_group=False)
    dataset.set_from_array(data, data_names, sizes, groups)
    print(dataset)
    print(dataset.data)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    random_dataset
       Number of samples: 5
       Number of variables: 3
       Variables names and sizes by group:
          inputs: x_1 (1), x_2 (2)
          outputs: y_1 (3)
       Number of dimensions (total = 6) by group:
          inputs: 3
          outputs: 3
    {'x_1': array([[0.44145196],
           [0.09337832],
           [0.44196947],
           [0.11502808],
           [0.83375788]]), 'x_2': array([[6.30061490e-01, 8.43647520e-01],
           [1.19158168e-01, 8.17784653e-01],
           [8.02869388e-01, 1.50381081e-04],
           [3.68343264e-01, 3.97657201e-01],
           [5.09782498e-01, 9.17964534e-01]]), 'y_1': array([[0.42469771, 0.06696385, 0.45796204],
           [0.41100878, 0.54945725, 0.63982915],
           [0.05890447, 0.61752943, 0.13227367],
           [0.43181318, 0.89233017, 0.13383653],
           [0.20875951, 0.42301876, 0.06039387]])}

.. GENERATED FROM PYTHON SOURCE LINES 121-132

.. note::

   The choice between storage by group and storage by variable name
   aims to limit the number of memory copies of NumPy arrays.
   It mainly depends on how the dataset is used.
   For example, if we want to build a machine learning algorithm
   from both input and output data, we only need to access the data
   by group, so storing the data by group is recommended.
   Conversely, if we want to use the dataset for post-processing,
   accessing the variables of the dataset by name,
   storage by variable name is preferable.

.. GENERATED FROM PYTHON SOURCE LINES 134-139

Access properties
-----------------

Variable names
~~~~~~~~~~~~~~
We can access the variable names:

.. GENERATED FROM PYTHON SOURCE LINES 139-141

.. code-block:: default

    print(dataset.variables)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    ['x_1', 'x_2', 'y_1']

.. GENERATED FROM PYTHON SOURCE LINES 142-145

Variable sizes
~~~~~~~~~~~~~~
We can access the variable sizes:

.. GENERATED FROM PYTHON SOURCE LINES 145-147

.. code-block:: default

    print(dataset.sizes)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    {'x_1': 1, 'x_2': 2, 'y_1': 3}

.. GENERATED FROM PYTHON SOURCE LINES 148-151

Variable groups
~~~~~~~~~~~~~~~
We can access the groups of variables:

.. GENERATED FROM PYTHON SOURCE LINES 151-153

.. code-block:: default

    print(dataset.groups)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    ['inputs', 'outputs']

.. GENERATED FROM PYTHON SOURCE LINES 154-159

Access data
-----------

Access by group
~~~~~~~~~~~~~~~
We can get the data of a group, either as an array (default option):

.. GENERATED FROM PYTHON SOURCE LINES 159-160

.. code-block:: default

    print(dataset.get_data_by_group("inputs"))

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    [[4.41451961e-01 6.30061490e-01 8.43647520e-01]
     [9.33783172e-02 1.19158168e-01 8.17784653e-01]
     [4.41969474e-01 8.02869388e-01 1.50381081e-04]
     [1.15028080e-01 3.68343264e-01 3.97657201e-01]
     [8.33757881e-01 5.09782498e-01 9.17964534e-01]]

.. GENERATED FROM PYTHON SOURCE LINES 161-162

or as a dictionary indexed by the variable names:

.. GENERATED FROM PYTHON SOURCE LINES 162-164

.. code-block:: default

    print(dataset.get_data_by_group("inputs", True))

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    {'x_1': array([[0.44145196],
           [0.09337832],
           [0.44196947],
           [0.11502808],
           [0.83375788]]), 'x_2': array([[6.30061490e-01, 8.43647520e-01],
           [1.19158168e-01, 8.17784653e-01],
           [8.02869388e-01, 1.50381081e-04],
           [3.68343264e-01, 3.97657201e-01],
           [5.09782498e-01, 9.17964534e-01]])}

.. GENERATED FROM PYTHON SOURCE LINES 165-169

Access by variable name
~~~~~~~~~~~~~~~~~~~~~~~
We can get the data of selected variables,
either as a dictionary indexed by the variable names (default option):

.. GENERATED FROM PYTHON SOURCE LINES 169-170

.. code-block:: default

    print(dataset.get_data_by_names(["x_1", "y_1"]))

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    {'x_1': array([[0.44145196],
           [0.09337832],
           [0.44196947],
           [0.11502808],
           [0.83375788]]), 'y_1': array([[0.42469771, 0.06696385, 0.45796204],
           [0.41100878, 0.54945725, 0.63982915],
           [0.05890447, 0.61752943, 0.13227367],
           [0.43181318, 0.89233017, 0.13383653],
           [0.20875951, 0.42301876, 0.06039387]])}

.. GENERATED FROM PYTHON SOURCE LINES 171-172

or as an array:

.. GENERATED FROM PYTHON SOURCE LINES 172-174

.. code-block:: default

    print(dataset.get_data_by_names(["x_1", "y_1"], False))

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    [[0.44145196 0.42469771 0.06696385 0.45796204]
     [0.09337832 0.41100878 0.54945725 0.63982915]
     [0.44196947 0.05890447 0.61752943 0.13227367]
     [0.11502808 0.43181318 0.89233017 0.13383653]
     [0.83375788 0.20875951 0.42301876 0.06039387]]

.. GENERATED FROM PYTHON SOURCE LINES 175-178

Access all data
~~~~~~~~~~~~~~~
We can get all the data sorted by group, either with one array per group,
returned together with the variable names by group and the variable sizes:

.. GENERATED FROM PYTHON SOURCE LINES 178-179

.. code-block:: default

    print(dataset.get_all_data())

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    ({'inputs': array([[4.41451961e-01, 6.30061490e-01, 8.43647520e-01],
           [9.33783172e-02, 1.19158168e-01, 8.17784653e-01],
           [4.41969474e-01, 8.02869388e-01, 1.50381081e-04],
           [1.15028080e-01, 3.68343264e-01, 3.97657201e-01],
           [8.33757881e-01, 5.09782498e-01, 9.17964534e-01]]), 'outputs': array([[0.42469771, 0.06696385, 0.45796204],
           [0.41100878, 0.54945725, 0.63982915],
           [0.05890447, 0.61752943, 0.13227367],
           [0.43181318, 0.89233017, 0.13383653],
           [0.20875951, 0.42301876, 0.06039387]])}, {'inputs': ['x_1', 'x_2'], 'outputs': ['y_1']}, {'x_1': 1, 'x_2': 2, 'y_1': 3})

.. GENERATED FROM PYTHON SOURCE LINES 180-181

or as a dictionary indexed by group names and then by variable names:

.. GENERATED FROM PYTHON SOURCE LINES 181-182

.. code-block:: default

    print(dataset.get_all_data(as_dict=True))

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    {'inputs': {'x_1': array([[0.44145196],
           [0.09337832],
           [0.44196947],
           [0.11502808],
           [0.83375788]]), 'x_2': array([[6.30061490e-01, 8.43647520e-01],
           [1.19158168e-01, 8.17784653e-01],
           [8.02869388e-01, 1.50381081e-04],
           [3.68343264e-01, 3.97657201e-01],
           [5.09782498e-01, 9.17964534e-01]])}, 'outputs': {'y_1': array([[0.42469771, 0.06696385, 0.45796204],
           [0.41100878, 0.54945725, 0.63982915],
           [0.05890447, 0.61752943, 0.13227367],
           [0.43181318, 0.89233017, 0.13383653],
           [0.20875951, 0.42301876, 0.06039387]])}}

.. GENERATED FROM PYTHON SOURCE LINES 183-185

We can also get these data without sorting them by group,
either as a single large array, returned together with the variable names and sizes:

.. GENERATED FROM PYTHON SOURCE LINES 185-186

.. code-block:: default

    print(dataset.get_all_data(by_group=False))

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    (array([[4.41451961e-01, 6.30061490e-01, 8.43647520e-01, 4.24697706e-01,
            6.69638526e-02, 4.57962045e-01],
           [9.33783172e-02, 1.19158168e-01, 8.17784653e-01, 4.11008784e-01,
            5.49457252e-01, 6.39829154e-01],
           [4.41969474e-01, 8.02869388e-01, 1.50381081e-04, 5.89044704e-02,
            6.17529433e-01, 1.32273670e-01],
           [1.15028080e-01, 3.68343264e-01, 3.97657201e-01, 4.31813184e-01,
            8.92330173e-01, 1.33836528e-01],
           [8.33757881e-01, 5.09782498e-01, 9.17964534e-01, 2.08759507e-01,
            4.23018763e-01, 6.03938712e-02]]), ['x_1', 'x_2', 'y_1'], {'x_1': 1, 'x_2': 2, 'y_1': 3})

.. GENERATED FROM PYTHON SOURCE LINES 187-188

or as a dictionary indexed by variable names:

.. GENERATED FROM PYTHON SOURCE LINES 188-189

.. code-block:: default

    print(dataset.get_all_data(by_group=False, as_dict=True))

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    {'x_1': array([[0.44145196],
           [0.09337832],
           [0.44196947],
           [0.11502808],
           [0.83375788]]), 'x_2': array([[6.30061490e-01, 8.43647520e-01],
           [1.19158168e-01, 8.17784653e-01],
           [8.02869388e-01, 1.50381081e-04],
           [3.68343264e-01, 3.97657201e-01],
           [5.09782498e-01, 9.17964534e-01]]), 'y_1': array([[0.42469771, 0.06696385, 0.45796204],
           [0.41100878, 0.54945725, 0.63982915],
           [0.05890447, 0.61752943, 0.13227367],
           [0.43181318, 0.89233017, 0.13383653],
           [0.20875951, 0.42301876, 0.06039387]])}

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 0.017 seconds)


.. _sphx_glr_download_examples_dataset_plot_dataset_from_array.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example

  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_dataset_from_array.py <plot_dataset_from_array.py>`

  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_dataset_from_array.ipynb <plot_dataset_from_array.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
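
The warning about variable names and sizes earlier in this example
can be illustrated with NumPy alone.
The sketch below is not GEMSEO code: ``split_columns`` is a hypothetical helper,
written under the assumption that the variables occupy consecutive columns
in the order given by the names, which is consistent with the outputs shown above.
It checks that the sizes add up to the number of columns
and then slices the array into one block per variable.

.. code-block:: python

    import numpy as np


    def split_columns(data, names, sizes):
        """Split the columns of a 2D array into named blocks.

        Hypothetical helper for illustration only; it is not part of GEMSEO.
        """
        total = sum(sizes[name] for name in names)
        if total != data.shape[1]:
            # Same constraint as in the warning: the sizes must cover all columns.
            raise ValueError(
                f"Total size {total} differs from the number of columns {data.shape[1]}."
            )
        blocks = {}
        start = 0
        for name in names:
            stop = start + sizes[name]
            # Basic slicing returns a view on the original array, not a copy.
            blocks[name] = data[:, start:stop]
            start = stop
        return blocks


    # Deterministic data with 5 samples and 6 columns, mirroring the example layout.
    data = np.arange(30.0).reshape(5, 6)
    blocks = split_columns(data, ["x_1", "x_2", "y_1"], {"x_1": 1, "x_2": 2, "y_1": 3})
    print({name: block.shape for name, block in blocks.items()})
    # prints {'x_1': (5, 1), 'x_2': (5, 2), 'y_1': (5, 3)}

Passing only ``["x_1", "x_2"]`` with sizes 1 and 2 would raise a ``ValueError`` here,
since three columns would be left without a name.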