Dataset from a NumPy array

In this example, we will see how to build a Dataset from a NumPy array.

from __future__ import annotations

from gemseo import configure_logger
from gemseo.datasets.dataset import Dataset
from numpy import concatenate
from numpy.random import rand

configure_logger()
<RootLogger root (INFO)>

Let us consider two input variables \(x_1\) and \(x_2\) of size 1 and 2 respectively, and one output variable \(y_1\) of size 3. We generate 5 random samples of the inputs where

  • \(x_1\) is stored in the first column,

  • \(x_2\) is stored in the 2nd and 3rd columns,

and 5 random samples of the outputs, where \(y_1\) fills the three columns:

n_samples = 5
inputs = rand(n_samples, 3)
outputs = rand(n_samples, 3)
data = concatenate((inputs, outputs), 1)
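To make the column layout of the concatenated array concrete, the following NumPy-only sketch slices it back into the three variables, using the names x_1, x_2 and y_1 given to them in the custom-name sections. The seeded generator is an assumption added for reproducibility; the example itself uses numpy.random.rand.

```python
import numpy as np

# Seeded generator: an assumption for reproducibility, not part of the example.
rng = np.random.default_rng(0)

n_samples = 5
inputs = rng.random((n_samples, 3))   # columns 0..2: x_1 (1 column) and x_2 (2 columns)
outputs = rng.random((n_samples, 3))  # 3 columns: y_1
data = np.concatenate((inputs, outputs), axis=1)

# Slice the concatenated array back into the three variables.
x_1 = data[:, 0:1]
x_2 = data[:, 1:3]
y_1 = data[:, 3:6]

print(data.shape, x_1.shape, x_2.shape, y_1.shape)
```

Each row of data is one sample, so the variable sizes 1, 2 and 3 add up to its 6 columns.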

A dataset with default names

We create a Dataset from the NumPy array only and let GEMSEO give default names to its columns:

dataset = Dataset.from_array(data)
print(dataset)
GROUP     parameters
VARIABLE         x_0       x_1       x_2       x_3       x_4       x_5
COMPONENT          0         0         0         0         0         0
0           0.037402  0.243962  0.019047  0.310880  0.021208  0.408196
1           0.051136  0.880847  0.411467  0.217063  0.603613  0.293174
2           0.969743  0.850041  0.713062  0.945196  0.110868  0.493227
3           0.218725  0.721572  0.631043  0.437936  0.655254  0.884586
4           0.019469  0.733367  0.902215  0.705865  0.491662  0.039995
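The default names in the printed header follow a simple pattern: one name per column, numbered from 0, each with a single component. The sketch below only reproduces that pattern as it appears in the output above; it is not GEMSEO's implementation.

```python
# Number of columns of the data array (3 input columns + 3 output columns).
n_columns = 6

# One default name per column, numbered from 0, as in the printed header.
default_names = [f"x_{i}" for i in range(n_columns)]
print(default_names)
```

Note that these default names start at x_0, so the default x_1 is the second column and should not be confused with the variable x_1 of the example.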

A dataset with custom names

We can also pass the names and sizes of the variables:

names_to_sizes = {"x_1": 1, "x_2": 2, "y_1": 3}
dataset = Dataset.from_array(data, ["x_1", "x_2", "y_1"], names_to_sizes)
print(dataset)
GROUP     parameters
VARIABLE         x_1       x_2                 y_1
COMPONENT          0         0         1         0         1         2
0           0.037402  0.243962  0.019047  0.310880  0.021208  0.408196
1           0.051136  0.880847  0.411467  0.217063  0.603613  0.293174
2           0.969743  0.850041  0.713062  0.945196  0.110868  0.493227
3           0.218725  0.721572  0.631043  0.437936  0.655254  0.884586
4           0.019469  0.733367  0.902215  0.705865  0.491662  0.039995

Warning

The number of variable names must be equal to the number of columns of the data array. Otherwise, the user has to specify the sizes of the different variables by means of a dictionary and make sure that the total size is equal to this number of columns.
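The consistency rule stated in this warning can be checked before building the dataset. The helper check_sizes below is hypothetical and not part of GEMSEO; it assumes that a variable without an explicit size has size 1, which is what the rule implies when no dictionary is given.

```python
def check_sizes(n_columns, names, names_to_sizes=None):
    """Hypothetical helper: check that the variable sizes fill the columns.

    A variable missing from ``names_to_sizes`` is assumed to have size 1.
    """
    sizes = names_to_sizes or {}
    total = sum(sizes.get(name, 1) for name in names)
    if total != n_columns:
        raise ValueError(
            f"The variables take {total} columns but the array has {n_columns}."
        )
    return total


names = ["x_1", "x_2", "y_1"]
# Sizes 1 + 2 + 3 match the 6 columns of the data array.
print(check_sizes(6, names, {"x_1": 1, "x_2": 2, "y_1": 3}))
```

With the same three names and no sizes, the total would be 3 rather than 6 and the helper would raise a ValueError.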

A dataset with custom groups

We can also use the notion of groups of variables:

groups = {"x_1": "inputs", "x_2": "inputs", "y_1": "outputs"}
dataset = Dataset.from_array(data, ["x_1", "x_2", "y_1"], names_to_sizes, groups)
print(dataset)
GROUP        inputs                       outputs
VARIABLE        x_1       x_2                 y_1
COMPONENT         0         0         1         0         1         2
0          0.037402  0.243962  0.019047  0.310880  0.021208  0.408196
1          0.051136  0.880847  0.411467  0.217063  0.603613  0.293174
2          0.969743  0.850041  0.713062  0.945196  0.110868  0.493227
3          0.218725  0.721572  0.631043  0.437936  0.655254  0.884586
4          0.019469  0.733367  0.902215  0.705865  0.491662  0.039995

Note

The groups are specified by means of a dictionary whose keys are the variable names and whose values are the group names. If a variable is missing, the default group Dataset.DEFAULT_GROUP is used.
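The fallback to the default group can be pictured as a dictionary lookup with a default value. The function resolve_groups below is a hypothetical illustration, not GEMSEO code, and the value "parameters" is assumed from the group name printed in the first two datasets of this example.

```python
# Assumed value of Dataset.DEFAULT_GROUP, taken from the printed output above.
DEFAULT_GROUP = "parameters"


def resolve_groups(names, names_to_groups):
    """Hypothetical helper: map each variable to its group or to the default."""
    return {name: names_to_groups.get(name, DEFAULT_GROUP) for name in names}


# Only y_1 is given a group; x_1 and x_2 fall back to the default group.
print(resolve_groups(["x_1", "x_2", "y_1"], {"y_1": "outputs"}))
```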
