Dataset from a NumPy array
In this example, we will see how to build a Dataset from a NumPy array.
from __future__ import annotations
from gemseo import configure_logger
from gemseo.datasets.dataset import Dataset
from numpy import concatenate
from numpy.random import rand
configure_logger()
<RootLogger root (INFO)>
Let us consider three variables: two inputs \(x_1\) and \(x_2\) of sizes 1 and 2, and one output \(y_1\) of size 3. We generate 5 random samples of the inputs, where
\(x_1\) is stored in the first column,
\(x_2\) is stored in the second and third columns,
and 5 random samples of the outputs, which fill the last three columns of the concatenated array:
n_samples = 5
inputs = rand(n_samples, 3)
outputs = rand(n_samples, 3)
data = concatenate((inputs, outputs), 1)
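The column layout of the concatenated array can be checked with plain NumPy slicing. This is a minimal sketch, not part of the original example: the seeded generator is an assumption added for reproducibility, and the names x_1, x_2 and y_1 anticipate the custom names used later on this page.

```python
from numpy import concatenate
from numpy.random import default_rng

# Seeded generator: an assumption made here so that reruns give the same values.
rng = default_rng(0)

n_samples = 5
inputs = rng.random((n_samples, 3))
outputs = rng.random((n_samples, 3))

# Stack inputs and outputs side by side: one row per sample, six columns.
data = concatenate((inputs, outputs), axis=1)

# Column layout used in the rest of the example.
x_1 = data[:, 0:1]  # size 1: first column
x_2 = data[:, 1:3]  # size 2: second and third columns
y_1 = data[:, 3:6]  # size 3: last three columns

print(data.shape, x_1.shape, x_2.shape, y_1.shape)
```

Slicing with ranges such as `0:1` keeps every block two-dimensional, so each variable stays a (samples, size) array.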
A dataset with default names
We create a Dataset from the NumPy array only and let GEMSEO give default names to its columns:
dataset = Dataset.from_array(data)
print(dataset)
GROUP     parameters
VARIABLE         x_0       x_1       x_2       x_3       x_4       x_5
COMPONENT          0         0         0         0         0         0
0           0.014615  0.483645  0.969826  0.556766  0.465260  0.413557
1           0.281531  0.669828  0.844281  0.628156  0.917050  0.420460
2           0.939456  0.025326  0.845118  0.387614  0.361267  0.199191
3           0.081400  0.313387  0.742652  0.832763  0.230162  0.554412
4           0.368240  0.056516  0.743245  0.781341  0.025794  0.503514
A dataset with custom names
We can also pass the names and sizes of the variables:
names_to_sizes = {"x_1": 1, "x_2": 2, "y_1": 3}
dataset = Dataset.from_array(data, ["x_1", "x_2", "y_1"], names_to_sizes)
print(dataset)
GROUP     parameters
VARIABLE         x_1       x_2                 y_1
COMPONENT          0         0         1         0         1         2
0           0.014615  0.483645  0.969826  0.556766  0.465260  0.413557
1           0.281531  0.669828  0.844281  0.628156  0.917050  0.420460
2           0.939456  0.025326  0.845118  0.387614  0.361267  0.199191
3           0.081400  0.313387  0.742652  0.832763  0.230162  0.554412
4           0.368240  0.056516  0.743245  0.781341  0.025794  0.503514
Warning
The number of variable names must be equal to the number of columns of the data array. Otherwise, the user has to specify the sizes of the variables by means of a dictionary, and the sum of these sizes must be equal to this number of columns.
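The consistency rule in this warning can be illustrated in plain Python. The helper below, `column_slices`, is hypothetical and is not part of GEMSEO; it only sketches, under that assumption, how a dictionary of sizes maps variable names to column ranges and why the sizes must add up to the number of columns.

```python
def column_slices(names_to_sizes, n_columns):
    """Map each variable name to its column slice (hypothetical helper).

    Raise a ValueError if the sizes do not add up to the number of columns.
    """
    total = sum(names_to_sizes.values())
    if total != n_columns:
        msg = f"The sizes sum to {total} but the array has {n_columns} columns."
        raise ValueError(msg)

    slices = {}
    start = 0
    # Dictionaries preserve insertion order, so columns are assigned in order.
    for name, size in names_to_sizes.items():
        slices[name] = slice(start, start + size)
        start += size
    return slices


slices = column_slices({"x_1": 1, "x_2": 2, "y_1": 3}, 6)
print(slices)
```

With the sizes used above, x_1 takes column 0, x_2 takes columns 1 and 2, and y_1 takes columns 3 to 5, which matches the COMPONENT row of the printed dataset.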
A dataset with custom groups
We can also use the notion of groups of variables:
groups = {"x_1": "inputs", "x_2": "inputs", "y_1": "outputs"}
dataset = Dataset.from_array(data, ["x_1", "x_2", "y_1"], names_to_sizes, groups)
print(dataset)
GROUP         inputs                       outputs
VARIABLE         x_1       x_2                 y_1
COMPONENT          0         0         1         0         1         2
0           0.014615  0.483645  0.969826  0.556766  0.465260  0.413557
1           0.281531  0.669828  0.844281  0.628156  0.917050  0.420460
2           0.939456  0.025326  0.845118  0.387614  0.361267  0.199191
3           0.081400  0.313387  0.742652  0.832763  0.230162  0.554412
4           0.368240  0.056516  0.743245  0.781341  0.025794  0.503514
Note
The groups are specified by means of a dictionary whose keys are the variable names and whose values are the group names. If a variable is missing from this dictionary, it is assigned to the default group Dataset.DEFAULT_GROUP.
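The fallback described in this note can be sketched in plain Python. The function `resolve_groups` is hypothetical and is not GEMSEO code. The value "parameters" for the default group is an assumption inferred from the GROUP row printed in the earlier sections, not a value read from the library.

```python
# Assumed value of the default group, inferred from the printed output above.
DEFAULT_GROUP = "parameters"


def resolve_groups(variable_names, names_to_groups, default_group=DEFAULT_GROUP):
    """Return the group of each variable, using the default for missing names."""
    return {
        name: names_to_groups.get(name, default_group) for name in variable_names
    }


# "x_2" is deliberately left out of the dictionary to trigger the fallback.
resolved = resolve_groups(
    ["x_1", "x_2", "y_1"], {"x_1": "inputs", "y_1": "outputs"}
)
print(resolved)
```

Here x_2 has no entry in the dictionary, so it ends up in the default group while x_1 and y_1 keep their custom groups.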