Dataset from a cache¶
In this example, we will see how to build a Dataset from the objects of an AbstractFullCache.
For that, we need to import a cache class, here MemoryFullCache:
from __future__ import annotations
from gemseo.api import configure_logger
from gemseo.caches.memory_full_cache import MemoryFullCache
from numpy import array
configure_logger()
<RootLogger root (INFO)>
Synthetic data¶
Let us consider a MemoryFullCache storing two parameters:
x with dimension 1 which is a cache input,
y with dimension 2 which is a cache output.
cache = MemoryFullCache()
cache[{"x": array([1.0])}] = ({"y": array([2.0, 3.0])}, None)
cache[{"x": array([4.0])}] = ({"y": array([5.0, 6.0])}, None)
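To make the shape of these entries concrete, here is a minimal sketch in plain Python and NumPy of an input-to-output store. It is not GEMSEO's implementation: the `ToyCache` class and its `_key` helper are hypothetical names introduced for illustration, and the second element of the assigned tuple (`None` above) is simply stored untouched.

```python
import numpy as np


class ToyCache:
    """Hypothetical input-to-output store; not the GEMSEO cache class."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def _key(input_data):
        # NumPy arrays are not hashable, so build a hashable key from
        # the sorted variable names and the flattened array values.
        return tuple(
            (name, tuple(np.asarray(value).ravel().tolist()))
            for name, value in sorted(input_data.items())
        )

    def __setitem__(self, input_data, value):
        # The assigned value is a pair: the output data and an extra slot.
        output_data, extra = value
        self._entries[self._key(input_data)] = (input_data, output_data, extra)

    def __getitem__(self, input_data):
        # Return the output data stored for these input values.
        return self._entries[self._key(input_data)][1]

    def __len__(self):
        return len(self._entries)


toy = ToyCache()
toy[{"x": np.array([1.0])}] = ({"y": np.array([2.0, 3.0])}, None)
toy[{"x": np.array([4.0])}] = ({"y": np.array([5.0, 6.0])}, None)
```

Each distinct input value yields one entry, which is why the dataset built from the cache in this example contains two samples.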
Create a dataset¶
We can easily build a dataset from this MemoryFullCache, either by separating the inputs from the outputs (default option):
dataset = cache.export_to_dataset("toy_cache")
print(dataset)
toy_cache
Number of samples: 2
Number of variables: 2
Variables names and sizes by group:
inputs: x (1)
outputs: y (2)
Number of dimensions (total = 3) by group:
inputs: 1
outputs: 2
or by gathering all the variables in a single default group, named parameters:
dataset = cache.export_to_dataset("toy_cache", categorize=False)
print(dataset)
toy_cache
Number of samples: 2
Number of variables: 2
Variables names and sizes by group:
parameters: x (1), y (2)
Number of dimensions (total = 3) by group:
parameters: 3
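The two printed summaries differ only in how the variable names are assigned to groups. A minimal sketch with plain dictionaries, independent of GEMSEO, may help; the `group_names` helper is a hypothetical name, and the group labels are copied from the outputs above.

```python
def group_names(input_names, output_names, categorize=True):
    """Map group names to variable names, mimicking the two layouts above."""
    if categorize:
        # Inputs and outputs are kept in two separate groups.
        return {"inputs": list(input_names), "outputs": list(output_names)}
    # Without categorization, every variable goes to one default group.
    return {"parameters": list(input_names) + list(output_names)}


by_category = group_names(["x"], ["y"])
single_group = group_names(["x"], ["y"], categorize=False)
```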
Access properties¶
dataset = cache.export_to_dataset("toy_cache")
Variable names¶
We can access the variable names:
print(dataset.variables)
['x', 'y']
Variable sizes¶
We can access the variable sizes:
print(dataset.sizes)
{'x': 1, 'y': 2}
Variable groups¶
We can access the variable groups:
print(dataset.groups)
['inputs', 'outputs']
Access data¶
Access by group¶
We can get the data by group, either as an array (default option):
print(dataset.get_data_by_group("inputs"))
[[1.]
 [4.]]
or as a dictionary indexed by the variable names:
print(dataset.get_data_by_group("inputs", True))
{'x': array([[1.],
       [4.]])}
Access by variable name¶
We can get the data by variable names, either as a dictionary indexed by the variable names (default option):
print(dataset.get_data_by_names(["x"]))
{'x': array([[1.],
       [4.]])}
or as an array:
print(dataset.get_data_by_names(["x", "y"], False))
[[1. 2. 3.]
 [4. 5. 6.]]
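The array printed for several names is consistent with a column-wise concatenation of the per-variable arrays, with one row per sample. The following NumPy-only sketch reproduces that layout without calling GEMSEO; the `data` dictionary is rebuilt by hand from the values shown above.

```python
import numpy as np

# Per-variable data, one row per sample and one column per component.
data = {
    "x": np.array([[1.0], [4.0]]),
    "y": np.array([[2.0, 3.0], [5.0, 6.0]]),
}
names = ["x", "y"]

# Concatenate the variables column-wise, in the requested order.
stacked = np.hstack([data[name] for name in names])

# The size of a variable is its number of columns.
sizes = {name: data[name].shape[1] for name in names}
```

Changing the order of `names` changes the order of the columns, so the name list has to be kept alongside the array to interpret it.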
Access all data¶
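We can get all the data grouped by category, either as a tuple gathering one array per group, the variable names per group and the variable sizes (default option):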
print(dataset.get_all_data())
({'inputs': array([[1.],
       [4.]]), 'outputs': array([[2., 3.],
       [5., 6.]])}, {'inputs': ['x'], 'outputs': ['y']}, {'x': 1, 'y': 2})
or as a dictionary indexed by group names and then by variable names:
print(dataset.get_all_data(as_dict=True))
{'inputs': {'x': array([[1.],
       [4.]])}, 'outputs': {'y': array([[2., 3.],
       [5., 6.]])}}
We can also get these data without grouping, either as a tuple gathering a single large array, the variable names and the variable sizes:
print(dataset.get_all_data(by_group=False))
(array([[1., 2., 3.],
       [4., 5., 6.]]), ['x', 'y'], {'x': 1, 'y': 2})
or as a dictionary indexed by the variable names:
print(dataset.get_all_data(by_group=False, as_dict=True))
{'x': array([[1.],
       [4.]]), 'y': array([[2., 3.],
       [5., 6.]])}
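The ungrouped array and the dictionary carry the same information: given the variable names and sizes, the large array can be cut back into per-variable blocks. Here is a NumPy-only sketch of that conversion, assuming the columns follow the order of the name list; it does not use GEMSEO, and the values are copied from the outputs above.

```python
import numpy as np

array_data = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
names = ["x", "y"]
sizes = {"x": 1, "y": 2}

# Column indices where one variable ends and the next begins: here [1].
split_points = np.cumsum([sizes[name] for name in names])[:-1]

# Cut the array column-wise and pair each block with its variable name.
dict_data = dict(zip(names, np.split(array_data, split_points, axis=1)))
```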