Dataset from a cache

In this example, we will see how to build a Dataset from the entries of an AbstractFullCache. For that, we need to import a concrete cache class, MemoryFullCache:

from __future__ import annotations

from gemseo.api import configure_logger
from gemseo.caches.memory_full_cache import MemoryFullCache
from numpy import array

configure_logger()
<RootLogger root (INFO)>

Synthetic data

Let us consider a MemoryFullCache storing two parameters:

  • x with dimension 1 which is a cache input,

  • y with dimension 2 which is a cache output.

cache = MemoryFullCache()
cache[{"x": array([1.0])}] = ({"y": array([2.0, 3.0])}, None)
cache[{"x": array([4.0])}] = ({"y": array([5.0, 6.0])}, None)

Create a dataset

We can easily build a dataset from this MemoryFullCache, either by separating the inputs from the outputs (default option):

dataset = cache.export_to_dataset("toy_cache")
print(dataset)
toy_cache
   Number of samples: 2
   Number of variables: 2
   Variables names and sizes by group:
      inputs: x (1)
      outputs: y (2)
   Number of dimensions (total = 3) by group:
      inputs: 1
      outputs: 2

or by placing all the variables in a single default group named parameters:

dataset = cache.export_to_dataset("toy_cache", categorize=False)
print(dataset)
toy_cache
   Number of samples: 2
   Number of variables: 2
   Variables names and sizes by group:
      parameters: x (1), y (2)
   Number of dimensions (total = 3) by group:
      parameters: 3

Access properties

# Re-export the cache with the default input/output categorization.
dataset = cache.export_to_dataset("toy_cache")

Variable names

We can access the variable names:

print(dataset.variables)
['x', 'y']

Variable sizes

We can access the variable sizes:

print(dataset.sizes)
{'x': 1, 'y': 2}

Variable groups

We can access the variable groups:

print(dataset.groups)
['inputs', 'outputs']

Access data

Access by group

We can get the data by group, either as an array (default option):

print(dataset.get_data_by_group("inputs"))
[[1.]
 [4.]]

or as a dictionary indexed by the variable names:

print(dataset.get_data_by_group("inputs", True))
{'x': array([[1.],
       [4.]])}

Access by variable name

We can get the data by variable names, either as a dictionary indexed by the variable names (default option):

print(dataset.get_data_by_names(["x"]))
{'x': array([[1.],
       [4.]])}

or as an array:

print(dataset.get_data_by_names(["x", "y"], False))
[[1. 2. 3.]
 [4. 5. 6.]]
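
The array form appears to be the per-variable blocks stacked column-wise in the order of the requested names. A minimal NumPy-only sketch of that layout, reusing the values printed above; the helper name concatenate_variables is hypothetical and is not part of the GEMSEO API:

```python
import numpy as np

# Per-variable data as printed above: one row per sample.
data = {
    "x": np.array([[1.0], [4.0]]),
    "y": np.array([[2.0, 3.0], [5.0, 6.0]]),
}


def concatenate_variables(data, names):
    """Stack the 2D blocks of the given variables column-wise.

    Hypothetical helper for illustration only, not a GEMSEO function.
    """
    return np.hstack([data[name] for name in names])


full = concatenate_variables(data, ["x", "y"])
print(full)
```

With these values, the result is a 2 x 3 array whose first column holds x and whose last two columns hold y, matching the output shown above.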

Access all data

We can get all the data organized by group, either as a tuple containing one array per group, the variable names per group and the variable sizes:

print(dataset.get_all_data())
({'inputs': array([[1.],
       [4.]]), 'outputs': array([[2., 3.],
       [5., 6.]])}, {'inputs': ['x'], 'outputs': ['y']}, {'x': 1, 'y': 2})

or as a nested dictionary indexed by group names, then by variable names:

print(dataset.get_all_data(as_dict=True))
{'inputs': {'x': array([[1.],
       [4.]])}, 'outputs': {'y': array([[2., 3.],
       [5., 6.]])}}

We can also get these data without grouping, either as a tuple containing a single large array, the variable names and the variable sizes:

print(dataset.get_all_data(by_group=False))
(array([[1., 2., 3.],
       [4., 5., 6.]]), ['x', 'y'], {'x': 1, 'y': 2})

or as a dictionary indexed by variable names:

print(dataset.get_all_data(by_group=False, as_dict=True))
{'x': array([[1.],
       [4.]]), 'y': array([[2., 3.],
       [5., 6.]])}
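
The large array, the list of names and the dictionary of sizes shown above carry enough information to rebuild the per-variable dictionary. A minimal NumPy-only sketch, assuming the columns follow the order of the names list; split_by_sizes is a hypothetical helper, not a GEMSEO function:

```python
import numpy as np

# Values copied from the ungrouped output printed above.
all_data = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
names = ["x", "y"]
sizes = {"x": 1, "y": 2}


def split_by_sizes(all_data, names, sizes):
    """Slice the columns of a 2D array into one block per variable.

    Hypothetical helper; assumes the columns are ordered as in ``names``.
    """
    result = {}
    start = 0
    for name in names:
        stop = start + sizes[name]
        result[name] = all_data[:, start:stop]
        start = stop
    return result


per_variable = split_by_sizes(all_data, names, sizes)
print(per_variable)
```

Each block keeps one row per sample, so x has shape (2, 1) and y has shape (2, 2), as in the dictionary form above.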
