Caching and recording discipline data¶
GEMSEO offers several features to record and cache the values of discipline inputs and outputs, as well as its Jacobian.
Introduction¶
Executing a discipline triggers a simulation which can be costly.
The first need for caching is to avoid duplicate simulations with the same inputs.
Then, the generated data contain valuable information that one may want to analyze during or after the execution, so storing these data on the disk is useful.
Finally, in case of a machine crash, restarting the MDO process from scratch may be a waste of computational resources. Again, storing the input and output data on the disk avoids duplicate executions after a crash.
In GEMSEO, each MDODiscipline
has a cache.
>>> from gemseo.api import create_discipline
>>> discipline = create_discipline('AnalyticDiscipline', name='my_discipline', expressions={'y':'2*x'})
>>> print(discipline.cache)
my_discipline
| Type: SimpleCache
| Input names: None
| Output names: None
| Length: 0
| Tolerance: 0.0
Setting a cache policy¶
All disciplines have the MDODiscipline.SIMPLE_CACHE
cache policy enabled by default.
Other ones are MDODiscipline.MEMORY_FULL_CACHE
and
MDODiscipline.HDF5_CACHE
.
The cache policy can be defined by means of the MDODiscipline.set_cache_policy()
method:
- MDODiscipline.set_cache_policy(cache_type='SimpleCache', cache_tolerance=0.0, cache_hdf_file=None, cache_hdf_node_name=None, is_memory_shared=True)[source]
Set the type of cache to use and the tolerance level.
This method also defines when output data have to be cached, according to the distance between the new input data and the input data already cached with their output data.
The cache can be either a SimpleCache recording the last execution, or a cache storing all executions, e.g. MemoryFullCache and HDF5Cache. Caching can be done either in memory, e.g. SimpleCache and MemoryFullCache, or on the disk, e.g. HDF5Cache.
The attribute CacheFactory.caches provides the available cache types.
- Parameters
cache_type (str) –
The type of cache.
By default it is set to SimpleCache.
cache_tolerance (float) –
The maximum relative norm of the difference between two input arrays to consider that two input arrays are equal.
By default it is set to 0.0.
cache_hdf_file (str | Path | None) –
The path to the HDF file to store the data; this argument is mandatory when the MDODiscipline.HDF5_CACHE policy is used.
By default it is set to None.
cache_hdf_node_name (str | None) –
The name of the HDF file node to store the discipline data. If None, MDODiscipline.name is used.
By default it is set to None.
is_memory_shared (bool) –
Whether to store the data with a shared memory dictionary, which makes the cache compatible with multiprocessing.
By default it is set to True.
- Return type
None
>>> from gemseo.api import create_discipline
>>> discipline = create_discipline('AnalyticDiscipline', name='my_discipline', expressions={'y':'2*x'})
>>> print(discipline.cache)
my_discipline
| Type: SimpleCache
| Input names: None
| Output names: None
| Length: 0
| Tolerance: 0.0
>>> discipline.set_cache_policy(discipline.MEMORY_FULL_CACHE)
>>> print(discipline.cache)
my_discipline
| Type: MemoryFullCache
| Input names: None
| Output names: None
| Length: 0
| Tolerance: 0.0
The different cache policies¶
Simple cache: storing the last execution¶
The simplest cache strategy in GEMSEO only stores the last execution data (inputs, outputs and possibly the Jacobian matrix) in memory.
This cache strategy is implemented by means of the SimpleCache
class:
- class gemseo.caches.simple_cache.SimpleCache(tolerance=0.0, name=None)[source]
Dictionary-based cache storing a unique entry.
When caching input data different from this entry, this entry is replaced by a new one initialized with these input data.
- Parameters
tolerance (float) –
The tolerance below which two input arrays are considered equal:
norm(new_array - cached_array) / (1 + norm(cached_array)) <= tolerance.
If this is the case for all the input names, then the cached output data shall be returned rather than re-evaluating the discipline. This tolerance could be useful to optimize CPU time. It could be something like 2 * numpy.finfo(float).eps.
By default it is set to 0.0.
name (str | None) –
A name for the cache. If None, use the class name.
By default it is set to None.
- Return type
None
- cache_jacobian(input_data, jacobian_data)[source]
Cache the input and Jacobian data.
- Parameters
input_data (Mapping[str, Any]) – The data containing the input data to cache.
jacobian_data (Mapping[str, Mapping[str, numpy.ndarray]]) – The Jacobian data to cache.
- Return type
None
- cache_outputs(input_data, output_data)[source]
Cache input and output data.
- clear()[source]
Clear the cache.
- Return type
None
- export_to_dataset(name=None, by_group=True, categorize=True, input_names=None, output_names=None)
Build a Dataset from the cache.
- Parameters
name (str | None) –
A name for the dataset. If None, use the name of the cache.
By default it is set to None.
by_group (bool) –
Whether to store the data by group in Dataset.data, in the sense of one unique NumPy array per group. If categorize is False, there is a unique group: Dataset.PARAMETER_GROUP. If categorize is True, the groups are stored in Dataset.INPUT_GROUP and Dataset.OUTPUT_GROUP. If by_group is False, store the data by variable names.
By default it is set to True.
categorize (bool) –
Whether to distinguish between the different groups of variables. Otherwise, group all the variables in Dataset.PARAMETER_GROUP.
By default it is set to True.
input_names (Iterable[str] | None) –
The names of the inputs to be exported. If None, use all the inputs.
By default it is set to None.
output_names (Iterable[str] | None) –
The names of the outputs to be exported. If None, use all the outputs.
By default it is set to None.
- Returns
A dataset version of the cache.
- Return type
Dataset
- get(k[, d]) D[k] if k in D, else d. d defaults to None.
- items() a set-like object providing a view on D's items
- keys() a set-like object providing a view on D's keys
- values() an object providing a view on D's values
- property last_entry: gemseo.core.cache.CacheEntry
The last cache entry.
- property penultimate_entry: gemseo.core.cache.CacheEntry
The penultimate cache entry.
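The equality test controlled by the tolerance parameter above can be sketched in plain Python. This is a conceptual illustration of the documented formula, not GEMSEO's actual implementation:

```python
from math import sqrt


def _norm(array):
    """Euclidean norm of a sequence of floats."""
    return sqrt(sum(x * x for x in array))


def is_cache_hit(new_array, cached_array, tolerance):
    """Return whether two input arrays are considered equal.

    Implements norm(new_array - cached_array) / (1 + norm(cached_array)) <= tolerance.
    """
    diff = [n - c for n, c in zip(new_array, cached_array)]
    return _norm(diff) / (1.0 + _norm(cached_array)) <= tolerance


# With the default tolerance 0.0, only an exact match is a cache hit.
print(is_cache_hit([1.0], [1.0], 0.0))  # True
print(is_cache_hit([1.0], [1.0 + 1e-12], 0.0))  # False

# A small positive tolerance absorbs round-off differences.
print(is_cache_hit([1.0], [1.0 + 1e-12], 1e-9))  # True
```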
Memory cache: recording all executions in memory¶
The MemoryFullCache is the in-memory version of the HDF5Cache.
It stores several executions of a discipline, namely the input, output and Jacobian values, in a dictionary.
This cache strategy is implemented by means of the MemoryFullCache
class:
- class gemseo.caches.memory_full_cache.MemoryFullCache(tolerance=0.0, name=None, is_memory_shared=True)[source]
Cache using memory to cache all the data.
- Parameters
- Return type
None
Warning
If is_memory_shared is False and multiple disciplines point to the same cache, or the process is multiprocessed, there may be duplicate computations because the cache will not be shared among the processes. This class relies on some multiprocessing features; it is therefore necessary to protect its execution with an if __name__ == '__main__': statement when working on Windows.
- cache_jacobian(input_data, jacobian_data)
Cache the input and Jacobian data.
- Parameters
input_data (Mapping[str, Any]) – The data containing the input data to cache.
jacobian_data (Mapping[str, Mapping[str, numpy.ndarray]]) – The Jacobian data to cache.
- Return type
None
- cache_outputs(input_data, output_data)
Cache input and output data.
- clear()[source]
Clear the cache.
- Return type
None
- export_to_dataset(name=None, by_group=True, categorize=True, input_names=None, output_names=None)
Build a Dataset from the cache.
- Parameters
name (str | None) –
A name for the dataset. If None, use the name of the cache.
By default it is set to None.
by_group (bool) –
Whether to store the data by group in Dataset.data, in the sense of one unique NumPy array per group. If categorize is False, there is a unique group: Dataset.PARAMETER_GROUP. If categorize is True, the groups are stored in Dataset.INPUT_GROUP and Dataset.OUTPUT_GROUP. If by_group is False, store the data by variable names.
By default it is set to True.
categorize (bool) –
Whether to distinguish between the different groups of variables. Otherwise, group all the variables in Dataset.PARAMETER_GROUP.
By default it is set to True.
input_names (Iterable[str] | None) –
The names of the inputs to be exported. If None, use all the inputs.
By default it is set to None.
output_names (Iterable[str] | None) –
The names of the outputs to be exported. If None, use all the outputs.
By default it is set to None.
- Returns
A dataset version of the cache.
- Return type
Dataset
- export_to_ggobi(file_path, input_names=None, output_names=None)
Export the cache to an XML file for the ggobi tool.
- Parameters
file_path (str) – The path of the file to export the cache.
input_names (Iterable[str] | None) –
The names of the inputs to export. If None, export all of them.
By default it is set to None.
output_names (Iterable[str] | None) –
The names of the outputs to export. If None, export all of them.
By default it is set to None.
- Return type
None
- get(k[, d]) D[k] if k in D, else d. d defaults to None.
- items() a set-like object providing a view on D's items
- keys() a set-like object providing a view on D's keys
- update(other_cache)
Update from another cache.
- Parameters
other_cache (gemseo.core.cache.AbstractFullCache) – The cache to update the current one.
- Return type
None
- values() an object providing a view on D's values
- property copy: gemseo.caches.memory_full_cache.MemoryFullCache
Copy the current cache.
- Returns
A copy of the current cache.
- property last_entry: gemseo.core.cache.CacheEntry
The last cache entry.
HDF5 cache: recording all executions on the disk¶
When all the execution data of the discipline must be stored on the disk, the HDF5 cache policy can be used. HDF5 is a standard file format for storing simulation data. The following description is proposed by the HDF5 website:
“HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data. HDF5 is portable and is extensible, allowing applications to evolve in their use of HDF5. The HDF5 Technology suite includes tools and applications for managing, manipulating, viewing, and analyzing data in the HDF5 format.”
HDF5 manipulation libraries exist in many languages, including C, C++, Fortran, Java and Python.
The HDFView application can be used to explore the data of the cache.
To manipulate the data, one may use the HDF5Cache
class, which can import the file and read all the data,
or the data of a specific execution.

HDFView of the cache generated by an MDF DOE scenario execution on the SSBJ test case¶
This cache strategy is implemented by means of the HDF5Cache
class:
- class gemseo.caches.hdf5_cache.HDF5Cache(hdf_file_path='cache.hdf5', hdf_node_path='node', tolerance=0.0, name=None)[source]
Cache using disk HDF5 file to store the data.
- Parameters
hdf_file_path (str | Path) –
The path of the HDF file. Initialize a singleton to access the HDF file. This singleton is used for multithreaded/multiprocessing access with a lock.
By default it is set to cache.hdf5.
hdf_node_path (str) –
The node of the HDF file.
By default it is set to node.
name (str | None) –
A name for the cache. If None, use hdf_node_path.
By default it is set to None.
tolerance (float) –
The tolerance below which two input arrays are considered equal.
By default it is set to 0.0.
- Return type
None
Warning
This class relies on some multiprocessing features; it is therefore necessary to protect its execution with an if __name__ == '__main__': statement when working on Windows.
- cache_jacobian(input_data, jacobian_data)
Cache the input and Jacobian data.
- Parameters
input_data (Mapping[str, Any]) – The data containing the input data to cache.
jacobian_data (Mapping[str, Mapping[str, numpy.ndarray]]) – The Jacobian data to cache.
- Return type
None
- cache_outputs(input_data, output_data)
Cache input and output data.
- clear()[source]
Clear the cache.
- Return type
None
- export_to_dataset(name=None, by_group=True, categorize=True, input_names=None, output_names=None)
Build a Dataset from the cache.
- Parameters
name (str | None) –
A name for the dataset. If None, use the name of the cache.
By default it is set to None.
by_group (bool) –
Whether to store the data by group in Dataset.data, in the sense of one unique NumPy array per group. If categorize is False, there is a unique group: Dataset.PARAMETER_GROUP. If categorize is True, the groups are stored in Dataset.INPUT_GROUP and Dataset.OUTPUT_GROUP. If by_group is False, store the data by variable names.
By default it is set to True.
categorize (bool) –
Whether to distinguish between the different groups of variables. Otherwise, group all the variables in Dataset.PARAMETER_GROUP.
By default it is set to True.
input_names (Iterable[str] | None) –
The names of the inputs to be exported. If None, use all the inputs.
By default it is set to None.
output_names (Iterable[str] | None) –
The names of the outputs to be exported. If None, use all the outputs.
By default it is set to None.
- Returns
A dataset version of the cache.
- Return type
Dataset
- export_to_ggobi(file_path, input_names=None, output_names=None)
Export the cache to an XML file for the ggobi tool.
- Parameters
file_path (str) – The path of the file to export the cache.
input_names (Iterable[str] | None) –
The names of the inputs to export. If None, export all of them.
By default it is set to None.
output_names (Iterable[str] | None) –
The names of the outputs to export. If None, export all of them.
By default it is set to None.
- Return type
None
- get(k[, d]) D[k] if k in D, else d. d defaults to None.
- items() a set-like object providing a view on D's items
- keys() a set-like object providing a view on D's keys
- update(other_cache)
Update from another cache.
- Parameters
other_cache (gemseo.core.cache.AbstractFullCache) – The cache to update the current one.
- Return type
None
- static update_file_format(hdf_file_path)[source]
Update the format of an HDF5 file.
- Parameters
hdf_file_path (str | Path) – A HDF5 file path.
- Return type
None
- values() an object providing a view on D's values
- property hdf_file: gemseo.caches.hdf5_file_singleton.HDF5FileSingleton
The hdf file handler.
- property last_entry: gemseo.core.cache.CacheEntry
The last cache entry.
[DEV] The abstract caches¶
MemoryFullCache and HDF5Cache inherit from AbstractFullCache. AbstractFullCache and SimpleCache inherit from AbstractCache. Both AbstractCache and AbstractFullCache are abstract classes. Any class inheriting from AbstractCache can be instantiated from the CacheFactory.