Caching and recording discipline data

GEMSEO offers several features to record and cache the input and output values of a discipline, as well as its Jacobian.

Introduction

Executing a discipline triggers a simulation which can be costly.

  • The first need for caching is to avoid duplicate simulations with the same inputs.

  • Then, the generated data contain valuable information which one may want to analyze during or after the execution, so storing these data on the disk is useful.

  • Finally, if the machine crashes, restarting the MDO process from scratch may waste computational resources. Again, storing the input and output data on the disk avoids repeating executions after a crash.

In GEMSEO, each MDODiscipline has a cache.

>>> from gemseo.api import create_discipline
>>> discipline = create_discipline('AnalyticDiscipline', name='my_discipline', expressions_dict={'y':'2*x'})
>>> print(discipline.cache)
my_discipline
| Type: SimpleCache
| Input names: None
| Output names: None
| Length: 0
| Tolerance: 0.0

Setting a cache policy

All disciplines have the MDODiscipline.SIMPLE_CACHE cache policy enabled by default. The other policies are MDODiscipline.MEMORY_FULL_CACHE and MDODiscipline.HDF5_CACHE.

The cache policy can be defined by means of the MDODiscipline.set_cache_policy() method:

MDODiscipline.set_cache_policy(cache_type='SimpleCache', cache_tolerance=0.0, cache_hdf_file=None, cache_hdf_node_name=None)

Set the type of cache to use and the tolerance level.

This method sets the cache policy, so that cached outputs can be reused when the new inputs are close to inputs whose outputs are already cached. The cache can be either a simple cache recording the last execution or a full cache storing all executions. Data can be cached either in memory, e.g. SimpleCache and MemoryFullCache, or on the disk, e.g. HDF5Cache. CacheFactory.caches provides the list of available types of caches.

Parameters
  • cache_type (str) – type of cache to use.

  • cache_tolerance (float) – tolerance for the approximate cache: maximal relative norm difference below which two input arrays are considered equal

  • cache_hdf_file (str) – the file to store the data, mandatory when HDF caching is used

  • cache_hdf_node_name (str) – name of the HDF dataset to store the discipline data. If None, self.name is used

>>> from gemseo.api import create_discipline
>>> discipline = create_discipline('AnalyticDiscipline', name='my_discipline', expressions_dict={'y':'2*x'})
>>> print(discipline.cache)
my_discipline
| Type: SimpleCache
| Input names: None
| Output names: None
| Length: 0
| Tolerance: 0.0
>>> discipline.set_cache_policy(discipline.MEMORY_FULL_CACHE)
>>> print(discipline.cache)
my_discipline
| Type: MemoryFullCache
| Input names: None
| Output names: None
| Length: 0
| Tolerance: 0.0
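
The role of cache_tolerance can be illustrated with a minimal sketch in plain Python. It is not GEMSEO's implementation: the choice of the Euclidean norm, the fallback used when the cached inputs are zero, and the names relative_difference and is_cache_hit are assumptions made for illustration.

```python
import math


def relative_difference(new_inputs, cached_inputs):
    """Return ||new - cached|| / ||cached|| with the Euclidean norm (assumed)."""
    diff_norm = math.sqrt(
        sum((a - b) ** 2 for a, b in zip(new_inputs, cached_inputs))
    )
    ref_norm = math.sqrt(sum(b ** 2 for b in cached_inputs))
    if ref_norm == 0.0:
        # Assumed fallback: use the absolute difference for a zero reference.
        return diff_norm
    return diff_norm / ref_norm


def is_cache_hit(new_inputs, cached_inputs, tolerance=0.0):
    """Return True when both input arrays are equal up to the tolerance."""
    if len(new_inputs) != len(cached_inputs):
        return False
    return relative_difference(new_inputs, cached_inputs) <= tolerance


cached = [1.0, 2.0]
exact_hit = is_cache_hit([1.0, 2.0], cached)
strict_miss = is_cache_hit([1.0, 2.0 + 1e-9], cached)
approx_hit = is_cache_hit([1.0, 2.0 + 1e-9], cached, tolerance=1e-6)
```

With the default tolerance of 0.0, only identical inputs count as a hit; a small positive tolerance also accepts inputs that differ by rounding-level noise.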

The different cache policies

Simple cache: storing the last execution

The simplest cache strategy in GEMSEO stores only the data of the last execution (inputs, outputs, and possibly the Jacobian matrix) in memory.

This cache strategy is implemented by means of the SimpleCache class:

class gemseo.caches.simple_cache.SimpleCache(tolerance=0.0, name=None)

Simple discipline cache based on a dictionary. Only caches the last execution.

Initialize the cache tolerance. By default, the approximate cache is not used. It is up to the user to decide whether to use it to save CPU time; a suitable tolerance could be something like 2 * finfo(float).eps.

Parameters
  • tolerance (float) – Tolerance that defines if two input vectors are equal and cached data shall be returned. If 0, no approximation is made. Default: 0.

  • name (str) – Name of the cache.

Examples

>>> from gemseo.caches.simple_cache import SimpleCache
>>> cache = SimpleCache()
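
The "last execution only" behaviour can be sketched in plain Python as follows. This is a hypothetical illustration, not the SimpleCache code: the classes LastExecutionCache and CountingModel and the function cached_execute are invented names, and the model y = 2*x mirrors the AnalyticDiscipline used earlier.

```python
class LastExecutionCache:
    """Hypothetical cache keeping only the most recent inputs and outputs."""

    def __init__(self):
        self._inputs = None
        self._outputs = None

    def get(self, inputs):
        """Return the stored outputs if the inputs match the last ones, else None."""
        if self._inputs is not None and inputs == self._inputs:
            return self._outputs
        return None

    def store(self, inputs, outputs):
        """Overwrite the previous entry with the new one."""
        self._inputs = dict(inputs)
        self._outputs = dict(outputs)


class CountingModel:
    """Stand-in for a costly simulation; counts how often it really runs."""

    def __init__(self):
        self.calls = 0

    def __call__(self, inputs):
        self.calls += 1
        return {"y": 2 * inputs["x"]}


def cached_execute(model, cache, inputs):
    """Run the model only when the cache has no matching entry."""
    outputs = cache.get(inputs)
    if outputs is None:
        outputs = model(inputs)
        cache.store(inputs, outputs)
    return outputs


model = CountingModel()
cache = LastExecutionCache()
first = cached_execute(model, cache, {"x": 1.0})   # runs the model
second = cached_execute(model, cache, {"x": 1.0})  # served from the cache
third = cached_execute(model, cache, {"x": 3.0})   # runs the model, replaces the entry
fourth = cached_execute(model, cache, {"x": 1.0})  # runs again: x=1.0 was overwritten
```

The fourth call shows the limit of this strategy: once a new input has been evaluated, earlier results are no longer available.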

Memory cache: recording all executions in memory

The MemoryFullCache is the in-memory version of the HDF5Cache. It stores several executions of a discipline (input, output and Jacobian values) in a dictionary.

This cache strategy is implemented by means of the MemoryFullCache class:

class gemseo.caches.memory_full_cache.MemoryFullCache(tolerance=0.0, name=None)

Cache using memory to cache all data.

Initialize a dictionary to cache data.

Initialize the cache tolerance. By default, the approximate cache is not used. It is up to the user to decide whether to use it to save CPU time; a suitable tolerance could be something like 2 * finfo(float).eps.

Parameters
  • tolerance (float) – Tolerance that defines if two input vectors are equal and cached data shall be returned. If 0, no approximation is made. Default: 0.

  • name (str) – Name of the cache.

Examples

>>> from gemseo.caches.memory_full_cache import MemoryFullCache
>>> cache = MemoryFullCache()
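
As a rough illustration of storing every execution in a dictionary, the following plain-Python sketch keeps one entry per distinct input point, including an optional Jacobian. It is not the MemoryFullCache implementation: the class AllExecutionsCache, its key scheme (a sorted tuple of input items) and the entry layout are assumptions.

```python
class AllExecutionsCache:
    """Hypothetical cache keeping every execution in a dictionary."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def _key(inputs):
        # Assumed key scheme: a sorted tuple of (name, value) pairs.
        return tuple(sorted(inputs.items()))

    def store(self, inputs, outputs, jacobian=None):
        """Record outputs, and optionally the Jacobian, for these inputs."""
        self._entries[self._key(inputs)] = {
            "outputs": dict(outputs),
            "jacobian": jacobian,
        }

    def get(self, inputs):
        """Return the entry recorded for these inputs, or None."""
        return self._entries.get(self._key(inputs))

    def __len__(self):
        return len(self._entries)


cache = AllExecutionsCache()
for x in (1.0, 3.0):
    # Model y = 2*x, whose derivative dy/dx is 2.
    cache.store({"x": x}, {"y": 2 * x}, jacobian={"y": {"x": 2.0}})

entry = cache.get({"x": 1.0})    # still available after x=3.0 was stored
missing = cache.get({"x": 5.0})  # never executed, so None
```

Unlike a last-execution cache, earlier input points remain retrievable, at the price of memory growing with the number of executions.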

HDF5 cache: recording all executions on the disk

When all the execution data of the discipline must be stored on the disk, the HDF5 cache policy can be used. HDF5 is a standard file format for storing simulation data. The HDF5 website describes it as follows:

“HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data. HDF5 is portable and is extensible, allowing applications to evolve in their use of HDF5. The HDF5 Technology suite includes tools and applications for managing, manipulating, viewing, and analyzing data in the HDF5 format.”

Libraries for manipulating HDF5 files exist at least for C, C++, Java, Fortran and Python.

The HDFView application can be used to explore the data of the cache. To manipulate the data, one may use the HDF5Cache class, which can import the file and read all the data, or the data of a specific execution.

Figure: HDFView of the cache generated by an MDF DOE scenario execution on the SSBJ test case

This cache strategy is implemented by means of the HDF5Cache class:

class gemseo.caches.hdf5_cache.HDF5Cache(hdf_file_path, hdf_node_path, tolerance=0.0, name=None)

Cache using an HDF5 file on the disk to store the data.

Initialize a singleton to access an HDF file. This singleton is used for multithreaded/multiprocessing access with a Lock.

Initialize the cache tolerance. By default, the approximate cache is not used. It is up to the user to decide whether to use it to save CPU time; a suitable tolerance could be something like 2 * finfo(float).eps.

Parameters
  • hdf_file_path (str) – Path of the HDF file.

  • hdf_node_path (str) – Node of the HDF file.

  • tolerance (float) – Tolerance that defines if two input vectors are equal and cached data shall be returned. If 0, no approximation is made. Default: 0.

  • name (str) – Name of the cache.

Examples

>>> from gemseo.caches.hdf5_cache import HDF5Cache
>>> cache = HDF5Cache('my_cache.hdf5', 'my_node')

[DEV] The abstract caches