Caching and recording discipline data
There are several reasons to store the evaluations (input, output and Jacobian values) of a discipline:
avoid evaluating a discipline at an input value for which it has already been evaluated,
save data for post-processing purposes, e.g. visualization, statistics, machine learning, debugging, etc.,
save the current state to restart a crashed sequential disciplinary process from the iteration preceding the crash.
These reasons matter all the more when the discipline triggers a costly simulation: caching disciplinary data avoids wasting computing resources.
In GEMSEO, an MDODiscipline has a cache that stores these evaluations in terms of input, output and Jacobian data.
The caching mechanism
When the user passes an input value to the execution method of an MDODiscipline, the discipline looks in its cache for an output value associated with this input value:
if there is one, it returns it to the user,
otherwise, it computes it, stores it in the cache and returns it to the user.
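The look-up-then-compute logic above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming that inputs are given as hashable tuples of floats; CachedFunction and its n_calls counter are hypothetical names, not part of GEMSEO.

```python
from typing import Callable, Dict, Tuple


class CachedFunction:
    """Hypothetical wrapper mimicking the look-up-then-compute logic of a cache."""

    def __init__(self, compute: Callable[[Tuple[float, ...]], float]) -> None:
        self._compute = compute
        self._cache: Dict[Tuple[float, ...], float] = {}
        self.n_calls = 0  # number of actual (possibly costly) evaluations

    def execute(self, inputs: Tuple[float, ...]) -> float:
        if inputs in self._cache:  # an output is already known: return it
            return self._cache[inputs]
        self.n_calls += 1  # otherwise compute it,
        output = self._compute(inputs)
        self._cache[inputs] = output  # store it in the cache,
        return output  # and return it


f = CachedFunction(lambda x: sum(v ** 2 for v in x))
print(f.execute((1.0, 2.0)))  # 5.0, computed
print(f.execute((1.0, 2.0)))  # 5.0, read from the cache
print(f.n_calls)  # 1
```

The second call with the same input does not increase n_calls, which is the whole point of caching a costly simulation.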
Define a tolerance for caching
The user can pass a tolerance below which two input arrays are considered equal, i.e. when
numpy.linalg.norm(user_array - cached_array) / (1 + numpy.linalg.norm(cached_array)) <= tolerance.
A non-zero tolerance can save CPU time by reusing cached outputs for inputs that differ only by round-off errors.
A typical value is
2 * numpy.finfo(float).eps.
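The criterion above can be checked with the standard library alone. This sketch uses plain lists instead of NumPy arrays, math.dist for the norm of the difference and sys.float_info.epsilon, which equals numpy.finfo(float).eps for 64-bit floats; inputs_are_equal is a hypothetical helper, not a GEMSEO function.

```python
import math
import sys
from typing import Sequence


def inputs_are_equal(
    user_array: Sequence[float], cached_array: Sequence[float], tolerance: float = 0.0
) -> bool:
    """Hypothetical helper: norm(u - c) / (1 + norm(c)) <= tolerance."""
    if len(user_array) != len(cached_array):
        return False
    distance = math.dist(user_array, cached_array)  # Euclidean norm of the difference
    return distance / (1.0 + math.hypot(*cached_array)) <= tolerance


# Machine epsilon of 64-bit floats, i.e. numpy.finfo(float).eps.
TOLERANCE = 2 * sys.float_info.epsilon

cached = [1.0, 2.0, 3.0]
print(inputs_are_equal([1.0, 2.0, 3.0], cached))  # True: exact match, zero tolerance
print(inputs_are_equal([1.0, 2.0, 3.0 + 1e-15], cached, TOLERANCE))  # True: round-off only
print(inputs_are_equal([1.0, 2.0, 3.1], cached, TOLERANCE))  # False: a genuinely new input
```

The 1 in the denominator keeps the test meaningful when the cached array is close to zero, where a purely relative criterion would be overly strict.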
Export to another format
A cache can be converted to a Dataset for post-processing purposes using a dedicated method.
It can also be saved into an XML file readable by GGobi.
For the sake of performance,
the input value of type Mapping[str, ndarray | int | float] is flattened into a NumPy array
and hashed with the XXH64 algorithm of the xxHash library;
the resulting hash is then compared to the ones stored in the cache.
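The flatten-then-hash step can be sketched as follows. xxHash is not part of the Python standard library, so this sketch swaps in hashlib.blake2b with an 8-byte digest to obtain a 64-bit hash; the sorted name order, the little-endian float64 encoding and the helper names flatten and hash_input are assumptions of the sketch, not GEMSEO's implementation.

```python
import hashlib
import struct
from typing import List, Mapping, Sequence, Union

# Hypothetical value type: a sequence of floats or a scalar.
Value = Union[Sequence[float], int, float]


def flatten(input_data: Mapping[str, Value]) -> List[float]:
    """Concatenate the values of an input mapping into one flat list of floats.

    Names are visited in sorted order (an assumption of this sketch) so that
    equal mappings give the same vector whatever their insertion order.
    """
    flat: List[float] = []
    for name in sorted(input_data):
        value = input_data[name]
        if isinstance(value, (int, float)):
            flat.append(float(value))
        else:
            flat.extend(float(v) for v in value)
    return flat


def hash_input(input_data: Mapping[str, Value]) -> int:
    """Return a 64-bit hash of the flattened input, packed as float64 bytes.

    BLAKE2b with an 8-byte digest stands in for XXH64 here because it ships
    with the standard library.
    """
    flat = flatten(input_data)
    raw = struct.pack(f"<{len(flat)}d", *flat)
    digest = hashlib.blake2b(raw, digest_size=8).digest()
    return int.from_bytes(digest, "little")


# Store an output under the hash of its input...
hashes_to_outputs = {hash_input({"x": [1.0, 2.0], "y": 3}): {"f": 14.0}}
# ...and find it again from an equal input given in another order and type.
print(hash_input({"y": 3.0, "x": [1.0, 2.0]}) in hashes_to_outputs)  # True
```

Flattening drops the variable names, which is harmless only because the names and sizes of a discipline's inputs are fixed. Since distinct inputs can in principle share a hash, a hit should be confirmed by comparing the stored input data.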
Set the cache policy of a discipline
The data can be cached either:
in memory:
SimpleCache (default policy) only stores the data associated with the last execution of the discipline,
MemoryFullCache stores the data associated with all the executions of the discipline,
on the disk:
HDF5Cache stores the data associated with all the executions of the discipline in a node of an HDF5 file.
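The practical difference between keeping only the last entry and keeping all the entries shows up when inputs are revisited. The sketch below counts the actual evaluations under each policy; LastEntryCache and AllEntriesCache are hypothetical stand-ins for the ideas behind SimpleCache and MemoryFullCache, not the GEMSEO classes.

```python
from typing import Any, Callable, Dict, Hashable, Iterable


class LastEntryCache:
    """Keeps only the last (input, output) pair: the idea behind SimpleCache."""

    def __init__(self) -> None:
        self._entry: Dict[Hashable, Any] = {}

    def __contains__(self, key: Hashable) -> bool:
        return key in self._entry

    def __getitem__(self, key: Hashable) -> Any:
        return self._entry[key]

    def __setitem__(self, key: Hashable, value: Any) -> None:
        self._entry = {key: value}  # the previous entry is discarded


class AllEntriesCache(dict):
    """Keeps every (input, output) pair: the idea behind MemoryFullCache."""


def count_evaluations(cache, inputs: Iterable[Hashable], compute: Callable[[Any], Any]) -> int:
    """Evaluate compute on each input through cache; return the number of actual evaluations."""
    n_evaluations = 0
    for x in inputs:
        if x not in cache:
            cache[x] = compute(x)
            n_evaluations += 1
    return n_evaluations


sequence = [1.0, 2.0, 1.0, 1.0, 2.0]
print(count_evaluations(LastEntryCache(), sequence, lambda x: x ** 2))  # 4
print(count_evaluations(AllEntriesCache(), sequence, lambda x: x ** 2))  # 2
```

Keeping only the last entry uses constant memory and still avoids re-evaluating a point queried twice in a row; keeping all the entries trades memory for fewer re-evaluations when earlier points come back.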
The cache policy of an MDODiscipline can be changed with the method set_cache_policy(),
by passing the name of the cache class (e.g. one of the classes listed above) as first argument.
New types of cache can be added by subclassing the base cache class:
set_cache_policy() finds them automatically.
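Automatic discovery of new subclasses can be illustrated with a small registry built on Python's type.__subclasses__(). This is a generic sketch of the technique, not necessarily the mechanism GEMSEO uses; BaseCache, DemoLastCache, DemoFullCache, MyCache, all_subclasses and create_cache are all hypothetical names.

```python
from typing import Dict


class BaseCache:
    """Hypothetical base class playing the role of the cache base class."""


class DemoLastCache(BaseCache):
    """Stand-in for a cache keeping only the last entry."""


class DemoFullCache(BaseCache):
    """Stand-in for a cache keeping all the entries."""


def all_subclasses(cls: type) -> Dict[str, type]:
    """Recursively collect the subclasses of cls, indexed by class name."""
    found: Dict[str, type] = {}
    for subclass in cls.__subclasses__():
        found[subclass.__name__] = subclass
        found.update(all_subclasses(subclass))
    return found


def create_cache(class_name: str) -> BaseCache:
    """Instantiate a cache from the name of its class."""
    classes = all_subclasses(BaseCache)  # looked up at call time, not at import time
    if class_name not in classes:
        raise ValueError(f"Unknown cache type {class_name!r}; available: {sorted(classes)}")
    return classes[class_name]()


# A user-defined cache, declared after create_cache, is found without any registration.
class MyCache(DemoFullCache):
    """User extension of an existing cache type."""


print(type(create_cache("MyCache")).__name__)  # MyCache
```

Because the lookup walks the class hierarchy when it is called, any subclass defined or imported before the call can be selected by name.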
You can easily get:
the number of entries:
n_entries = len(cache),
the names of the input variables:
input_names = cache.input_names,
the names of the output variables:
output_names = cache.output_names,
the size of the variables:
size = cache.names_to_sizes[variable_name].
Get the last entry
Use last_entry = cache.last_entry to retrieve the last cached data.
last_entry is a
CacheEntry whose fields can be accessed as attributes, e.g.
output_value = last_entry.outputs.
Clear the cache
Use cache.clear() to remove all the entries.
Handle the cache as a dictionary
A cache can be handled as a dictionary:
store an output value:
cache[input_value] = (output_value, None)
store a Jacobian value:
cache[input_value] = (None, jacobian_value)
store both Jacobian and output values:
cache[input_value] = (output_value, jacobian_value)
retrieve an entry:
cache_entry = cache[input_value].
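The dictionary-like behaviour above can be mimicked by a small class. In this sketch, inputs are hashable tuples (a real cache hashes the input mapping, as described earlier), None in a stored pair means "leave this part unchanged", and the Entry field names inputs, outputs and jacobian are assumptions; ToyCache and Entry are hypothetical, not GEMSEO classes.

```python
from typing import Any, Dict, Hashable, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    """Hypothetical cache entry; the field names are assumptions of this sketch."""

    inputs: Hashable
    outputs: Optional[Any]
    jacobian: Optional[Any]


class ToyCache:
    """Minimal dictionary-like cache; a sketch, not GEMSEO's implementation."""

    def __init__(self) -> None:
        self._entries: Dict[Hashable, Entry] = {}
        self._last_key: Optional[Hashable] = None

    def __len__(self) -> int:
        return len(self._entries)

    def __setitem__(self, inputs: Hashable, data: Tuple[Any, Any]) -> None:
        outputs, jacobian = data
        old = self._entries.get(inputs, Entry(inputs, None, None))
        # In this sketch, None means "keep what is already stored".
        self._entries[inputs] = Entry(
            inputs,
            old.outputs if outputs is None else outputs,
            old.jacobian if jacobian is None else jacobian,
        )
        self._last_key = inputs

    def __getitem__(self, inputs: Hashable) -> Entry:
        return self._entries[inputs]

    @property
    def last_entry(self) -> Optional[Entry]:
        return None if self._last_key is None else self._entries[self._last_key]

    def clear(self) -> None:
        self._entries.clear()
        self._last_key = None


cache = ToyCache()
x = (1.0, 2.0)  # hashable input value
cache[x] = ({"y": 5.0}, None)  # store an output value
cache[x] = (None, {"dy/dx": [2.0, 4.0]})  # then store its Jacobian
print(cache[x].outputs, cache[x].jacobian)
print(len(cache), cache.last_entry.outputs)
cache.clear()
print(len(cache))  # 0
```

Merging the output and Jacobian parts into a single entry lets a later linearization at the same input reuse the outputs already computed.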
Cache data in an HDF file
HDF5 is a standard file format for storing simulation data. The HDF5 website describes it as follows:
“HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data. HDF5 is portable and is extensible, allowing applications to evolve in their use of HDF5. The HDF5 Technology suite includes tools and applications for managing, manipulating, viewing, and analyzing data in the HDF5 format.”
The HDFView application can be used to explore the data of the cache.