Caching and recording discipline data¶
Introduction¶
There are several reasons to store the evaluations (input, output and Jacobian values) of a discipline:
avoid evaluating a discipline at an input value for which it has already been evaluated,
save data for post-processing purposes, e.g. visualization, statistics, machine learning, debugging, etc.,
save the current state in memory to restart a crashed sequential disciplinary process from the iteration preceding the unfortunate event,
…
These reasons matter all the more when the discipline triggers a costly simulation: caching disciplinary data then avoids wasting computing resources.
The basics¶
In GEMSEO, an MDODiscipline has a cache that stores these evaluations as input, output and Jacobian data.
The caching mechanism¶
When the user passes an input value to the method MDODiscipline.execute(), the MDODiscipline checks whether its cache holds an output value associated with this input value.
If so, it returns this cached output value to the user.
Otherwise, it computes the output value, stores it in the cache and returns it to the user.
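The lookup-then-compute logic described above can be sketched in plain Python. This is only an illustration under simplifying assumptions, not GEMSEO's implementation: the class CachedDiscipline, its attribute n_evaluations and the dictionary used as a cache are hypothetical names introduced for the example, and input values are reduced to scalar floats.

```python
from typing import Callable, Dict, Tuple

InputData = Dict[str, float]
OutputData = Dict[str, float]
CacheKey = Tuple[Tuple[str, float], ...]


class CachedDiscipline:
    """Hypothetical stand-in for a discipline with a cache (not a GEMSEO class)."""

    def __init__(self, run: Callable[[InputData], OutputData]) -> None:
        self._run = run
        self._cache: Dict[CacheKey, OutputData] = {}
        self.n_evaluations = 0  # number of actual computations

    def execute(self, input_data: InputData) -> OutputData:
        # Build a hashable key from the input value.
        key = tuple(sorted(input_data.items()))
        # Cache hit: return the stored output value without computing.
        if key in self._cache:
            return self._cache[key]
        # Cache miss: compute, store, then return.
        self.n_evaluations += 1
        output_data = self._run(input_data)
        self._cache[key] = output_data
        return output_data


discipline = CachedDiscipline(lambda data: {"y": 2.0 * data["x"]})
first = discipline.execute({"x": 1.0})   # computed
second = discipline.execute({"x": 1.0})  # served from the cache
third = discipline.execute({"x": 3.0})   # new input value: computed and stored
```

After these three calls, only two evaluations have actually been performed.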
Define a tolerance for caching¶
The user can pass a tolerance below which two input arrays are considered equal:
numpy.linalg.norm(user_array - cached_array) / (1 + numpy.linalg.norm(cached_array)) <= tolerance.
A non-zero tolerance can save CPU time by letting nearly identical input values, e.g. values that differ only by round-off errors, reuse a cached evaluation.
It could be something like 2 * numpy.finfo(float).eps.
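The criterion above can be checked directly with NumPy. The sketch below is an assumption-laden illustration: the function name is_cached_input is hypothetical and only reproduces the inequality stated in this section, not the way GEMSEO applies it internally.

```python
import numpy as np


def is_cached_input(user_array, cached_array, tolerance=0.0):
    """Return True if both arrays are equal up to the relative tolerance.

    Hypothetical helper implementing the inequality given in the text.
    """
    difference = np.linalg.norm(user_array - cached_array)
    return bool(difference / (1.0 + np.linalg.norm(cached_array)) <= tolerance)


cached = np.array([0.3])
user = np.array([0.1 + 0.2])  # differs from 0.3 by a round-off error
tolerance = 2 * np.finfo(float).eps

strict_match = is_cached_input(user, cached)             # zero tolerance
tolerant_match = is_cached_input(user, cached, tolerance)
far_match = is_cached_input(cached + 1e-6, cached, tolerance)
```

With a zero tolerance, the round-off error makes the two arrays differ; with the tolerance suggested above, they are considered equal, while a perturbation of 1e-6 is still treated as a new input value.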
Export to another format¶
The cache can be converted to a Dataset for post-processing purposes using its method to_dataset().
It can also be saved into an XML file to be read by ggobi using its method to_ggobi().
Note
For the sake of performance, the input value of type Mapping[str, ndarray | int | float] is flattened to a NumPy array and hashed using the XXH64 algorithm of the xxHash library; the resulting hash is then compared to the ones stored in the cache.
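The flatten-then-hash idea in this note can be sketched as follows. Several points are assumptions of the example rather than facts about GEMSEO: the function name hash_input_data is hypothetical, the variables are concatenated in sorted name order, and the standard-library hashlib.blake2b with a 64-bit digest is used as a stand-in for XXH64 so that the example has no dependency beyond NumPy.

```python
import hashlib

import numpy as np


def hash_input_data(input_data):
    """Flatten a mapping of names to numbers or arrays and hash the result.

    Hypothetical helper: blake2b (64-bit digest) stands in for XXH64.
    """
    # Assumption: a fixed (sorted) name order makes the hash independent
    # of the insertion order of the mapping.
    flat = np.concatenate(
        [
            np.atleast_1d(np.asarray(input_data[name], dtype=float)).ravel()
            for name in sorted(input_data)
        ]
    )
    return hashlib.blake2b(flat.tobytes(), digest_size=8).hexdigest()


hash_a = hash_input_data({"x": np.array([1.0, 2.0]), "y": 3})
hash_b = hash_input_data({"y": 3.0, "x": [1.0, 2.0]})  # same data, other order
hash_c = hash_input_data({"x": np.array([1.0, 2.0]), "y": 4})
```

Comparing one fixed-size hash is cheaper than comparing every array entry by entry. Note that this simplified sketch hashes the values only, not the variable names, so a real implementation would need an additional check on a hash match.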
Set the cache policy of a discipline¶
The data can be cached either:
in memory:
the SimpleCache (default policy) only stores in memory the data associated with the last call to MDODiscipline.execute(),
the MemoryFullCache stores in memory the data associated with all the calls to MDODiscipline.execute(),
on the disk:
the HDF5Cache stores in a node of an HDF file the data associated with all the calls to MDODiscipline.execute().
The cache policy of an MDODiscipline can be changed with the method MDODiscipline.set_cache_policy() by passing the name of the cache class as first argument, e.g. "MemoryFullCache".
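The practical difference between the two in-memory policies is what remains available after several calls. The sketch below mimics the behaviours described above with two hypothetical classes, LastCallCache and AllCallsCache; they are not GEMSEO classes, and their store and lookup methods are names invented for the illustration.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[float, ...]
Outputs = Dict[str, float]


class LastCallCache:
    """Keeps only the latest entry, as described for SimpleCache."""

    def __init__(self) -> None:
        self._key: Optional[Key] = None
        self._outputs: Optional[Outputs] = None

    def store(self, key: Key, outputs: Outputs) -> None:
        # Storing a new entry overwrites the previous one.
        self._key, self._outputs = key, outputs

    def lookup(self, key: Key) -> Optional[Outputs]:
        return self._outputs if key == self._key else None


class AllCallsCache:
    """Keeps every entry, as described for MemoryFullCache."""

    def __init__(self) -> None:
        self._entries: Dict[Key, Outputs] = {}

    def store(self, key: Key, outputs: Outputs) -> None:
        self._entries[key] = outputs

    def lookup(self, key: Key) -> Optional[Outputs]:
        return self._entries.get(key)


last_call, all_calls = LastCallCache(), AllCallsCache()
for x in (1.0, 2.0):
    for cache in (last_call, all_calls):
        cache.store((x,), {"y": 2.0 * x})

first_from_last_call = last_call.lookup((1.0,))  # forgotten
first_from_all_calls = all_calls.lookup((1.0,))  # still available
```

A last-call policy uses little memory and is enough when the same input value is evaluated several times in a row, whereas a full policy trades memory for the ability to reuse any past evaluation.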
Note
The types of cache can be extended by subclassing AbstractFullCache or MemoryFullCache.
set_cache_policy() will find the new types automatically because it is based on a CacheFactory.
Advanced use¶
Get metadata¶
You can easily get:
the number of entries: n_entries = len(cache),
the names of the input variables: input_names = cache.input_names,
the names of the output variables: output_names = cache.output_names,
the size of a variable: size = cache.names_to_sizes[variable_name].
Get the last entry¶
Use last_entry = cache.last_entry to retrieve the last cached data.
last_entry is a CacheEntry with fields "inputs", "outputs" and "jacobian", to be used as output_value = last_entry.outputs.
Clear the cache¶
Use cache.clear() to remove all the entries.
Handle the cache as a dictionary¶
A cache can be handled as a dictionary:
store an output value: cache[input_value] = (output_value, None),
store a Jacobian value: cache[input_value] = (None, jacobian_value),
store both Jacobian and output values: cache[input_value] = (output_value, jacobian_value),
retrieve an entry: cache_entry = cache[input_value].
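To make the dictionary-like protocol above concrete, here is a minimal stand-alone sketch. The classes Entry and DictLikeCache are hypothetical and do not reproduce GEMSEO's code. The example assumes that an output value and a Jacobian value stored separately for the same input value end up in the same entry, that a missing input value yields an entry with empty fields, and that input values are hashable tuples rather than mappings of arrays.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

InputValue = Tuple[float, ...]


@dataclass
class Entry:
    """Hypothetical entry with the three fields named in the text."""

    inputs: InputValue
    outputs: Optional[dict] = None
    jacobian: Optional[dict] = None


class DictLikeCache:
    """Hypothetical cache handled as a dictionary."""

    def __init__(self) -> None:
        self._entries: Dict[InputValue, Entry] = {}

    def __setitem__(self, input_value: InputValue, data: tuple) -> None:
        output_value, jacobian_value = data
        # Assumption: reuse the entry of this input value if it exists.
        entry = self._entries.setdefault(input_value, Entry(input_value))
        if output_value is not None:
            entry.outputs = output_value
        if jacobian_value is not None:
            entry.jacobian = jacobian_value

    def __getitem__(self, input_value: InputValue) -> Entry:
        # Assumption: an unknown input value gives an entry with empty fields.
        return self._entries.get(input_value, Entry(input_value))

    def __len__(self) -> int:
        return len(self._entries)


cache = DictLikeCache()
cache[(1.0,)] = ({"y": 2.0}, None)          # store an output value
cache[(1.0,)] = (None, {"y": {"x": 2.0}})   # store a Jacobian value
cache[(3.0,)] = ({"y": 6.0}, {"y": {"x": 2.0}})  # store both
cache_entry = cache[(1.0,)]                  # retrieve an entry
```

The first two assignments target the same input value, so the cache holds two entries in total, and the retrieved entry exposes both its outputs and its jacobian.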
Cache data in an HDF file¶
HDF5 is a standard file format for storing simulation data:
“HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data. HDF5 is portable and is extensible, allowing applications to evolve in their use of HDF5. The HDF5 Technology suite includes tools and applications for managing, manipulating, viewing, and analyzing data in the HDF5 format.”
The HDFView application can be used to explore the data of the cache.