hdf5_cache module¶
Caching module to avoid multiple evaluations of a discipline¶
Classes:
HDF5Cache – Cache using a disk HDF5 file to store the data.
HDF5FileSingleton – Singleton to access an HDF file; used for multithreaded/multiprocessing access with a Lock.
- class gemseo.caches.hdf5_cache.HDF5Cache(hdf_file_path, hdf_node_path, tolerance=0.0, name=None)[source]¶
Bases:
gemseo.core.cache.AbstractFullCache
Cache using a disk HDF5 file to store the data.
Initialize a singleton to access an HDF file. This singleton is used for multithreaded/multiprocessing access with a Lock.
Initialize the cache tolerance. By default, the cache is exact and no approximation is made. The user may choose to trade accuracy for CPU time by setting a small tolerance, e.g. 2 * finfo(float).eps.
- Parameters
hdf_file_path (str) – Path of the HDF file.
hdf_node_path (str) – Node of the HDF file.
tolerance (float) –
Tolerance that defines whether two input vectors are equal and the cached data shall be returned. If 0, no approximation is made.
By default it is set to 0.0.
name (str) –
Name of the cache.
By default it is set to None.
Examples
>>> from gemseo.caches.hdf5_cache import HDF5Cache
>>> cache = HDF5Cache('my_cache.h5', 'my_node')
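The effect of tolerance can be sketched with plain NumPy. This is an illustrative check only; the metric gemseo actually uses to compare input vectors is not described on this page, and the relative-difference test below is an assumption:

```python
from numpy import array, finfo
from numpy.linalg import norm


def inputs_match(cached, new, tolerance):
    """Illustrative approximate-equality test between two input vectors."""
    if tolerance == 0.0:
        # Exact cache: the vectors must be identical.
        return bool((cached == new).all())
    # Assumed metric: relative difference with respect to the cached vector.
    return bool(norm(new - cached) <= tolerance * norm(cached))


x = array([1.0, 2.0])
print(inputs_match(x, x.copy(), 0.0))  # True: exact match
print(inputs_match(x, x + 0.1, 2 * finfo(float).eps))  # False: too far apart
```

With a tolerance of `2 * finfo(float).eps`, inputs that differ only by floating-point round-off would still hit the cache, while any meaningful change forces a re-evaluation.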
Attributes:
inputs_names – Return the input names.
max_length – Return the maximal length of the cache (the maximal number of stored elements).
outputs_names – Return the output names.
samples_indices – Return the list of sample indices.
varsizes – Return the variable sizes.
Methods:
cache_jacobian(input_data, input_names, jacobian) – Cache Jacobian data to avoid re-evaluation.
cache_outputs(input_data, input_names, ...) – Cache data to avoid re-evaluation.
clear() – Clear the cache.
export_to_dataset([name, by_group, ...]) – Build a Dataset from the cache.
export_to_ggobi(file_path[, inputs_names, ...]) – Export the history to an XML file for the ggobi tool.
get_all_data([as_iterator]) – Return all the data in the cache.
get_data(index, **options) – Get the data associated with a sample ID.
get_last_cached_inputs() – Retrieve the last execution inputs.
get_last_cached_outputs() – Retrieve the last execution outputs.
get_length() – Get the length of the cache, i.e. the number of stored elements.
get_outputs(input_data[, input_names]) – Check whether the discipline has already been evaluated for the given input data.
merge(other_cache) – Merge another cache into the current one.
update_file_format(hdf_file_path) – Update the format of an HDF5 file.
- INPUTS_GROUP = 'inputs'¶
- JACOBIAN_GROUP = 'jacobian'¶
- OUTPUTS_GROUP = 'outputs'¶
- SAMPLE_GROUP = 'sample'¶
- cache_jacobian(input_data, input_names, jacobian)¶
Cache Jacobian data to avoid re-evaluation.
- Parameters
input_data (dict) – Input data to cache.
input_names (list(str)) – List of input data names.
jacobian (dict) – Jacobian to cache.
- cache_outputs(input_data, input_names, output_data, output_names=None)¶
Cache data to avoid re-evaluation.
- Parameters
input_data (dict) – Input data to cache.
input_names (list(str)) – List of input data names.
output_data (dict) – Output data to cache.
output_names (list(str)) –
List of output data names. If None, use all output names.
By default it is set to None.
- clear()[source]¶
Clear the cache.
Examples
>>> from gemseo.caches.hdf5_cache import HDF5Cache
>>> from numpy import array
>>> cache = HDF5Cache('my_cache.h5', 'my_node')
>>> for index in range(5):
...     data = {'x': array([1.]) * index, 'y': array([.2]) * index}
...     cache.cache_outputs(data, ['x'], data, ['y'])
>>> cache.get_length()
5
>>> cache.clear()
>>> cache.get_length()
0
- export_to_dataset(name=None, by_group=True, categorize=True, inputs_names=None, outputs_names=None)¶
Build a Dataset from the cache.
- Parameters
name (str) –
Name of the dataset.
By default it is set to None.
by_group (bool) –
If True, store the data by group; otherwise, store them by variable.
By default it is set to True.
categorize (bool) –
If True, distinguish between the different groups of variables.
By default it is set to True.
inputs_names (list(str)) –
Names of the inputs to export. If None, use all inputs.
By default it is set to None.
outputs_names (list(str)) –
Names of the outputs to export. If None, use all outputs.
By default it is set to None.
- export_to_ggobi(file_path, inputs_names=None, outputs_names=None)¶
Export the history to an XML file for the ggobi tool.
- Parameters
file_path (str) – Path to export the file.
inputs_names (list(str)) –
Names of the inputs to include in the export. If None, take all of them.
By default it is set to None.
outputs_names (list(str)) –
Names of the outputs to export. If None, take all of them.
By default it is set to None.
- get_all_data(as_iterator=False)¶
Return all the data in the cache.
- Parameters
as_iterator (bool) –
If True, return an iterator; otherwise a dictionary.
By default it is set to False.
- Returns
all_data – A dictionary of dictionaries for inputs, outputs and jacobian where keys are data indices.
- Return type
dict
- get_data(index, **options)[source]¶
Get the data associated with a sample ID.
- Parameters
index (str) – The sample ID.
options – Options passed to the _read_data() method.
- Returns
The input data, the output data and the Jacobian.
- Return type
dict
- get_last_cached_inputs()¶
Retrieve the last execution inputs.
- Returns
inputs – Last cached inputs.
- Return type
dict
- get_last_cached_outputs()¶
Retrieve the last execution outputs.
- Returns
outputs – Last cached outputs.
- Return type
dict
- get_length()¶
Get the length of the cache, i.e. the number of stored elements.
- Returns
length – Length of the cache.
- Return type
int
- get_outputs(input_data, input_names=None)¶
Check whether the discipline has already been evaluated for the given input data dictionary. If so, return the associated cached data; otherwise return None.
- Parameters
input_data (dict) – Input data dictionary to test for caching.
input_names (list(str)) –
List of input data names.
By default it is set to None.
- Returns
output_data (dict) – Output data if there is no need to evaluate the discipline. None otherwise.
jacobian (dict) – Jacobian if there is no need to evaluate the discipline. None otherwise.
- property inputs_names¶
Return the input names.
- property max_length¶
Get the maximal length of the cache (the maximal number of stored elements).
- Returns
length – Maximal length of the cache.
- Return type
int
- merge(other_cache)¶
Merge another cache into the current one.
- Parameters
other_cache (AbstractFullCache) – Cache to merge with the current one.
- property outputs_names¶
Return the output names.
- property samples_indices¶
Return the list of sample indices.
- static update_file_format(hdf_file_path)[source]¶
Update the format of a HDF5 file.
- Parameters
hdf_file_path (Union[str, pathlib.Path]) – A HDF5 file path.
- Return type
None
- property varsizes¶
Return the variable sizes.
- class gemseo.caches.hdf5_cache.HDF5FileSingleton(*args, **kwargs)[source]¶
Bases:
object
Singleton to access an HDF file. Used for multithreaded/multiprocessing access with a Lock.
Constructor.
- Parameters
hdf_file_path – Path to the HDF5 file.
Methods:
clear(hdf_node_path) – Clear the data in the cache.
has_group(group_number, group_name, ...) – Check if a group is present in the HDF file.
read_data(group_number, group_name, ...[, ...]) – Read a data dictionary from the HDF file.
read_hashes(hashes_dict, hdf_node_path) – Read the hashes in the HDF file.
update_file_format(hdf_file_path) – Update the format of an HDF5 file.
write_data(data, data_names, group_name, ...) – Cache input data to avoid re-evaluation.
- FILE_FORMAT_VERSION = 2¶
- HASH_TAG = 'hash'¶
- INPUTS_GROUP = 'inputs'¶
- JACOBIAN_GROUP = 'jacobian'¶
- OUTPUTS_GROUP = 'outputs'¶
- clear(hdf_node_path)[source]¶
Clear the data in the cache.
- Parameters
hdf_node_path – Node path to clear.
- has_group(group_number, group_name, hdf_node_path)[source]¶
Check if a group is present in the HDF file.
- Parameters
group_name – Name of the group where the data is written.
group_number – Number of the group.
hdf_node_path – Name of the main HDF group.
- Returns
True if the group exists.
- read_data(group_number, group_name, hdf_node_path, h5_open_file=None)[source]¶
Read a data dictionary from the HDF file.
- Parameters
group_name – Name of the group where the data is written.
group_number – Number of the group.
hdf_node_path – Name of the main HDF group.
h5_open_file –
Optionally, the already opened file; this improves performance but is incompatible with multiprocessing/multithreading.
By default it is set to None.
- Returns
The data dictionary and the Jacobian.
- read_hashes(hashes_dict, hdf_node_path)[source]¶
Read the hashes in the HDF file.
- Parameters
hashes_dict – Dictionary of hashes to fill.
hdf_node_path – Name of the main HDF group.
- Returns
max_group
- classmethod update_file_format(hdf_file_path)[source]¶
Update the format of a HDF5 file.
GEMSEO 3.2.0 added a HDF5FileSingleton.FILE_FORMAT_VERSION attribute to the HDF5 files to allow handling their maintenance and evolution. In particular, GEMSEO 3.2.0 fixed the hashing of the data dictionaries.
- Parameters
hdf_file_path (Union[str, pathlib.Path]) – A HDF5 file path.
- Return type
None
- write_data(data, data_names, group_name, group_num, hdf_node_path, h5_open_file=None)[source]¶
Cache input data to avoid re-evaluation.
- Parameters
data – The data to cache.
data_names – List of data names.
group_name – The inputs, outputs or jacobian group.
group_num – Number of the group.
hdf_node_path – Name of the main HDF group.
h5_open_file –
Optionally, the already opened file; this improves performance but is incompatible with multiprocessing/multithreading.
By default it is set to None.