gemseo / caches

hdf5_cache module

Caching module to avoid multiple evaluations of a discipline.

Classes:

HDF5Cache(hdf_file_path, hdf_node_path[, ...])

Cache using an HDF5 file on disk to store the data.

HDF5FileSingleton(*args, **kwargs)

Singleton to access an HDF file. Used for multithreaded/multiprocessing access with a Lock.

class gemseo.caches.hdf5_cache.HDF5Cache(hdf_file_path, hdf_node_path, tolerance=0.0, name=None)[source]

Bases: gemseo.core.cache.AbstractFullCache

Cache using an HDF5 file on disk to store the data.

Initialize a singleton to access an HDF file. This singleton is used for multithreaded/multiprocessing access with a Lock.

Initialize the cache tolerance. By default, the cache is not approximate. It is up to the user to decide whether to trade exact matching for CPU time; a typical value could be 2 * finfo(float).eps.

Parameters
  • hdf_file_path (str) – Path of the HDF file.

  • hdf_node_path (str) – Path of the node in the HDF file.

  • tolerance (float) –

    Tolerance that defines whether two input vectors are equal, in which case the cached data are returned. If 0, no approximation is made.

    By default it is set to 0.0.

  • name (str) –

    Name of the cache.

    By default it is set to None.

Examples

>>> from gemseo.caches.hdf5_cache import HDF5Cache
>>> cache = HDF5Cache('my_cache.h5', 'my_node')
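
A slightly fuller sketch, assuming scalar variables 'x' and 'y' (the data are illustrative only):

>>> from numpy import array
>>> data = {'x': array([1.]), 'y': array([2.])}
>>> cache.cache_outputs(data, ['x'], data, ['y'])
>>> outputs, jacobian = cache.get_outputs({'x': array([1.])}, ['x'])
>>> outputs['y']
array([2.])

With a small positive tolerance, e.g. tolerance=2e-16 at construction, input vectors within that tolerance of a cached entry should reuse the cached outputs.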

Attributes:

INPUTS_GROUP

JACOBIAN_GROUP

OUTPUTS_GROUP

SAMPLE_GROUP

inputs_names

Return the names of the inputs.

max_length

Get the maximal length of the cache (the maximal number of stored elements).

outputs_names

Return the names of the outputs.

samples_indices

List of sample indices.

varsizes

Return the sizes of the variables.

Methods:

cache_jacobian(input_data, input_names, jacobian)

Cache Jacobian data to avoid re-evaluation.

cache_outputs(input_data, input_names, ...)

Cache data to avoid re-evaluation.

clear()

Clear the cache.

export_to_dataset([name, by_group, ...])

Build a Dataset from the cache.

export_to_ggobi(file_path[, inputs_names, ...])

Export the history to an XML file for the GGobi tool.

get_all_data([as_iterator])

Return all the data in the cache.

get_data(index, **options)

Get the data associated with a sample ID.

get_last_cached_inputs()

Retrieve the last execution inputs.

get_last_cached_outputs()

Retrieve the last execution outputs.

get_length()

Get the length of the cache, i.e. the number of stored elements.

get_outputs(input_data[, input_names])

Check if the discipline has already been evaluated for the given input data dictionary.

merge(other_cache)

Merge another cache with self.

update_file_format(hdf_file_path)

Update the format of an HDF5 file.

INPUTS_GROUP = 'inputs'
JACOBIAN_GROUP = 'jacobian'
OUTPUTS_GROUP = 'outputs'
SAMPLE_GROUP = 'sample'
cache_jacobian(input_data, input_names, jacobian)

Cache Jacobian data to avoid re-evaluation.

Parameters
  • input_data (dict) – Input data to cache.

  • input_names (list(str)) – List of input data names.

  • jacobian (dict) – Jacobian to cache.
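
Examples

A minimal sketch, assuming the usual GEMSEO nested layout {output_name: {input_name: matrix}} for the Jacobian (all names are illustrative):

>>> from gemseo.caches.hdf5_cache import HDF5Cache
>>> from numpy import array
>>> cache = HDF5Cache('my_cache.h5', 'my_node')
>>> input_data = {'x': array([1.])}
>>> jac = {'y': {'x': array([[2.]])}}
>>> cache.cache_jacobian(input_data, ['x'], jac)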

cache_outputs(input_data, input_names, output_data, output_names=None)

Cache data to avoid re-evaluation.

Parameters
  • input_data (dict) – Input data to cache.

  • input_names (list(str)) – List of input data names.

  • output_data (dict) – Output data to cache.

  • output_names (list(str)) –

    List of output data names. If None, use all output names.

    By default it is set to None.

clear()[source]

Clear the cache.

Examples

>>> from gemseo.caches.hdf5_cache import HDF5Cache
>>> from numpy import array
>>> cache = HDF5Cache('my_cache.h5', 'my_node')
>>> for index in range(5):
...     data = {'x': array([1.])*index, 'y': array([.2])*index}
...     cache.cache_outputs(data, ['x'], data, ['y'])
>>> cache.get_length()
5
>>> cache.clear()
>>> cache.get_length()
0
export_to_dataset(name=None, by_group=True, categorize=True, inputs_names=None, outputs_names=None)

Build a Dataset from the cache.

Parameters
  • name (str) –

    Dataset name.

    By default it is set to None.

  • by_group (bool) –

    If True, store the data by group; otherwise, store them by variable.

    By default it is set to True.

  • categorize (bool) –

    If True, distinguish between the different groups of variables.

    By default it is set to True.

  • inputs_names (list(str)) –

    List of input names. If None, use all inputs.

    By default it is set to None.

  • outputs_names (list(str)) –

    List of output names. If None, use all outputs.

    By default it is set to None.
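
Examples

A minimal sketch, assuming a previously filled cache (the file, node and dataset names are illustrative):

>>> from gemseo.caches.hdf5_cache import HDF5Cache
>>> cache = HDF5Cache('my_cache.h5', 'my_node')
>>> dataset = cache.export_to_dataset('my_dataset')
>>> dataset = cache.export_to_dataset('my_dataset', by_group=False)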

export_to_ggobi(file_path, inputs_names=None, outputs_names=None)

Export the history to an XML file for the GGobi tool.

Parameters
  • file_path (str) – Path to export the file.

  • inputs_names (list(str)) –

    List of input names to include in the export. If None, take all of them.

    By default it is set to None.

  • outputs_names (list(str)) –

    List of output names to include in the export. If None, take all of them.

    By default it is set to None.
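
Examples

A minimal sketch, assuming a previously filled cache with variables 'x' and 'y' (all names are illustrative):

>>> from gemseo.caches.hdf5_cache import HDF5Cache
>>> cache = HDF5Cache('my_cache.h5', 'my_node')
>>> cache.export_to_ggobi('history.xml', inputs_names=['x'], outputs_names=['y'])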

get_all_data(as_iterator=False)

Return all the data in the cache.

Parameters

as_iterator (bool) –

If True, return an iterator; otherwise, a dictionary.

By default it is set to False.

Returns

all_data – A dictionary of dictionaries for the inputs, outputs and Jacobian, whose keys are the data indices.

Return type

dict
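
Examples

A minimal sketch of iterating over the cached samples, assuming each entry maps the group names ('inputs', 'outputs', 'jacobian') to the corresponding data:

>>> from gemseo.caches.hdf5_cache import HDF5Cache
>>> cache = HDF5Cache('my_cache.h5', 'my_node')
>>> for index, entry in cache.get_all_data().items():
...     print(index, entry['inputs'], entry['outputs'])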

get_data(index, **options)[source]

Get the data associated with a sample ID.

Parameters
  • index (str) – Sample ID.

  • options – Options passed to the _read_data() method.

Returns

The input data, output data and Jacobian.

Return type

dict
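
Examples

A minimal sketch, assuming a non-empty cache and using the samples_indices property to obtain a valid sample ID:

>>> from gemseo.caches.hdf5_cache import HDF5Cache
>>> cache = HDF5Cache('my_cache.h5', 'my_node')
>>> sample_id = cache.samples_indices[0]
>>> entry = cache.get_data(sample_id)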

get_last_cached_inputs()

Retrieve the last execution inputs.

Returns

inputs – Last cached inputs.

Return type

dict

get_last_cached_outputs()

Retrieve the last execution outputs.

Returns

outputs – Last cached outputs.

Return type

dict

get_length()

Get the length of the cache, i.e. the number of stored elements.

Returns

length – Length of the cache.

Return type

int

get_outputs(input_data, input_names=None)

Check whether the discipline has already been evaluated for the given input data dictionary. If so, return the associated cached data; otherwise, return None.

Parameters
  • input_data (dict) – Input data dictionary to test for caching.

  • input_names (list(str)) –

    List of input data names.

    By default it is set to None.

Returns

  • output_data (dict) – Output data if there is no need to evaluate the discipline. None otherwise.

  • jacobian (dict) – Jacobian if there is no need to evaluate the discipline. None otherwise.
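
Examples

A minimal sketch of the check-before-evaluate pattern this method supports (variable names are illustrative):

>>> from gemseo.caches.hdf5_cache import HDF5Cache
>>> from numpy import array
>>> cache = HDF5Cache('my_cache.h5', 'my_node')
>>> outputs, jacobian = cache.get_outputs({'x': array([1.])}, ['x'])
>>> if outputs is None:
...     pass  # not cached yet: evaluate the discipline, then call cache_outputs()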

property inputs_names

Return the names of the inputs.

property max_length

Get the maximal length of the cache (the maximal number of stored elements).

Returns

length – Maximal length of the cache.

Return type

int

merge(other_cache)

Merge another cache with self.

Parameters

other_cache (AbstractFullCache) – Cache to merge with the current one.
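
Examples

A minimal sketch merging the contents of a second cache (file names are illustrative):

>>> from gemseo.caches.hdf5_cache import HDF5Cache
>>> cache = HDF5Cache('my_cache.h5', 'my_node')
>>> other_cache = HDF5Cache('other_cache.h5', 'my_node')
>>> cache.merge(other_cache)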

property outputs_names

Return the names of the outputs.

property samples_indices

List of sample indices.

static update_file_format(hdf_file_path)[source]

Update the format of an HDF5 file.

Parameters

hdf_file_path (Union[str, pathlib.Path]) – An HDF5 file path.

Return type

None
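
Examples

A minimal sketch, assuming a cache file created with a GEMSEO version older than 3.2.0 (the file name is illustrative):

>>> from gemseo.caches.hdf5_cache import HDF5Cache
>>> HDF5Cache.update_file_format('old_cache.h5')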

property varsizes

Return the sizes of the variables.

class gemseo.caches.hdf5_cache.HDF5FileSingleton(*args, **kwargs)[source]

Bases: object

Singleton to access an HDF file. Used for multithreaded/multiprocessing access with a Lock.

Constructor.

Parameters

hdf_file_path – Path to the HDF5 file.

Attributes:

FILE_FORMAT_VERSION

HASH_TAG

INPUTS_GROUP

JACOBIAN_GROUP

OUTPUTS_GROUP

Methods:

clear(hdf_node_path)

Clear the data in the cache.

has_group(group_number, group_name, ...)

Check if a group is present in the HDF file.

read_data(group_number, group_name, ...[, ...])

Read a data dictionary from the HDF file.

read_hashes(hashes_dict, hdf_node_path)

Read the hashes in the HDF file.

update_file_format(hdf_file_path)

Update the format of an HDF5 file.

write_data(data, data_names, group_name, ...)

Cache input data to avoid re-evaluation.

FILE_FORMAT_VERSION = 1
HASH_TAG = 'hash'
INPUTS_GROUP = 'inputs'
JACOBIAN_GROUP = 'jacobian'
OUTPUTS_GROUP = 'outputs'
clear(hdf_node_path)[source]

Clear the data in the cache.

Parameters

hdf_node_path – Node path to clear.

has_group(group_number, group_name, hdf_node_path)[source]

Check if a group is present in the HDF file.

Parameters
  • group_name – Name of the group where the data is written.

  • group_number – Number of the group.

  • hdf_node_path – Name of the main HDF group.

Returns

True if the group exists.

read_data(group_number, group_name, hdf_node_path, h5_open_file=None)[source]

Read a data dictionary from the HDF file.

Parameters
  • group_name – Name of the group where the data is written.

  • group_number – Number of the group.

  • hdf_node_path – Name of the main HDF group.

  • h5_open_file

    Optionally, the already opened file. This improves performance but is incompatible with multiprocessing/multithreading.

    By default it is set to None.

Returns

The data dictionary and the Jacobian.

read_hashes(hashes_dict, hdf_node_path)[source]

Read the hashes in the HDF file.

Parameters
  • hashes_dict – Dictionary of hashes to fill.

  • hdf_node_path – Name of the main HDF group.

Returns

max_group – The maximum group number.

classmethod update_file_format(hdf_file_path)[source]

Update the format of an HDF5 file.

GEMSEO 3.2.0 added an HDF5FileSingleton.FILE_FORMAT_VERSION attribute to the HDF5 files to allow handling their maintenance and evolution. In particular, GEMSEO 3.2.0 fixed the hashing of the data dictionaries.

Parameters

hdf_file_path (Union[str, pathlib.Path]) – An HDF5 file path.

Return type

None

write_data(data, data_names, group_name, group_num, hdf_node_path, h5_open_file=None)[source]

Cache input data to avoid re-evaluation.

Parameters
  • data – The data to cache.

  • data_names – List of data names.

  • group_name – Name of the group: inputs, outputs or jacobian.

  • group_num – Number of the group.

  • hdf_node_path – Name of the main HDF group.

  • h5_open_file

    Optionally, the already opened file. This improves performance but is incompatible with multiprocessing/multithreading.

    By default it is set to None.
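
Examples

A hedged sketch of this low-level API, assuming a fresh file and sample group number 1 (file, node and variable names are illustrative):

>>> from gemseo.caches.hdf5_cache import HDF5FileSingleton
>>> from numpy import array
>>> singleton = HDF5FileSingleton('my_cache.h5')
>>> singleton.write_data({'x': array([1.])}, ['x'],
...     HDF5FileSingleton.INPUTS_GROUP, 1, 'my_node')
>>> singleton.has_group(1, HDF5FileSingleton.INPUTS_GROUP, 'my_node')
True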