gemseo / caches

hdf5_cache module

Caching module to avoid multiple evaluations of a discipline.

Classes:

HDF5Cache(hdf_file_path, hdf_node_path[, ...])

Cache using an HDF5 file on disk to store the data.

HDF5FileSingleton(*args, **kwargs)

Singleton to access an HDF file. Used for multithreaded/multiprocessing access with a Lock.

class gemseo.caches.hdf5_cache.HDF5Cache(hdf_file_path, hdf_node_path, tolerance=0.0, name=None)[source]

Bases: gemseo.core.cache.AbstractFullCache

Cache using an HDF5 file on disk to store the data.

Initialize a singleton to access an HDF file. This singleton is used for multithreaded/multiprocessing access with a Lock.

Initialize the cache tolerance. By default, the cache is not approximate. It is up to the user to decide whether to trade exact matching for CPU time; a typical value could be 2 * finfo(float).eps.

Parameters
  • hdf_file_path (str) – Path of the HDF file.

  • hdf_node_path (str) – Path of the node in the HDF file.

  • tolerance (float) –

    Tolerance that defines whether two input vectors are equal, in which case the cached data are returned. If 0, no approximation is made.

    By default it is set to 0.0.

  • name (str) –

    Name of the cache.

    By default it is set to None.

Examples

>>> from gemseo.caches.hdf5_cache import HDF5Cache
>>> cache = HDF5Cache('my_cache.h5', 'my_node')
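
A slightly fuller sketch, assuming scalar variables 'x' and 'y' (the data are illustrative only):

>>> from numpy import array
>>> data = {'x': array([1.]), 'y': array([2.])}
>>> cache.cache_outputs(data, ['x'], data, ['y'])
>>> outputs, jacobian = cache.get_outputs({'x': array([1.])}, ['x'])
>>> outputs['y']
array([2.])

With a small positive tolerance, e.g. tolerance=2e-16 at construction, input vectors within that tolerance of a cached entry should reuse the cached outputs.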

Attributes:

INPUTS_GROUP

JACOBIAN_GROUP

OUTPUTS_GROUP

SAMPLE_GROUP

inputs_names

Return the names of the inputs.

max_length

Get the maximal length of the cache (the maximal number of stored elements).

outputs_names

Return the names of the outputs.

samples_indices

List of sample indices.

varsizes

Return the sizes of the variables.

Methods:

cache_jacobian(input_data, input_names, jacobian)

Cache Jacobian data to avoid re-evaluation.

cache_outputs(input_data, input_names, ...)

Cache data to avoid re-evaluation.

clear()

Clear the cache.

export_to_dataset([name, by_group, ...])

Build a Dataset from the cache.

export_to_ggobi(file_path[, inputs_names, ...])

Export the history to an XML file for the GGobi tool.

get_all_data([as_iterator])

Return all the data in the cache.

get_data(index, **options)

Get the data associated with a sample ID.

get_last_cached_inputs()

Retrieve the last execution inputs.

get_last_cached_outputs()

Retrieve the last execution outputs.

get_length()

Get the length of the cache, i.e. the number of stored elements.

get_outputs(input_data[, input_names])

Check if the discipline has already been evaluated for the given input data dictionary.

merge(other_cache)

Merge another cache with self.

update_file_format(hdf_file_path)

Update the format of an HDF5 file.

INPUTS_GROUP = 'inputs'
JACOBIAN_GROUP = 'jacobian'
OUTPUTS_GROUP = 'outputs'
SAMPLE_GROUP = 'sample'
cache_jacobian(input_data, input_names, jacobian)

Cache Jacobian data to avoid re-evaluation.

Parameters
  • input_data (dict) – Input data to cache.

  • input_names (list(str)) – List of input data names.

  • jacobian (dict) – Jacobian to cache.
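
Examples

A minimal sketch, assuming the usual GEMSEO nested layout {output_name: {input_name: matrix}} for the Jacobian (all names are illustrative):

>>> from gemseo.caches.hdf5_cache import HDF5Cache
>>> from numpy import array
>>> cache = HDF5Cache('my_cache.h5', 'my_node')
>>> input_data = {'x': array([1.])}
>>> jac = {'y': {'x': array([[2.]])}}
>>> cache.cache_jacobian(input_data, ['x'], jac)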

cache_outputs(input_data, input_names, output_data, output_names=None)

Cache data to avoid re-evaluation.

Parameters
  • input_data (dict) – Input data to cache.

  • input_names (list(str)) – List of input data names.

  • output_data (dict) – Output data to cache.

  • output_names (list(str)) –

    List of output data names. If None, use all output names.

    By default it is set to None.

clear()[source]

Clear the cache.

Examples

>>> from gemseo.caches.hdf5_cache import HDF5Cache
>>> from numpy import array
>>> cache = HDF5Cache('my_cache.h5', 'my_node')
>>> for index in range(5):
...     data = {'x': array([1.])*index, 'y': array([.2])*index}
...     cache.cache_outputs(data, ['x'], data, ['y'])
>>> cache.get_length()
5
>>> cache.clear()
>>> cache.get_length()
0
export_to_dataset(name=None, by_group=True, categorize=True, inputs_names=None, outputs_names=None)

Build a Dataset from the cache.

Parameters
  • name (str) –

    Dataset name.

    By default it is set to None.

  • by_group (bool) –

    If True, store the data by group; otherwise, store them by variable.

    By default it is set to True.

  • categorize (bool) –

    If True, distinguish between the different groups of variables.

    By default it is set to True.

  • inputs_names (list(str)) –

    List of input names. If None, use all inputs.

    By default it is set to None.

  • outputs_names (list(str)) –

    List of output names. If None, use all outputs.

    By default it is set to None.
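
Examples

A minimal sketch, assuming a previously filled cache (the file, node and dataset names are illustrative):

>>> from gemseo.caches.hdf5_cache import HDF5Cache
>>> cache = HDF5Cache('my_cache.h5', 'my_node')
>>> dataset = cache.export_to_dataset('my_dataset')
>>> dataset = cache.export_to_dataset('my_dataset', by_group=False)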

export_to_ggobi(file_path, inputs_names=None, outputs_names=None)

Export the history to an XML file for the GGobi tool.

Parameters
  • file_path (str) – Path to export the file.

  • inputs_names (list(str)) –

    List of input names to include in the export. If None, take all of them.

    By default it is set to None.

  • outputs_names (list(str)) –

    List of output names to include in the export. If None, take all of them.

    By default it is set to None.
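
Examples

A minimal sketch, assuming a previously filled cache with variables 'x' and 'y' (all names are illustrative):

>>> from gemseo.caches.hdf5_cache import HDF5Cache
>>> cache = HDF5Cache('my_cache.h5', 'my_node')
>>> cache.export_to_ggobi('history.xml', inputs_names=['x'], outputs_names=['y'])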

get_all_data(as_iterator=False)

Return all the data in the cache.

Parameters

as_iterator (bool) –

If True, return an iterator; otherwise, a dictionary.

By default it is set to False.

Returns

all_data – A dictionary of dictionaries for the inputs, outputs and Jacobian, whose keys are the data indices.

Return type

dict
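
Examples

A minimal sketch of iterating over the cached samples, assuming each entry maps the group names ('inputs', 'outputs', 'jacobian') to the corresponding data:

>>> from gemseo.caches.hdf5_cache import HDF5Cache
>>> cache = HDF5Cache('my_cache.h5', 'my_node')
>>> for index, entry in cache.get_all_data().items():
...     print(index, entry['inputs'], entry['outputs'])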

get_data(index, **options)[source]

Get the data associated with a sample ID.

Parameters
  • index (str) – Sample ID.

  • options – Options passed to the _read_data() method.

Returns

The input data, output data and Jacobian.

Return type

dict
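
Examples

A minimal sketch, assuming a non-empty cache and using the samples_indices property to obtain a valid sample ID:

>>> from gemseo.caches.hdf5_cache import HDF5Cache
>>> cache = HDF5Cache('my_cache.h5', 'my_node')
>>> sample_id = cache.samples_indices[0]
>>> entry = cache.get_data(sample_id)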

get_last_cached_inputs()

Retrieve the last execution inputs.

Returns

inputs – Last cached inputs.

Return type

dict

get_last_cached_outputs()

Retrieve the last execution outputs.

Returns

outputs – Last cached outputs.

Return type

dict

get_length()

Get the length of the cache, i.e. the number of stored elements.

Returns

length – Length of the cache.

Return type

int

get_outputs(input_data, input_names=None)

Check whether the discipline has already been evaluated for the given input data dictionary. If so, return the associated cached data; otherwise, return None.

Parameters
  • input_data (dict) – Input data dictionary to test for caching.

  • input_names (list(str)) –

    List of input data names.

    By default it is set to None.

Returns

  • output_data (dict) – Output data if there is no need to evaluate the discipline. None otherwise.

  • jacobian (dict) – Jacobian if there is no need to evaluate the discipline. None otherwise.
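
Examples

A minimal sketch of the check-before-evaluate pattern this method supports (variable names are illustrative):

>>> from gemseo.caches.hdf5_cache import HDF5Cache
>>> from numpy import array
>>> cache = HDF5Cache('my_cache.h5', 'my_node')
>>> outputs, jacobian = cache.get_outputs({'x': array([1.])}, ['x'])
>>> if outputs is None:
...     pass  # not cached yet: evaluate the discipline, then call cache_outputs()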

property inputs_names

Return the names of the inputs.

property max_length

Get the maximal length of the cache (the maximal number of stored elements).

Returns

length – Maximal length of the cache.

Return type

int

merge(other_cache)

Merge another cache with self.

Parameters

other_cache (AbstractFullCache) – Cache to merge with the current one.
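
Examples

A minimal sketch merging the contents of a second cache (file names are illustrative):

>>> from gemseo.caches.hdf5_cache import HDF5Cache
>>> cache = HDF5Cache('my_cache.h5', 'my_node')
>>> other_cache = HDF5Cache('other_cache.h5', 'my_node')
>>> cache.merge(other_cache)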

property outputs_names

Return the names of the outputs.

property samples_indices

List of sample indices.

static update_file_format(hdf_file_path)[source]

Update the format of an HDF5 file.

Parameters

hdf_file_path (Union[str, pathlib.Path]) – An HDF5 file path.

Return type

None
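
Examples

A minimal sketch, assuming a cache file created with a GEMSEO version older than 3.2.0 (the file name is illustrative):

>>> from gemseo.caches.hdf5_cache import HDF5Cache
>>> HDF5Cache.update_file_format('old_cache.h5')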

property varsizes

Return the sizes of the variables.

class gemseo.caches.hdf5_cache.HDF5FileSingleton(*args, **kwargs)[source]

Bases: object

Singleton to access an HDF file. Used for multithreaded/multiprocessing access with a Lock.

Constructor.

Parameters

hdf_file_path – Path to the HDF5 file.

Attributes:

FILE_FORMAT_VERSION

HASH_TAG

INPUTS_GROUP

JACOBIAN_GROUP

OUTPUTS_GROUP

Methods:

clear(hdf_node_path)

Clear the data in the cache.

has_group(group_number, group_name, ...)

Check if a group is present in the HDF file.

read_data(group_number, group_name, ...[, ...])

Read a data dictionary from the HDF file.

read_hashes(hashes_dict, hdf_node_path)

Read the hashes in the HDF file.

update_file_format(hdf_file_path)

Update the format of an HDF5 file.

write_data(data, data_names, group_name, ...)

Cache input data to avoid re-evaluation.

FILE_FORMAT_VERSION = 1
HASH_TAG = 'hash'
INPUTS_GROUP = 'inputs'
JACOBIAN_GROUP = 'jacobian'
OUTPUTS_GROUP = 'outputs'
clear(hdf_node_path)[source]

Clear the data in the cache.

Parameters

hdf_node_path – Node path to clear.

has_group(group_number, group_name, hdf_node_path)[source]

Check if a group is present in the HDF file.

Parameters
  • group_name – Name of the group where the data is written.

  • group_number – Number of the group.

  • hdf_node_path – Name of the main HDF group.

Returns

True if the group exists.

read_data(group_number, group_name, hdf_node_path, h5_open_file=None)[source]

Read a data dictionary from the HDF file.

Parameters
  • group_name – Name of the group where the data is written.

  • group_number – Number of the group.

  • hdf_node_path – Name of the main HDF group.

  • h5_open_file

    Optionally, the already opened file. This improves performance but is incompatible with multiprocessing/multithreading.

    By default it is set to None.

Returns

The data dictionary and the Jacobian.

read_hashes(hashes_dict, hdf_node_path)[source]

Read the hashes in the HDF file.

Parameters
  • hashes_dict – Dictionary of hashes to fill.

  • hdf_node_path – Name of the main HDF group.

Returns

max_group – The maximum group number.

classmethod update_file_format(hdf_file_path)[source]

Update the format of an HDF5 file.

GEMSEO 3.2.0 added an HDF5FileSingleton.FILE_FORMAT_VERSION attribute to the HDF5 files to allow handling their maintenance and evolution. In particular, GEMSEO 3.2.0 fixed the hashing of the data dictionaries.

Parameters

hdf_file_path (Union[str, pathlib.Path]) – An HDF5 file path.

Return type

None

write_data(data, data_names, group_name, group_num, hdf_node_path, h5_open_file=None)[source]

Cache input data to avoid re-evaluation.

Parameters
  • data – The data to cache.

  • data_names – List of data names.

  • group_name – Name of the group: inputs, outputs or jacobian.

  • group_num – Number of the group.

  • hdf_node_path – Name of the main HDF group.

  • h5_open_file

    Optionally, the already opened file. This improves performance but is incompatible with multiprocessing/multithreading.

    By default it is set to None.
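
Examples

A hedged sketch of this low-level API, assuming a fresh file and sample group number 1 (file, node and variable names are illustrative):

>>> from gemseo.caches.hdf5_cache import HDF5FileSingleton
>>> from numpy import array
>>> singleton = HDF5FileSingleton('my_cache.h5')
>>> singleton.write_data({'x': array([1.])}, ['x'],
...     HDF5FileSingleton.INPUTS_GROUP, 1, 'my_node')
>>> singleton.has_group(1, HDF5FileSingleton.INPUTS_GROUP, 'my_node')
True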