Introduction to dataset

Dataset

The dataset module implements the concept of a dataset, a key element in tasks such as machine learning, post-processing and data analysis.

A Dataset is an object defined by data stored as a dictionary of 2D numpy arrays, whose rows are samples, a.k.a. realizations, and whose columns are features, a.k.a. parameters or variables. The keys of this dictionary are either names of groups of variables or names of variables. A Dataset is also defined by a list of variable names, a dictionary of variable sizes and a dictionary of variable groups.
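
To make this layout concrete, here is a minimal sketch using plain NumPy arrays and dictionaries. It does not use the Dataset class itself; the variable names (`x`, `y`), the group names (`inputs`, `outputs`), the values and the choice to map each variable to its group are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical example: 3 samples, an input variable "x" of size 2
# and an output variable "y" of size 1.
n_samples = 3

# Data stored by group name: one 2D array per group (rows are samples).
data = {
    "inputs": np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]),
    "outputs": np.array([[1.0], [5.0], [9.0]]),
}

names = ["x", "y"]                        # list of variable names
sizes = {"x": 2, "y": 1}                  # variable name -> size
groups = {"x": "inputs", "y": "outputs"}  # variable name -> group (assumed mapping)

# Every array has one row per sample and one column per feature of its group.
for group, array in data.items():
    n_features = sum(sizes[name] for name in names if groups[name] == group)
    assert array.shape == (n_samples, n_features)
```

The three metadata containers are what make it possible to recover which columns of a group array belong to which variable.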

A Dataset can be populated from either a numpy array or a file. An AbstractFullCache or an OptimizationProblem can also be exported to a Dataset using AbstractFullCache.export_to_dataset() and OptimizationProblem.export_to_dataset() respectively.
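
One way to picture populating a dataset from a single numpy array is to split the columns of that array according to the variable sizes. The sketch below illustrates the idea with plain NumPy; the helper `split_by_variable` and the names `x` and `y` are hypothetical and are not part of the library's API.

```python
import numpy as np


def split_by_variable(array, names, sizes):
    """Split the columns of a 2D array into one 2D array per variable.

    Hypothetical helper: `names` gives the column order and `sizes`
    gives the number of columns taken by each variable.
    """
    array = np.atleast_2d(array)
    expected = sum(sizes[name] for name in names)
    if array.shape[1] != expected:
        raise ValueError(f"Expected {expected} columns, got {array.shape[1]}.")
    # Column indices at which each variable after the first one starts.
    stops = np.cumsum([sizes[name] for name in names])[:-1]
    return dict(zip(names, np.split(array, stops, axis=1)))


# 3 samples and 3 columns: the first two belong to "x", the last one to "y".
raw = np.arange(9.0).reshape(3, 3)
per_variable = split_by_variable(raw, ["x", "y"], {"x": 2, "y": 1})
```

Each value of `per_variable` stays a 2D array with one row per sample, even for a variable of size 1.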

From a Dataset, we can easily access its length and get the data, either as a 2D array or as dictionaries indexed by the variable names. We can get the whole data, the data associated with a group or the data associated with a list of variables. It is also possible to export the Dataset to an AbstractFullCache or a pandas DataFrame.
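
These access patterns can be sketched with plain NumPy as follows. This is an illustration under stated assumptions, not the Dataset interface: the functions `get_data` and `get_group`, the variable names and the values are all hypothetical.

```python
import numpy as np

# Hypothetical per-variable storage: 3 samples, "x" of size 2 and "y" of size 1.
data = {
    "x": np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]),
    "y": np.array([[1.0], [5.0], [9.0]]),
}
groups = {"x": "inputs", "y": "outputs"}  # variable name -> group name


def get_data(names, as_dict=False):
    """Return the data of the given variables as a dict or as one 2D array."""
    if as_dict:
        return {name: data[name] for name in names}
    # Concatenate the variables column-wise, in the order of `names`.
    return np.hstack([data[name] for name in names])


def get_group(group):
    """Return the data of all the variables of a group as one 2D array."""
    return get_data([name for name, grp in groups.items() if grp == group])


length = len(data["x"])            # number of samples
whole = get_data(["x", "y"])       # whole data as a 2D array
as_dict = get_data(["y"], as_dict=True)  # data of selected variables as a dict
inputs_only = get_group("inputs")  # data of one group
```

Returning a 2D array suits numerical routines that expect a samples-by-features matrix, while the dictionary form keeps the link between columns and variable names.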