Introduction to dataset
Dataset
The dataset module implements the concept of dataset, which is a key element for machine learning, post-processing, data analysis, …
A Dataset is an object defined by data stored as a dictionary of 2D NumPy arrays, whose rows are samples (a.k.a. realizations) and whose columns are features (a.k.a. parameters or variables). The keys of this dictionary are either names of groups of variables or names of variables.
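To make this layout concrete, here is a minimal NumPy sketch of the underlying storage, not of the Dataset class itself. The variable names x, z and y, the group names inputs and outputs, and all the values are hypothetical. Each variable maps to a 2D array with one row per sample, and each group maps to the column-wise concatenation of its variables.

```python
import numpy as np

# Hypothetical data: 3 samples of two inputs, "x" (2 components) and "z" (1 component),
# and one output "y" (1 component).
x = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
z = np.array([[6.0], [7.0], [8.0]])
y = np.array([[9.0], [10.0], [11.0]])

# Keys are variable names; each value is a 2D array with one row per sample
# and one column per component of the variable.
data_by_variable = {"x": x, "z": z, "y": y}

# Keys are group names; each value concatenates the columns of the group's variables.
data_by_group = {"inputs": np.hstack([x, z]), "outputs": y}

# All the arrays share the same number of rows, i.e. the number of samples.
n_samples = x.shape[0]
assert all(a.ndim == 2 and a.shape[0] == n_samples for a in data_by_variable.values())
```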
A Dataset is also defined by a list of variable names, a dictionary of variable sizes and a dictionary of variable groups.
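The plain-Python sketch below shows how these three pieces of metadata can describe the columns of the data. The names are hypothetical, and the orientation of the groups dictionary (variable name to group name) is an assumption made for illustration. The sizes determine which contiguous columns each variable occupies, and inverting the groups dictionary gives the variables of each group.

```python
# Hypothetical metadata for a dataset with 4 columns.
variables = ["x", "z", "y"]                                # variable names, in column order
sizes = {"x": 2, "z": 1, "y": 1}                           # number of components per variable
groups = {"x": "inputs", "z": "inputs", "y": "outputs"}    # group of each variable (assumed orientation)


def column_slices(variables, sizes):
    """Return, for each variable, the slice of columns it occupies."""
    slices, start = {}, 0
    for name in variables:
        slices[name] = slice(start, start + sizes[name])
        start += sizes[name]
    return slices


def group_members(variables, groups):
    """Invert the variable-to-group mapping into group -> list of variable names."""
    members = {}
    for name in variables:
        members.setdefault(groups[name], []).append(name)
    return members


slices = column_slices(variables, sizes)
members = group_members(variables, groups)
n_columns = sum(sizes.values())
```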
A Dataset can be set from either a NumPy array or a file.
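As an illustration of setting data from an array or from a file, the sketch below splits the columns of a 2D array into one array per variable according to the sizes. The helper split_columns is hypothetical and is not the Dataset API; a small CSV string read with numpy.loadtxt stands in for a file on disk, and its header line is assumed to be skipped.

```python
import io

import numpy as np


def split_columns(array, variables, sizes):
    """Split the columns of a 2D array into one 2D array per variable (hypothetical helper)."""
    data, start = {}, 0
    for name in variables:
        stop = start + sizes[name]
        data[name] = array[:, start:stop]
        start = stop
    if start != array.shape[1]:
        raise ValueError("the variable sizes do not match the number of columns")
    return data


variables = ["x", "z", "y"]
sizes = {"x": 2, "z": 1, "y": 1}

# From an in-memory NumPy array: 2 samples, 4 columns.
from_array = split_columns(np.array([[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]]), variables, sizes)

# From a file: a small CSV text stands in for a file on disk; the header line is skipped.
csv_text = "x_0,x_1,z,y\n0,1,2,3\n4,5,6,7\n"
array = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1, ndmin=2)
from_file = split_columns(array, variables, sizes)
```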
An AbstractFullCache or an OptimizationProblem can also be exported to a Dataset using AbstractFullCache.export_to_dataset() or OptimizationProblem.export_to_dataset(), respectively.
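Conceptually, exporting a cache to a dataset stacks the stored evaluations row by row. The sketch below assumes a simplified cache represented as a list of (inputs, outputs) pairs of dictionaries of 1D arrays; this representation and the helper entries_to_dataset are hypothetical and do not reflect how AbstractFullCache actually stores its data.

```python
import numpy as np

# A hypothetical stand-in for a full cache: one entry per evaluation,
# each holding the input and output values as 1D arrays.
cache_entries = [
    ({"x": np.array([0.0, 1.0])}, {"y": np.array([1.0])}),
    ({"x": np.array([2.0, 3.0])}, {"y": np.array([13.0])}),
]


def entries_to_dataset(entries):
    """Stack each variable over the evaluations, giving one row per evaluation."""
    data, groups = {}, {}
    for group_index, group in enumerate(("inputs", "outputs")):
        for name in entries[0][group_index]:
            data[name] = np.vstack([entry[group_index][name] for entry in entries])
            groups[name] = group
    return data, groups


data, groups = entries_to_dataset(cache_entries)
```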
From a Dataset, we can easily access its length and get the data, either as a 2D array or as a dictionary indexed by the variable names. We can get the whole data, the data associated with a group, or the data associated with a list of variables.
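These access patterns can be sketched with plain dictionary and NumPy operations. The helpers n_samples, select_variables and select_group are hypothetical stand-ins, not Dataset methods, and getting data as a 2D array is assumed here to mean concatenating the columns of the requested variables in the order given.

```python
import numpy as np

variables = ["x", "z", "y"]
groups = {"x": "inputs", "z": "inputs", "y": "outputs"}
data = {
    "x": np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]),
    "z": np.array([[6.0], [7.0], [8.0]]),
    "y": np.array([[9.0], [10.0], [11.0]]),
}


def n_samples(data):
    """Length of the dataset: the number of rows shared by all the arrays."""
    return next(iter(data.values())).shape[0]


def select_variables(data, names, as_dict=False):
    """Data of a list of variables, as a dict or as one 2D array with concatenated columns."""
    if as_dict:
        return {name: data[name] for name in names}
    return np.hstack([data[name] for name in names])


def select_group(data, variables, groups, group, as_dict=False):
    """Data of the variables belonging to a group."""
    names = [name for name in variables if groups[name] == group]
    return select_variables(data, names, as_dict)


whole = select_variables(data, variables)               # whole data as a 2D array
inputs = select_group(data, variables, groups, "inputs")
outputs = select_variables(data, ["y"], as_dict=True)   # dict indexed by variable name
```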
It is also possible to export the Dataset to an AbstractFullCache or to a pandas DataFrame.
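Both export targets can be approximated by two layouts: one dictionary per sample, which resembles what a cache stores for each evaluation, and a flat table whose column labels are (group, variable, component) tuples. Both helpers, to_records and to_table, are hypothetical, and the three-level column labelling is only one plausible choice for a DataFrame, not necessarily the one Dataset uses.

```python
import numpy as np

data = {
    "x": np.array([[0.0, 1.0], [2.0, 3.0]]),
    "y": np.array([[1.0], [13.0]]),
}
groups = {"x": "inputs", "y": "outputs"}


def to_records(data):
    """One dict per sample, mapping each variable to its 1D value (cache-like layout)."""
    count = next(iter(data.values())).shape[0]
    return [{name: array[i] for name, array in data.items()} for i in range(count)]


def to_table(data, groups):
    """Column labels (group, variable, component index) and the matching 2D array."""
    columns = [
        (groups[name], name, str(i))
        for name, array in data.items()
        for i in range(array.shape[1])
    ]
    values = np.hstack(list(data.values()))
    return columns, values


records = to_records(data)
columns, values = to_table(data, groups)
```

With pandas installed, pandas.DataFrame(values, columns=pandas.MultiIndex.from_tuples(columns)) would turn this table into a DataFrame with three column levels.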