A generic dataset to store data in memory.
The concept of dataset is a key element for machine learning, post-processing, data analysis, …
Dataset uses its attribute Dataset.data
to store \(N\) series of data
representing the values of \(p\) multidimensional features
belonging to different groups of features.
Dataset.data is a dictionary of 2D NumPy arrays,
whose rows are the samples, a.k.a. series, realizations or entries,
and whose columns are the variables, a.k.a. parameters or features.
The keys of this dictionary are
either the names of the groups of variables
or the names of the variables.
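To make these two kinds of keys concrete, here is a minimal NumPy sketch. It is not GEMSEO code. The variable names `x`, `z`, `y` and the group names `inputs`, `outputs` are hypothetical, and the sketch assumes that the array of a group is the column-wise concatenation of the arrays of its variables. With that assumption, \(N\) is the number of rows and \(p\) is the total number of columns.

```python
import numpy as np

# Hypothetical data: N = 4 samples of two input variables and one output variable.
# Each array is 2D: one row per sample, one column per component.
x = np.array([[0.0], [1.0], [2.0], [3.0]])                        # 1 component
z = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])  # 2 components
y = x**2 + z.sum(axis=1, keepdims=True)                           # 1 component

# Keys are variable names: one 2D array per variable.
data_by_variable = {"x": x, "z": z, "y": y}

# Keys are group names: the arrays of the variables of a group are stacked column-wise
# (an assumption of this sketch).
groups = {"inputs": ["x", "z"], "outputs": ["y"]}
data_by_group = {
    group: np.hstack([data_by_variable[name] for name in names])
    for group, names in groups.items()
}

n_samples = x.shape[0]                                            # N
n_features = sum(a.shape[1] for a in data_by_variable.values())  # p
print(n_samples, n_features)          # 4 4
print(data_by_group["inputs"].shape)  # (4, 3)
```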
A Dataset is defined not only by the raw data it stores
but also by the names, sizes and groups of its variables.
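The following NumPy sketch shows why this metadata matters. A raw 2D array alone does not say where one variable ends and the next begins. The names, sizes and groups used here are hypothetical: the sizes give the column boundaries, and the groups say which variables belong together.

```python
import numpy as np

# Raw data: 3 samples and 4 columns, with no indication of what the columns mean.
raw = np.arange(12.0).reshape(3, 4)

# Hypothetical metadata describing the columns: names, sizes and groups.
names = ["x", "z", "y"]
sizes = {"x": 1, "z": 2, "y": 1}
groups = {"x": "inputs", "z": "inputs", "y": "outputs"}

# The sizes give the column indices where each variable starts: here [1, 3].
split_indices = np.cumsum([sizes[name] for name in names])[:-1]
by_variable = dict(zip(names, np.split(raw, split_indices, axis=1)))

# The groups tell which variables belong together.
inputs = [name for name in names if groups[name] == "inputs"]

print(by_variable["z"].shape)  # (3, 2)
print(inputs)                  # ['x', 'z']
```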
A Dataset can be set
either from a file
or from NumPy arrays,
and can be enriched with a group of variables
or with a single variable.
An AbstractFullCache or an optimization problem
can also be exported to a Dataset.
From a Dataset, we can easily access its length and data,
either as a 2D array or as a dictionary indexed by the variable names.
We can get either the whole data,
the data associated with a group, or the data associated with a list of variables.
It is also possible to export the Dataset to an
AbstractFullCache or a pandas DataFrame.
DatasetPlot is a generic graphical representation of a Dataset
that can be displayed on screen or saved to a file.
The different visualization tools
proposed by GEMSEO increasingly rely on it
to factorize the code,
separate data handling from graph generation,
and standardize the visualizations.
As an example,
RadarChart implements a DatasetPlot
and is used both
to visualize the constraints stored during an optimization
and to visualize the sensitivity indices generated by a sensitivity analysis.
Examples of datasets.
GEMSEO proposes several datasets containing academic data to illustrate its capabilities:
IrisDataset is a collection of iris plants, mainly used to benchmark clustering and classification algorithms,
RosenbrockDataset is a set of evaluations of the Rosenbrock function over a regular grid, initially introduced to illustrate visualization tools dedicated to surfaces,
BurgersDataset is a set of solutions of Burgers’ equation at given times, initially introduced to illustrate dimension reduction methods.
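As an illustration of the kind of data a RosenbrockDataset holds, the sketch below evaluates the two-dimensional Rosenbrock function \(f(x, y) = (1 - x)^2 + 100 (y - x^2)^2\) over a regular grid and stores the samples as input and output 2D arrays. It does not reproduce how GEMSEO builds RosenbrockDataset. The grid bounds, the number of points and the group names are assumptions chosen for the example.

```python
import numpy as np


def rosenbrock(x, y):
    """Two-dimensional Rosenbrock function, whose minimum value 0 is reached at (1, 1)."""
    return (1.0 - x) ** 2 + 100.0 * (y - x**2) ** 2


# Hypothetical regular grid: 5 x 5 points over [-2, 2] x [-2, 2].
n_points = 5
grid = np.linspace(-2.0, 2.0, n_points)
xx, yy = np.meshgrid(grid, grid)

# One row per grid point: the inputs are (x, y) and the output is f(x, y).
inputs = np.column_stack([xx.ravel(), yy.ravel()])
outputs = rosenbrock(inputs[:, 0], inputs[:, 1])[:, np.newaxis]
data = {"inputs": inputs, "outputs": outputs}

# The grid contains (1, 1), so the smallest sampled value is the global minimum.
best = inputs[np.argmin(outputs)]
print(inputs.shape, outputs.shape)  # (25, 2) (25, 1)
print(best)
```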
Dataset from an optimization problem