Dataset

Basics

A generic dataset to store data in memory.

The concept of a dataset is a key element of machine learning, post-processing, data analysis, …

A Dataset uses its attribute Dataset.data to store \(N\) series of data representing the values of \(p\) multidimensional features belonging to different groups of features.

This attribute Dataset.data is a dictionary of 2D numpy arrays, whose rows are the samples, a.k.a. series, realizations or entries, and columns are the variables, a.k.a. parameters or features. The keys of this dictionary are either the names of the groups of variables or the names of the variables. Thus, a Dataset is not only defined by the raw data stored but also by the names, the sizes and the groups of the different variables.
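The layout described above can be pictured with plain numpy. The following is only a sketch of the idea, not GEMSEO's implementation: the group names ("inputs", "outputs"), the variable names ("x", "y", "z") and the `groups` and `sizes` dictionaries are hypothetical and chosen for illustration.

```python
import numpy as np

n_samples = 5
rng = np.random.default_rng(0)

# Keys are group names; each value is a 2D array whose rows are the samples
# and whose columns are the components of the variables of that group.
data = {
    "inputs": rng.random((n_samples, 3)),   # columns: x (size 2), then y (size 1)
    "outputs": rng.random((n_samples, 1)),  # column: z (size 1)
}

# Hypothetical metadata: which variables belong to which group, and their sizes.
groups = {"inputs": ["x", "y"], "outputs": ["z"]}
sizes = {"x": 2, "y": 1, "z": 1}

# Every array has one row per sample and as many columns as the summed
# sizes of the variables of its group.
for group, names in groups.items():
    n_columns = sum(sizes[name] for name in names)
    assert data[group].shape == (n_samples, n_columns)
```

The raw arrays alone do not say where one variable ends and the next begins, which is why the names, sizes and groups are part of the definition of the dataset.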

A Dataset can be set either from a file (Dataset.set_from_file()) or from a numpy array (Dataset.set_from_array()), and can be enriched with a group of variables (Dataset.add_group()) or with a single variable (Dataset.add_variable()).

An AbstractFullCache or an OptimizationProblem can also be exported to a Dataset using AbstractFullCache.export_to_dataset() and OptimizationProblem.export_to_dataset() respectively.

From a Dataset, we can easily access its length and data, either as a 2D array or as dictionaries indexed by the variable names. We can get the whole data, the data associated with a group or the data associated with a list of variables. It is also possible to export the Dataset to an AbstractFullCache or a pandas DataFrame.
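To make the two access modes concrete, here is a minimal numpy sketch of how a 2D array can be turned into a dictionary indexed by variable names and back. The helper `split_by_variable` and the variable names are hypothetical. They do not belong to the GEMSEO API, which provides its own accessors.

```python
import numpy as np


def split_by_variable(array, names, sizes):
    """Split the columns of a 2D array into a dict indexed by variable names."""
    result = {}
    start = 0
    for name in names:
        stop = start + sizes[name]
        result[name] = array[:, start:stop]
        start = stop
    return result


# 4 samples (rows) and 3 columns: x has size 2, y has size 1.
samples = np.arange(12.0).reshape(4, 3)
sizes = {"x": 2, "y": 1}

# The length of the dataset is its number of samples.
n_samples = samples.shape[0]

# Dictionary view: one 2D array per variable.
by_name = split_by_variable(samples, ["x", "y"], sizes)

# Array view: concatenating the variables column-wise restores the 2D array.
x_and_y = np.hstack([by_name["x"], by_name["y"]])
```

Selecting a list of variables then amounts to concatenating only the corresponding entries of the dictionary.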

Visualization

Datasets post-processing.

A DatasetPlot is a generic graphical representation of a Dataset that can be displayed on screen or saved to a file. The visualization tools provided by GEMSEO (OptPostProcessor, SensitivityAnalysis.plot(), …) increasingly rely on it to factor out common code, to separate the data from the generation of the graph and to standardize the visualizations.

As an example, the DatasetPlot named RadarChart implements a radar chart and is used both by the post-processor of the same name, RadarChart, to visualize the constraints stored in a Database, and by SensitivityAnalysis.plot_radar() to visualize the sensitivity indices generated by a SensitivityAnalysis.

Use cases

Examples of datasets.

GEMSEO provides several datasets containing academic data to illustrate its capabilities, such as the Burgers, Iris and Rosenbrock datasets.

Examples

Creation

Dataset from a cache
Dataset from a numpy array
Dataset from an optimization problem

Use cases

Burgers dataset
Iris dataset
Rosenbrock dataset

Visualization

Andrews curves
Bars
Boxplot
Lines
Parallel coordinates
Radar chart
Scatter
Scatter matrix
YvsX
ZvsXY
