gemseo.datasets.dataset module

A generic data structure with entries and variables.

The concept of a dataset is a key element of machine learning, post-processing and data analysis.

A Dataset is a pandas MultiIndex DataFrame storing series of data representing the values of multidimensional features belonging to different groups of features.

A Dataset can be set either from a file (from_csv() and from_txt()) or from a NumPy array (from_array()), and can be enriched from a group of variables (add_group()) or from a single variable (add_variable()).

A BaseFullCache or an OptimizationProblem can also be exported to a Dataset using the methods BaseFullCache.to_dataset() and OptimizationProblem.to_dataset().

class Dataset(data=None, index=None, columns=None, dtype=None, copy=None, *, dataset_name='')

Bases: DataFrame

A generic data structure with entries and variables.

A variable is defined by a name and a number of components. For instance, the variable "x" can have 2 components: 0 and 1. Likewise, the variable "y" can have 4 components: "a", "b", "c" and "d".

A variable belongs to a group of variables (default: DEFAULT_GROUP). Two variables can have the same name; only the tuple (group_name, variable_name) is unique and is therefore called a variable identifier.
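To illustrate why the identifier is the pair (group_name, variable_name) rather than the name alone, here is a minimal sketch using plain pandas instead of the Dataset class. The group names "inputs" and "outputs" and the numerical values are hypothetical and chosen for illustration only.

```python
import pandas as pd

# Two variables share the name "x" but live in different (hypothetical) groups.
columns = pd.MultiIndex.from_tuples(
    [("inputs", "x", 0), ("outputs", "x", 0)],
    names=["group_name", "variable_name", "component"],
)
frame = pd.DataFrame([[1.0, 10.0], [2.0, 20.0]], columns=columns)

# The variable identifiers are the unique (group_name, variable_name) pairs.
identifiers = sorted(
    set(
        zip(
            frame.columns.get_level_values("group_name"),
            frame.columns.get_level_values("variable_name"),
        )
    )
)

# The name "x" alone is ambiguous; the full key selects one column.
outputs_x = frame[("outputs", "x", 0)].tolist()
```

Here the name "x" occurs twice, yet each column remains addressable because the group name disambiguates it.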

A Dataset is a collection of entries, each of which stores the values of a set of variable identifiers. An entry corresponds to an index of the Dataset.

A Dataset is a special pandas DataFrame with the multi-index (group_name, variable_name, component). It must be built from the methods add_variable(), add_group(), from_array(), from_txt(), from_csv() and from_dataframe().
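The three-level column layout can be pictured with an ordinary pandas DataFrame. This is only a sketch of the structure, not of how Dataset builds it internally; the group names, variable names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Columns follow the (group_name, variable_name, component) convention.
columns = pd.MultiIndex.from_tuples(
    [
        ("inputs", "x", 0),
        ("inputs", "x", 1),
        ("outputs", "y", 0),
        ("outputs", "y", 1),
    ],
    names=["group_name", "variable_name", "component"],
)

# Three entries (rows), four columns in total.
frame = pd.DataFrame(np.arange(12.0).reshape(3, 4), columns=columns)

# A partial key (group_name, variable_name) returns all components of a variable.
x_values = frame[("inputs", "x")]
```

Each row is an entry, and selecting with a variable identifier returns one column per component of that variable.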

Miscellaneous information that is not specific to an entry of the dataset can be stored in the dictionary misc, e.g. dataset.misc["year"] = 2023.

Warning

A Dataset behaves like any multi-index DataFrame but its instantiation using the constructor dataset = Dataset(data, ...) can lead to some inconsistencies (multi-index levels, index values, dtypes, ...). Hence, the construction from the dedicated methods is recommended, e.g. dataset = Dataset(); dataset.add_variable("x", data).
