Built-in datasets

Dataset factory

This module contains a factory to instantiate a Dataset from its class name. The class can be internal to GEMSEO or located in an external module whose path is provided to the constructor. The factory also lists the available dataset types and lets you test whether a given dataset type is available.
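The idea of creating an object from its class name can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not GEMSEO's implementation: `Dataset`, `IrisLikeDataset`, `RosenbrockLikeDataset` and `SimpleDatasetFactory` are hypothetical names introduced for this example, and the discovery of classes in external modules is left out.

```python
class Dataset:
    """Hypothetical minimal base class standing in for a real Dataset."""

    def __init__(self, name=None):
        self.name = name or self.__class__.__name__


class IrisLikeDataset(Dataset):
    """Hypothetical concrete dataset."""


class RosenbrockLikeDataset(Dataset):
    """Hypothetical concrete dataset."""


class SimpleDatasetFactory:
    """Illustrative factory: maps class names to Dataset subclasses."""

    def __init__(self, base_class=Dataset):
        # Register every direct or indirect subclass under its class name.
        self._classes = {
            cls.__name__: cls for cls in self._all_subclasses(base_class)
        }

    @staticmethod
    def _all_subclasses(cls):
        found = []
        for sub in cls.__subclasses__():
            found.append(sub)
            found.extend(SimpleDatasetFactory._all_subclasses(sub))
        return found

    @property
    def class_names(self):
        """Sorted names of the classes the factory can instantiate."""
        return sorted(self._classes)

    def is_available(self, class_name):
        """Return True if the factory knows this class name."""
        return class_name in self._classes

    def create(self, class_name, **options):
        """Instantiate the class registered under class_name."""
        if not self.is_available(class_name):
            raise ValueError(f"Unknown dataset class: {class_name!r}")
        return self._classes[class_name](**options)


factory = SimpleDatasetFactory()
dataset = factory.create("RosenbrockLikeDataset")
```

Keeping the name-to-class mapping in one place is what makes both the listing and the availability test cheap: each is a simple lookup in the same registry.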

Burgers dataset

This Dataset contains solutions to Burgers’ equation with periodic boundary conditions on the interval \([0, 2\pi]\) for different time steps:

\[u_t + u u_x = \nu u_{xx},\]

where \(\nu\) is the viscosity.

An analytical expression for the solution can be obtained using the Cole-Hopf transform:

\[u(t, x) = - 2 \nu \frac{\phi_x}{\phi},\]

where \(\phi\) is a solution of the heat equation \(\phi_t = \nu \phi_{xx}\).

This Dataset is based on a full-factorial design of experiments. Each sample corresponds to a given time step \(t\), while each feature corresponds to a given spatial point \(x\).
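The Cole-Hopf construction and the sample/feature layout can be illustrated with a small standard-library sketch. It assumes a particular positive, \(2\pi\)-periodic solution of the heat equation, \(\phi(t, x) = a + e^{-\nu t} \cos x\) with \(a > 1\); this choice, the values of `NU` and `A`, and the grid sizes are assumptions made for the example and are not necessarily those used by the Dataset.

```python
import math

NU = 0.1  # assumed viscosity
A = 2.0   # assumed offset (> 1) keeping phi strictly positive


def phi(t, x):
    """A 2*pi-periodic solution of the heat equation phi_t = NU * phi_xx."""
    return A + math.exp(-NU * t) * math.cos(x)


def phi_x(t, x):
    """Spatial derivative of phi."""
    return -math.exp(-NU * t) * math.sin(x)


def u(t, x):
    """Cole-Hopf transform: u = -2 * NU * phi_x / phi."""
    return -2.0 * NU * phi_x(t, x) / phi(t, x)


def burgers_table(n_t=5, n_x=8, t_max=1.0):
    """Return times, points and a table with one row per time step
    (sample) and one column per spatial point (feature)."""
    times = [t_max * i / (n_t - 1) for i in range(n_t)]
    points = [2.0 * math.pi * j / (n_x - 1) for j in range(n_x)]
    return times, points, [[u(t, x) for x in points] for t in times]


def residual(t, x, h=1e-4):
    """Central finite-difference estimate of u_t + u*u_x - NU*u_xx."""
    u_t = (u(t + h, x) - u(t - h, x)) / (2 * h)
    u_x = (u(t, x + h) - u(t, x - h)) / (2 * h)
    u_xx = (u(t, x + h) - 2 * u(t, x) + u(t, x - h)) / h**2
    return u_t + u(t, x) * u_x - NU * u_xx


times, points, table = burgers_table()
```

The `residual` helper is a sanity check rather than part of the construction: evaluated at any interior point it should be close to zero, up to finite-difference error, which confirms that the transformed field satisfies Burgers’ equation.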

More information about Burgers’ equation

Example

Iris dataset

This is one of the best-known datasets in the machine learning literature.

It was introduced by the statistician Ronald Fisher in his 1936 paper “The use of multiple measurements in taxonomic problems”, Annals of Eugenics, 7 (2): 179–188.

It contains 150 instances of iris plants:

  • 50 Iris Setosa,

  • 50 Iris Versicolor,

  • 50 Iris Virginica.

Each instance is characterized by:

  • its sepal length in cm,

  • its sepal width in cm,

  • its petal length in cm,

  • its petal width in cm.

This Dataset can be used for either clustering or classification.
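The difference between the two uses comes down to whether the species is treated as a target. The sketch below shows one possible record layout; `IrisSample` and its field names are hypothetical, and the three rows are measurements as commonly published for this data set, included here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IrisSample:
    """Hypothetical record: four measurements in cm plus the species."""

    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float
    species: str

    def features(self):
        """Return the four measurements as a tuple."""
        return (
            self.sepal_length,
            self.sepal_width,
            self.petal_length,
            self.petal_width,
        )


# Three illustrative rows, one per species.
samples = [
    IrisSample(5.1, 3.5, 1.4, 0.2, "setosa"),
    IrisSample(7.0, 3.2, 4.7, 1.4, "versicolor"),
    IrisSample(6.3, 3.3, 6.0, 2.5, "virginica"),
]

# Clustering: only the measurements are used, the species is ignored.
clustering_inputs = [s.features() for s in samples]

# Classification: the measurements are inputs, the species is the target.
classification_pairs = [(s.features(), s.species) for s in samples]
```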

More information about the Iris dataset

Example

Rosenbrock dataset

This Dataset contains 100 evaluations of the well-known Rosenbrock function:

\[f(x,y)=(1-x)^2+100(y-x^2)^2\]

This function is known for its global minimum at the point (1, 1), where \(f = 0\), its banana-shaped valley, and the difficulty of converging to that minimum.

This Dataset is based on a full-factorial design of experiments.
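With two inputs, a full-factorial design with 10 levels per input gives 10 × 10 = 100 evaluations. The sketch below builds such a grid with the standard library; the bounds \([-2, 2]\) for both inputs and the helper name `full_factorial` are assumptions made for this example, not values confirmed by the Dataset.

```python
def rosenbrock(x, y):
    """Rosenbrock function f(x, y) = (1 - x)^2 + 100 (y - x^2)^2."""
    return (1 - x) ** 2 + 100 * (y - x**2) ** 2


def full_factorial(lower, upper, n_levels):
    """Return every (x, y) combination of n_levels evenly spaced values.

    The same bounds are used for both inputs (an assumption of this sketch).
    """
    step = (upper - lower) / (n_levels - 1)
    levels = [lower + i * step for i in range(n_levels)]
    return [(x, y) for x in levels for y in levels]


# 10 levels per input -> 10 * 10 = 100 samples.
points = full_factorial(-2.0, 2.0, 10)
evaluations = [(x, y, rosenbrock(x, y)) for x, y in points]
```

Note that a regular grid of this kind does not necessarily contain the minimizer (1, 1), so the smallest sampled value is generally larger than the true minimum of the function.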

More information about the Rosenbrock function

Example