Plotting a dataset
Dataset plot factory
The factory module contains the DatasetPlotFactory class, a factory to instantiate a DatasetPlot from its class name.
The class can be internal to GEMSEO or located in an external module whose path is provided to the constructor. The factory also provides the list of available dataset plots and allows you to test whether a given dataset plot is available.
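To illustrate how a factory can instantiate a class from its name, here is a minimal sketch in plain Python. It is not GEMSEO's implementation: the names PlotBaseSketch, CurvesSketch, ScatterSketch and PlotFactorySketch are hypothetical, and the discovery of classes is limited to the subclasses of a base class already imported in memory (loading an external module from a path is not covered).

```python
from typing import Any, Dict, Iterator, List, Type


class PlotBaseSketch:
    """Hypothetical stand-in for an abstract dataset plot base class."""


class CurvesSketch(PlotBaseSketch):
    """Hypothetical concrete plot, defined only to populate the factory."""


class ScatterSketch(PlotBaseSketch):
    """Hypothetical concrete plot, defined only to populate the factory."""


def _all_subclasses(cls: type) -> Iterator[type]:
    """Yield every direct and indirect subclass of ``cls``."""
    for subclass in cls.__subclasses__():
        yield subclass
        yield from _all_subclasses(subclass)


class PlotFactorySketch:
    """Instantiate plot classes from their names (illustration only)."""

    def __init__(self, base: type) -> None:
        # Map each class name to the class object, for all known subclasses.
        self._classes: Dict[str, Type[Any]] = {
            cls.__name__: cls for cls in _all_subclasses(base)
        }

    @property
    def plots(self) -> List[str]:
        """Sorted names of the available plot classes."""
        return sorted(self._classes)

    def is_available(self, name: str) -> bool:
        """Return whether a plot class with this name is known."""
        return name in self._classes

    def create(self, name: str, *args: Any, **kwargs: Any) -> Any:
        """Instantiate the plot class called ``name``."""
        if not self.is_available(name):
            raise ValueError(f"Unknown plot class: {name!r}")
        return self._classes[name](*args, **kwargs)
```

With only the two subclasses above defined, PlotFactorySketch(PlotBaseSketch).plots returns ["CurvesSketch", "ScatterSketch"], and create("ScatterSketch") returns a new ScatterSketch instance.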
Abstract dataset plot
The dataset_plot module implements the abstract DatasetPlot class, whose purpose is to build a graphical representation of a Dataset and to display it on screen or save it to a file.
This abstract class has to be subclassed by concrete classes implementing at least the method DatasetPlot._run().
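The split between a public entry point and an abstract _run() method can be sketched with Python's abc module. This is an illustration under assumptions, not the DatasetPlot API: DatasetPlotSketch and VariableNamesPlot are hypothetical names, the data is a plain dictionary rather than a Dataset, and the returned "figure" is a toy value instead of a real graphic.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class DatasetPlotSketch(ABC):
    """Hypothetical abstract plot: concrete subclasses must implement _run()."""

    def __init__(self, data: Dict[str, List[float]]) -> None:
        self.data = data

    def execute(self, **options: Any) -> Any:
        # Common entry point: the construction of the graphical
        # representation is delegated to the subclass through _run().
        figure = self._run(**options)
        # A real implementation would then show `figure` or save it to a file.
        return figure

    @abstractmethod
    def _run(self, **options: Any) -> Any:
        """Build and return the graphical representation."""


class VariableNamesPlot(DatasetPlotSketch):
    """Toy concrete plot whose 'figure' is just the sorted variable names."""

    def _run(self, **options: Any) -> Any:
        return sorted(self.data)
```

Because _run() is declared abstract, instantiating DatasetPlotSketch directly raises a TypeError, whereas VariableNamesPlot({"y": [1.0], "x": [2.0]}).execute() returns ["x", "y"].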
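Andrews curves
The AndrewsCurves class implements the Andrews plot, a.k.a. Andrews curves, which is a way to visualize \(n\) samples of a high-dimensional vector \(x=(x_1,x_2,\ldots,x_d)\in\mathbb{R}^d\) in a 2D frame by projecting each sample \(x^{(i)}=(x_1^{(i)},\ldots,x_d^{(i)})\) onto the vector \(\left(\frac{1}{\sqrt{2}},\sin(t),\cos(t),\sin(2t),\cos(2t),\ldots\right)\), which is composed of the \(d\) first elements of the Fourier series:
\[f_i(t)=\frac{x_1^{(i)}}{\sqrt{2}}+x_2^{(i)}\sin(t)+x_3^{(i)}\cos(t)+x_4^{(i)}\sin(2t)+x_5^{(i)}\cos(2t)+\ldots\]
Each curve \(t\mapsto f_i(t)\) is plotted over the interval \([-\pi,\pi]\), and structure in the data may become visible in these \(n\) Andrews curves.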
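The Fourier projection \(f_i(t)\) can be evaluated directly from a sample. The following sketch is independent of GEMSEO and uses only the standard library; andrews_value and andrews_curve are hypothetical helper names, and the curve is discretized with a uniform grid over \([-\pi,\pi]\), which is an assumption about how one might sample it for plotting.

```python
import math
from typing import List, Sequence, Tuple


def andrews_value(sample: Sequence[float], t: float) -> float:
    """Evaluate f(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + ..."""
    value = sample[0] / math.sqrt(2.0)
    for k, x in enumerate(sample[1:], start=1):
        # Components x2, x3 use frequency 1; x4, x5 use frequency 2; and so on.
        frequency = (k + 1) // 2
        # Even positions (x2, x4, ...) go with sine, odd ones (x3, x5, ...) with cosine.
        if k % 2 == 1:
            value += x * math.sin(frequency * t)
        else:
            value += x * math.cos(frequency * t)
    return value


def andrews_curve(
    sample: Sequence[float], num_points: int = 101
) -> List[Tuple[float, float]]:
    """Return (t, f(t)) pairs for t uniformly spread over [-pi, pi]."""
    step = 2.0 * math.pi / (num_points - 1)
    return [
        (-math.pi + i * step, andrews_value(sample, -math.pi + i * step))
        for i in range(num_points)
    ]
```

For instance, andrews_value([1, 2, 3], 0.0) equals \(1/\sqrt{2}+3\), since the sine terms vanish at \(t=0\). Plotting one andrews_curve per sample gives the \(n\) curves described above.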
A variable name can be passed to the DatasetPlot.execute() method by means of the classifier keyword in order to color the curves according to the value of this variable. This is useful when the data is labeled.
Curve plot
A Curves plot represents samples of a functional variable \(y(x)\) discretized over a 1D mesh. Both the evaluations of \(y\) and the mesh are stored in a Dataset, \(y\) as a parameter and the mesh as metadata.
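To make the split between parameter and metadata concrete, here is a minimal sketch using a plain dictionary. The layout below (keys "parameters" and "metadata", variable "y", mesh "mesh") is a hypothetical stand-in, not the Dataset API: each row of "y" holds the values of \(y\) at the mesh nodes, while the mesh is stored once and shared by all samples.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: two samples of y(x) on a 3-node 1D mesh.
dataset = {
    "parameters": {"y": [[0.0, 1.0, 4.0], [0.0, 2.0, 8.0]]},
    "metadata": {"mesh": [0.0, 1.0, 2.0]},
}


def curves_to_points(
    data: Dict[str, Dict[str, List]], variable: str = "y", mesh: str = "mesh"
) -> List[List[Tuple[float, float]]]:
    """Pair each sample of the functional variable with the shared mesh."""
    nodes = data["metadata"][mesh]
    curves = []
    for values in data["parameters"][variable]:
        # Each sample must provide exactly one value per mesh node.
        if len(values) != len(nodes):
            raise ValueError("A sample does not match the mesh size.")
        curves.append(list(zip(nodes, values)))
    return curves
```

Each returned list of \((x, y)\) pairs is one curve to draw; a 2D mesh for a Surfaces plot would follow the same idea with \((x, y)\) nodes.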
Parallel coordinates plot
The ParallelCoordinates class implements the parallel coordinates plot, a.k.a. cobweb plot, which is a way to visualize \(n\) samples of a high-dimensional vector \(x=(x_1,\ldots,x_d)\in\mathbb{R}^d\) in a 2D frame by representing each sample \(x^{(i)}=(x_1^{(i)},\ldots,x_d^{(i)})\) as a piecewise linear curve whose nodes, from left to right, lie on the axes associated with \(x_1\), \(x_2\), …, \(x_d\), at the values \(x_1^{(i)}\), \(x_2^{(i)}\), …, \(x_d^{(i)}\).
A variable name is required by the DatasetPlot.execute() method by means of the classifier keyword in order to color the curves according to the value of this variable. This is useful when the data is labeled or when we are looking for the samples whose classifier value lies in an interval specified by the lower and upper arguments (default values are -inf and inf respectively).
In the latter case, the color scale is composed of only two colors: one for the samples positively classified and one for the others.
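Both ingredients of this plot, the polyline of a sample and the two-color classification by an interval, can be sketched in a few lines. This is an illustration independent of GEMSEO: parallel_polyline and two_color_labels are hypothetical names, and treating the interval as closed, \([\text{lower},\text{upper}]\), is an assumption.

```python
import math
from typing import List, Sequence, Tuple


def parallel_polyline(sample: Sequence[float]) -> List[Tuple[int, float]]:
    """Nodes (j, x_j) of one sample: axis index horizontally, value vertically."""
    return [(j, x) for j, x in enumerate(sample, start=1)]


def two_color_labels(
    classifier_values: Sequence[float],
    lower: float = -math.inf,
    upper: float = math.inf,
) -> List[bool]:
    """Flag the samples whose classifier value lies in [lower, upper]."""
    # True samples get one color, False samples the other one.
    return [lower <= value <= upper for value in classifier_values]
```

For example, two_color_labels([0.1, 0.5, 0.9], lower=0.3, upper=0.8) returns [False, True, False]; with the default bounds, every sample is flagged True.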
Radar visualization
The Radar class implements the RadViz plot, which is a way to visualize \(n\) samples of a multi-dimensional vector \(x=(x_1,\ldots,x_d)\in\mathbb{R}^d\) in a 2D frame and to highlight the separability of the data.
For that, each sample \(x^{(i)}=(x_1^{(i)},\ldots,x_d^{(i)})\) is rendered inside the unit disc, with the influences of the different parameters evenly distributed on its circumference. The influence of each parameter varies from 0 to 1 and can be interpreted relative to the others.
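A common formulation of RadViz places one anchor per parameter on the unit circle, rescales each parameter to \([0,1]\), and draws each sample at the average of the anchors weighted by its rescaled values. The sketch below follows that formulation with the standard library only; it is not taken from the Radar class, radviz_points is a hypothetical name, and mapping constant columns and all-zero rows to 0 and the origin are choices of this sketch.

```python
import math
from typing import List, Sequence, Tuple


def radviz_points(samples: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Project d-dimensional samples inside the unit disc (RadViz)."""
    d = len(samples[0])
    # One anchor per parameter, evenly spaced on the unit circle.
    anchors = [
        (math.cos(2 * math.pi * j / d), math.sin(2 * math.pi * j / d))
        for j in range(d)
    ]
    # Min-max scale each parameter to [0, 1]; constant columns map to 0.
    scaled_columns = []
    for column in zip(*samples):
        low, high = min(column), max(column)
        span = high - low
        scaled_columns.append([(v - low) / span if span > 0 else 0.0 for v in column])
    points = []
    for weights in zip(*scaled_columns):
        total = sum(weights)
        if total == 0:
            # No influence at all: place the sample at the center.
            points.append((0.0, 0.0))
            continue
        # Convex combination of the anchors, hence inside the unit disc.
        x = sum(w * a[0] for w, a in zip(weights, anchors)) / total
        y = sum(w * a[1] for w, a in zip(weights, anchors)) / total
        points.append((x, y))
    return points
```

A sample dominated by a single parameter lands on that parameter's anchor, while a sample with equal influences lands near the center, which is why clusters of labeled samples can reveal separability.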
A variable name is required by the DatasetPlot.execute() method by means of the classifier keyword in order to color the points according to the value of this variable. This is useful when the data is labeled or when we are looking for the samples whose classifier value lies in an interval specified by the lower and upper arguments (default values are -inf and inf respectively).
In the latter case, the color scale is composed of only two colors: one for the samples positively classified and one for the others.
Scatter matrix
The ScatterMatrix class implements the scatter plot matrix, which is a way to visualize \(n\) samples of a multi-dimensional vector \(x=(x_1,\ldots,x_d)\in\mathbb{R}^d\) in several 2D subplots, where the \((i,j)\) subplot represents the cloud of points \(\{(x_i^{(k)},x_j^{(k)})\}_{1\leq k\leq n}\), while the \((i,i)\) subplot represents the empirical distribution of the samples \(x_i^{(1)},\ldots,x_i^{(n)}\) by means of a histogram or a kernel density estimator.
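The data behind each panel of a scatter plot matrix can be computed as follows. This standard-library sketch is not the ScatterMatrix implementation: scatter_matrix_panels and histogram are hypothetical names, the off-diagonal panels simply store the pairs \((x_i^{(k)},x_j^{(k)})\), and the diagonal uses equal-width histogram counts rather than a kernel density estimator.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple


def histogram(values: Sequence[float], bins: int) -> List[int]:
    """Count the values falling in `bins` equal-width bins over [min, max]."""
    low, high = min(values), max(values)
    # Guard against a zero width when all the values are equal.
    width = (high - low) / bins or 1.0
    counts = [0] * bins
    for value in values:
        # The maximum value is put in the last bin.
        counts[min(int((value - low) / width), bins - 1)] += 1
    return counts


def scatter_matrix_panels(
    columns: Dict[str, Sequence[float]], bins: int = 5
) -> Dict[Tuple[str, str], list]:
    """Return the content of every (i, j) panel of the scatter plot matrix."""
    panels: Dict[Tuple[str, str], list] = {}
    for name_i, name_j in product(columns, repeat=2):
        if name_i == name_j:
            # Diagonal panel: empirical distribution of one variable.
            panels[(name_i, name_j)] = histogram(columns[name_i], bins)
        else:
            # Off-diagonal panel: cloud of points (x_i, x_j), one per sample.
            panels[(name_i, name_j)] = list(zip(columns[name_i], columns[name_j]))
    return panels
```

With \(d\) variables, this yields \(d^2\) panels: \(d\) histograms on the diagonal and \(d(d-1)\) point clouds elsewhere.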
A variable name can be passed to the DatasetPlot.execute() method by means of the classifier keyword in order to color the points according to the value of this variable. This is useful when the data is labeled.
Scatter plot
A Scatter plot represents a set of points \(\{(x_i,y_i)\}_{1\leq i\leq n}\) as markers on a classical plot, where the points may have different colors.
Surface plot
A Surfaces plot represents samples of a functional variable \(z(x,y)\) discretized over a 2D mesh. Both the evaluations of \(z\) and the mesh are stored in a Dataset, \(z\) as a parameter and the mesh as metadata.