gemseo.post.dataset.scatter_plot_matrix module

Draw a scatter matrix from a Dataset.

The ScatterMatrix class implements the scatter plot matrix, which is a way to visualize \(n\) samples of a multi-dimensional vector

\[x=(x_1,x_2,\ldots,x_d)\in\mathbb{R}^d\]

in several 2D subplots where the (i,j) subplot represents the cloud of points

\[\left(x_i^{(k)},x_j^{(k)}\right)_{1\leq k \leq n}\]

while the (i,i) subplot represents the empirical distribution of the samples

\[x_i^{(1)},\ldots,x_i^{(n)}\]

by means of a histogram or a kernel density estimator.

A variable name can be passed to the ScatterMatrix constructor by means of the classifier argument in order to color the points according to the values of this variable. This is useful when the data is labeled.
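As an illustration of the definitions above, the following NumPy sketch builds the data behind each subplot: the cloud of points for an off-diagonal (i,j) subplot and a histogram for a diagonal (i,i) subplot. It is not the GEMSEO implementation; the helper name scatter_matrix_data and its bins argument are hypothetical, and indices are 0-based here.

```python
import numpy as np


def scatter_matrix_data(samples, bins=10):
    """Return the data shown in each subplot of a d-by-d scatter plot matrix.

    Hypothetical helper for illustration only.
    ``samples`` has shape (n, d): one row per sample of the vector x.
    """
    samples = np.asarray(samples, dtype=float)
    _, dimension = samples.shape
    panels = {}
    for i in range(dimension):
        for j in range(dimension):
            if i == j:
                # Diagonal: empirical distribution of x_i, here as a histogram.
                counts, edges = np.histogram(samples[:, i], bins=bins)
                panels[i, j] = (counts, edges)
            else:
                # Off-diagonal: the n points (x_i^(k), x_j^(k)).
                panels[i, j] = np.column_stack((samples[:, i], samples[:, j]))
    return panels


rng = np.random.default_rng(0)
samples = rng.normal(size=(100, 3))  # n = 100 samples of a vector in R^3
panels = scatter_matrix_data(samples)
```

Each off-diagonal entry is an array of shape (n, 2), and each diagonal entry is a pair made of the bin counts and the bin edges.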

class ScatterMatrix(dataset, variable_names=(), classifier='', kde=False, size=25, marker='o', plot_lower=True, plot_upper=True, trend=Trend.NONE, **options)

Bases: DatasetPlot

Scatter plot matrix.

Parameters:
  • dataset (Dataset) -- The dataset containing the data to plot.

  • variable_names (Iterable[str]) --

    The names of the variables to consider. If empty, consider all the variables of the dataset.

    By default it is set to ().

  • classifier (str) --

    The name of the variable used to group the data. If empty, do not group the data.

    By default it is set to "".

  • kde (bool) --

    The type of the distribution representation. If True, plot a kernel density estimator on the diagonal. Otherwise, plot a histogram.

    By default it is set to False.

  • size (int) --

    The size of the points.

    By default it is set to 25.

  • marker (str) --

    The marker for the points.

    By default it is set to "o".

  • plot_lower (bool) --

    Whether to plot the lower part.

    By default it is set to True.

  • plot_upper (bool) --

    Whether to plot the upper part.

    By default it is set to True.

  • trend (Trend | TrendFunctionCreator) --

    The trend function to add to the scatter plots, or a function creating a trend function from a set of xy-points.

    By default it is set to "none".

  • **options (Any) -- The options of the underlying pandas scatter matrix.

Raises:

ValueError -- If the dataset is empty.
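The effect of plot_lower and plot_upper can be sketched in plain Python. The sketch assumes that the lower part means the subplots below the diagonal (row index greater than column index), that the upper part means those above it, and that the diagonal is always drawn. The helper name visible_subplots is hypothetical and is not part of GEMSEO.

```python
def visible_subplots(dimension, plot_lower=True, plot_upper=True):
    """List the (row, column) indices of the subplots that would be drawn.

    Hypothetical helper for illustration only; indices are 0-based.
    """
    visible = []
    for row in range(dimension):
        for column in range(dimension):
            if row == column:
                # Assumption: the diagonal (distributions) is always drawn.
                visible.append((row, column))
            elif row > column and plot_lower:
                # Subplot below the diagonal.
                visible.append((row, column))
            elif row < column and plot_upper:
                # Subplot above the diagonal.
                visible.append((row, column))
    return visible


lower_only = visible_subplots(3, plot_upper=False)
```

Since the subplots (i,j) and (j,i) show the same cloud of points with the axes swapped, disabling one of the two parts removes redundant subplots.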

class Trend(value)

Bases: StrEnum

A type of trend.

CUBIC = 'cubic'
LINEAR = 'linear'
NONE = 'none'
QUADRATIC = 'quadratic'
RBF = 'rbf'
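To give an idea of what a trend function is, the sketch below fits a polynomial to a set of xy-points with NumPy and returns it as a function of x. It assumes that the linear, quadratic and cubic values correspond to polynomial fits of degree 1, 2 and 3, and that a function creating a trend function takes the x and y arrays and returns a callable. Neither assumption is confirmed by this page, and the names POLYNOMIAL_DEGREES and create_polynomial_trend are hypothetical.

```python
import numpy as np

# Hypothetical mapping from trend names to polynomial degrees (an assumption).
POLYNOMIAL_DEGREES = {"linear": 1, "quadratic": 2, "cubic": 3}


def create_polynomial_trend(x, y, trend="linear"):
    """Fit a polynomial to the xy-points and return it as a function of x."""
    coefficients = np.polyfit(x, y, POLYNOMIAL_DEGREES[trend])
    return np.poly1d(coefficients)


# Points lying exactly on the line y = 2x + 1.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0
trend_function = create_polynomial_trend(x, y, trend="linear")
predicted = trend_function(np.array([0.0, 2.0]))
```

The returned function can be evaluated at any abscissa, for example to draw the fitted curve over the cloud of points of a subplot.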