Statistics

The base class Statistics

Abstract class for the estimation of statistics from a dataset.

Overview

The abstract Statistics class defines the common interface of a statistics library. It is specialized by the EmpiricalStatistics and ParametricStatistics classes.

Construction

A Statistics object is built from a Dataset and, optionally, variable names. If variable names are given, statistics are computed only for these variables; otherwise, they are computed for all the variables available in the dataset. Lastly, the user can give a name to the Statistics object. By default, this name is the concatenation of the name of the class overloading Statistics and the name of the Dataset.
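The construction rules above can be sketched with a small stand-in class. This is only an illustration, not the GEMSEO implementation: SketchStatistics, the dict-based dataset, the dataset_name argument and the "_" separator in the default name are all assumptions made for the example.

```python
class SketchStatistics:
    """Hypothetical stand-in mimicking the construction rules described above."""

    def __init__(self, dataset, dataset_name, variables_names=None, name=None):
        # Assumption: the dataset is a plain dict mapping variable names to samples.
        self.dataset = dataset
        # Without explicit names, every variable of the dataset is considered.
        if variables_names is None:
            self.names = list(dataset)
        else:
            self.names = list(variables_names)
        # Default name: class name joined to the dataset name.
        # The "_" separator is an assumption of this sketch.
        self.name = name or f"{type(self).__name__}_{dataset_name}"


data = {"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]}
all_variables = SketchStatistics(data, "MyData")
only_x = SketchStatistics(data, "MyData", variables_names=["x"], name="x_stats")
```

Here all_variables covers both "x" and "y" and receives a default name, while only_x is restricted to "x" and keeps the name given by the user.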

Capabilities

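A Statistics object returns standard descriptive and statistical measures for the different variables, such as the mean, median, variance, quantiles and tolerance intervals. The corresponding methods are listed below.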

Classes:

Statistics(dataset[, variables_names, name])

Abstract class to interface a statistics library.

class gemseo.uncertainty.statistics.statistics.Statistics(dataset, variables_names=None, name=None)[source]

Abstract class to interface a statistics library.

dataset

The dataset.

Type

Dataset

n_samples

The number of samples.

Type

int

n_variables

The number of variables.

Type

int

name

The name of the object.

Type

str

Parameters
  • dataset (Dataset) – A dataset.

  • variables_names (Iterable[str] | None) –

    The variables of interest. Default: consider all the variables available in the dataset.

    By default it is set to None.

  • name (str | None) –

    A name for the object. Default: use the concatenation of the class and dataset names.

    By default it is set to None.

Return type

None

Methods:

compute_a_value()

Compute the A-value \(\text{Aval}[X]\).

compute_b_value()

Compute the B-value \(\text{Bval}[X]\).

compute_expression(variable_name, statistic_name)

Return the expression of a statistical function applied to a variable.

compute_margin(std_factor)

Compute a margin \(\text{Margin}[X]=\mathbb{E}[X]+\kappa\mathbb{S}[X]\).

compute_maximum()

Compute the maximum \(\text{Max}[X]\).

compute_mean()

Compute the mean \(\mathbb{E}[X]\).

compute_mean_std(std_factor)

Compute a margin \(\text{Margin}[X]=\mathbb{E}[X]+\kappa\mathbb{S}[X]\).

compute_median()

Compute the median \(\text{Med}[X]\).

compute_minimum()

Compute the minimum \(\text{Min}[X]\).

compute_moment(order)

Compute the n-th moment \(M[X; n]\).

compute_percentile(order)

Compute the n-th percentile \(\text{p}[X; n]\).

compute_probability(thresh[, greater])

Compute the probability related to a threshold.

compute_quantile(prob)

Compute the quantile \(\mathbb{Q}[X; \alpha]\) related to a probability.

compute_quartile(order)

Compute the n-th quartile \(q[X; n]\).

compute_range()

Compute the range \(R[X]\).

compute_standard_deviation()

Compute the standard deviation \(\mathbb{S}[X]\).

compute_tolerance_interval(coverage[, ...])

Compute a tolerance interval \(\text{TI}[X]\).

compute_variance()

Compute the variance \(\mathbb{V}[X]\).

compute_variation_coefficient()

Compute the coefficient of variation \(CoV[X]\).

compute_a_value()[source]

Compute the A-value \(\text{Aval}[X]\).

Returns

The A-value of the different variables.

Return type

dict[str, numpy.ndarray]

compute_b_value()[source]

Compute the B-value \(\text{Bval}[X]\).

Returns

The B-value of the different variables.

Return type

dict[str, numpy.ndarray]

classmethod compute_expression(variable_name, statistic_name, show_name=False, **options)[source]

Return the expression of a statistical function applied to a variable.

E.g. “P[X >= 1.0]” for the probability that X exceeds 1.0.

Parameters
  • variable_name (str) – The name of the variable, e.g. "X".

  • statistic_name (str) – The name of the statistic, e.g. "probability".

  • show_name (bool) –

    If True, show option names. Otherwise, only show option values.

    By default it is set to False.

  • **options (bool | float | int) – The options passed to the statistical function, e.g. {"greater": True, "thresh": 1.0}.

Returns

The expression of the statistical function applied to the variable.

Return type

str
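The effect of show_name can be illustrated with a hypothetical formatter. This is a sketch under assumptions, not the library's code: sketch_expression takes the symbol directly (the real method takes a statistic name such as "probability"), and the "; " separator is borrowed from notations such as Q[X; α] used on this page. The dedicated format of the probability example "P[X >= 1.0]" is not reproduced.

```python
def sketch_expression(variable_name, symbol, show_name=False, **options):
    """Hypothetical formatter; the symbol argument and separators are assumptions."""
    if show_name:
        # Show both the option names and their values.
        rendered = [f"{key}={value}" for key, value in options.items()]
    else:
        # Only show the option values.
        rendered = [str(value) for value in options.values()]
    arguments = "; ".join([variable_name] + rendered)
    return f"{symbol}[{arguments}]"


values_only = sketch_expression("X", "Q", prob=0.5)
with_names = sketch_expression("X", "Q", show_name=True, prob=0.5)
no_option = sketch_expression("X", "E")
```

With these assumptions, values_only is "Q[X; 0.5]", with_names is "Q[X; prob=0.5]" and no_option is "E[X]".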

compute_margin(std_factor)

Compute a margin \(\text{Margin}[X]=\mathbb{E}[X]+\kappa\mathbb{S}[X]\).

Parameters

std_factor (float) – The weight \(\kappa\) of the standard deviation.

Returns

The margin for the different variables.

Return type

dict[str, numpy.ndarray]
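As a minimal sketch of the margin formula, the snippet below evaluates \(\mathbb{E}[X]+\kappa\mathbb{S}[X]\) per variable with the Python standard library. The margin helper and the sample values are hypothetical, plain lists stand in for NumPy arrays, and the use of the population standard deviation is an assumption: the estimator actually used depends on the class overloading Statistics.

```python
import statistics


def margin(samples, std_factor):
    """Return mean + std_factor * standard deviation for scalar samples."""
    mean = statistics.fmean(samples)
    # Assumption: population standard deviation (division by n).
    std = statistics.pstdev(samples)
    return mean + std_factor * std


# Hypothetical samples, one list per variable.
samples = {
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [10.0, 10.0, 10.0, 10.0, 10.0],
}
# One margin per variable, mirroring the dictionary returned by the method.
margins = {name: margin(values, std_factor=2.0) for name, values in samples.items()}
```

For the constant variable "y" the standard deviation is zero, so its margin equals its mean whatever the value of \(\kappa\).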

compute_maximum()[source]

Compute the maximum \(\text{Max}[X]\).

Returns

The maximum of the different variables.

Return type

dict[str, numpy.ndarray]

compute_mean()[source]

Compute the mean \(\mathbb{E}[X]\).

Returns

The mean of the different variables.

Return type

dict[str, numpy.ndarray]

compute_mean_std(std_factor)[source]

Compute a margin \(\text{Margin}[X]=\mathbb{E}[X]+\kappa\mathbb{S}[X]\).

Parameters

std_factor (float) – The weight \(\kappa\) of the standard deviation.

Returns

The margin for the different variables.

Return type

dict[str, numpy.ndarray]

compute_median()[source]

Compute the median \(\text{Med}[X]\).

Returns

The median of the different variables.

Return type

dict[str, numpy.ndarray]

compute_minimum()[source]

Compute the minimum \(\text{Min}[X]\).

Returns

The minimum of the different variables.

Return type

dict[str, numpy.ndarray]

compute_moment(order)[source]

Compute the n-th moment \(M[X; n]\).

Parameters

order (int) – The order \(n\) of the moment.

Returns

The moment of the different variables.

Return type

dict[str, numpy.ndarray]

compute_percentile(order)[source]

Compute the n-th percentile \(\text{p}[X; n]\).

Parameters

order (int) – The order \(n\) of the percentile. Either 0, 1, 2, … or 100.

Returns

The percentile of the different variables.

Return type

dict[str, numpy.ndarray]

compute_probability(thresh, greater=True)[source]

Compute the probability related to a threshold.

Either \(\mathbb{P}[X \geq x]\) or \(\mathbb{P}[X \leq x]\).

Parameters
  • thresh (float) – A threshold \(x\).

  • greater (bool) –

    The type of probability. If True, compute the probability of exceeding the threshold, \(\mathbb{P}[X \geq x]\). Otherwise, compute the probability of not exceeding it, \(\mathbb{P}[X \leq x]\).

    By default it is set to True.

Returns

The probability of the different variables.

Return type

dict[str, numpy.ndarray]
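One way to read the greater option is through an empirical estimate, sketched below as the fraction of samples on the relevant side of the threshold. The empirical_probability helper and the sample values are hypothetical, and a parametric subclass would presumably rely on a fitted distribution instead.

```python
def empirical_probability(samples, thresh, greater=True):
    """Estimate P[X >= thresh] or P[X <= thresh] as a sample fraction."""
    if greater:
        hits = sum(1 for value in samples if value >= thresh)
    else:
        hits = sum(1 for value in samples if value <= thresh)
    return hits / len(samples)


samples = [0.5, 1.0, 1.5, 2.0]
p_exceed = empirical_probability(samples, 1.0)
p_below = empirical_probability(samples, 1.0, greater=False)
```

Because both inequalities are non-strict, a sample equal to the threshold is counted in both cases, so the two estimates do not necessarily sum to one.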

compute_quantile(prob)[source]

Compute the quantile \(\mathbb{Q}[X; \alpha]\) related to a probability.

Parameters

prob (float) – A probability \(\alpha\) between 0 and 1.

Returns

The quantile of the different variables.

Return type

dict[str, numpy.ndarray]

compute_quartile(order)[source]

Compute the n-th quartile \(q[X; n]\).

Parameters

order (int) – The order \(n\) of the quartile. Either 1, 2 or 3.

Returns

The quartile of the different variables.

Return type

dict[str, numpy.ndarray]
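Percentiles and quartiles are quantiles at particular probabilities: the n-th percentile is the quantile at n/100 and the n-th quartile is the quantile at n/4. The sketch below makes this relation explicit. The three helpers are hypothetical, and the linear interpolation between order statistics is an assumption, since the estimator used by a given subclass may differ.

```python
import math


def quantile(samples, prob):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(samples)
    position = prob * (len(ordered) - 1)
    lower, upper = math.floor(position), math.ceil(position)
    weight = position - lower
    return (1 - weight) * ordered[lower] + weight * ordered[upper]


def percentile(samples, order):
    """The n-th percentile is the quantile at probability n / 100."""
    return quantile(samples, order / 100)


def quartile(samples, order):
    """The n-th quartile is the quantile at probability n / 4."""
    return quantile(samples, order / 4)


samples = [1.0, 2.0, 3.0, 4.0, 5.0]
median = quantile(samples, 0.5)
```

With these definitions, the median, the 50th percentile and the second quartile coincide.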

compute_range()[source]

Compute the range \(R[X]\).

Returns

The range of the different variables.

Return type

dict[str, numpy.ndarray]

compute_standard_deviation()[source]

Compute the standard deviation \(\mathbb{S}[X]\).

Returns

The standard deviation of the different variables.

Return type

dict[str, numpy.ndarray]

compute_tolerance_interval(coverage, confidence=0.95, side=ToleranceIntervalSide.BOTH)[source]

Compute a tolerance interval \(\text{TI}[X]\).

The coverage level is the minimum percentage of the population belonging to the TI. The tolerance interval is computed with a given confidence level and can be lower-sided, upper-sided or two-sided.

Parameters
  • coverage (float) – The minimum percentage of the population belonging to the TI.

  • confidence (float) –

    A level of confidence in [0,1].

    By default it is set to 0.95.

  • side (gemseo.uncertainty.statistics.tolerance_interval.distribution.ToleranceIntervalSide) –

    The type of the tolerance interval characterized by its sides of interest, either a lower-sided tolerance interval \([a, +\infty[\), an upper-sided tolerance interval \(]-\infty, b]\), or a two-sided tolerance interval \([c, d]\).

    By default it is set to BOTH.

Returns

The tolerance limits of the different variables.

Return type

dict[str, tuple[numpy.ndarray, numpy.ndarray]]
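To illustrate what coverage measures and how the three sides differ, the sketch below counts the fraction of samples falling inside given limits. It does not compute a tolerance interval: the limits 2.0 and 9.0 are hypothetical values chosen for the example, and empirical_coverage is a hypothetical helper, not a GEMSEO function.

```python
import math


def empirical_coverage(samples, lower, upper):
    """Return the fraction of samples lying in [lower, upper]."""
    inside = sum(1 for value in samples if lower <= value <= upper)
    return inside / len(samples)


samples = [float(i) for i in range(1, 11)]  # 1.0, 2.0, ..., 10.0

# Two-sided interval [c, d] with hypothetical limits.
two_sided = empirical_coverage(samples, 2.0, 9.0)
# Lower-sided interval [a, +inf[.
lower_sided = empirical_coverage(samples, 2.0, math.inf)
# Upper-sided interval ]-inf, b].
upper_sided = empirical_coverage(samples, -math.inf, 9.0)
```

A tolerance interval makes a statement about the underlying population rather than about the observed samples, which is why it also requires a confidence level.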

compute_variance()[source]

Compute the variance \(\mathbb{V}[X]\).

Returns

The variance of the different variables.

Return type

dict[str, numpy.ndarray]

compute_variation_coefficient()[source]

Compute the coefficient of variation \(CoV[X]\).

This is the standard deviation normalized by the expectation: \(CoV[X]=\mathbb{S}[X]/\mathbb{E}[X]\).

Returns

The coefficient of variation of the different variables.

Return type

dict[str, numpy.ndarray]
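The ratio of the standard deviation to the expectation can be checked on a small sample, as sketched below. The variation_coefficient helper and the sample values are hypothetical, and the population standard deviation is an assumption about the estimator.

```python
import statistics


def variation_coefficient(samples):
    """Return the standard deviation divided by the mean."""
    # Assumption: population standard deviation (division by n).
    return statistics.pstdev(samples) / statistics.fmean(samples)


# Mean 5 and population standard deviation 2, hence a coefficient of 0.4.
cov = variation_coefficient([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

The coefficient is dimensionless and undefined when the mean is zero, in which case this sketch raises a ZeroDivisionError.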

Examples