# Statistics

## The base class Statistics

Abstract class for the estimation of statistics from a dataset.

### Overview

The abstract Statistics class implements the concept of a statistics library. It is specialized by the EmpiricalStatistics and ParametricStatistics classes.

### Construction

A Statistics object is built from a Dataset and, optionally, the names of the variables of interest, in which case statistics are only computed for these variables; otherwise, statistics are computed for all the variables available in the dataset. Lastly, the user can give a name to the Statistics object. By default, this name is the concatenation of the name of the subclass of Statistics and the name of the Dataset.
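
These construction rules can be mimicked with a small stand-in class. This is a hypothetical sketch, not the GEMSEO implementation: `ToyStatistics` is an invented name, the dataset is modeled as a plain dict mapping variable names to samples (with its name passed separately), and the `_` separator in the default name is an assumption.

```python
class ToyStatistics:
    """Hypothetical stand-in mimicking the documented constructor rules."""

    def __init__(self, dataset, dataset_name, variables_names=None, name=None):
        # dataset: dict mapping each variable name to its list of samples.
        self.dataset = dataset
        # Without explicit names, consider all the variables of the dataset.
        if variables_names is None:
            variables_names = list(dataset)
        self.names = list(variables_names)
        # Default name: class name + dataset name (the "_" separator is assumed).
        self.name = name or f"{type(self).__name__}_{dataset_name}"


data = {"x": [[1.0], [2.0]], "y": [[3.0], [4.0]]}
stats = ToyStatistics(data, "my_dataset")
print(stats.names, stats.name)  # ['x', 'y'] ToyStatistics_my_dataset
subset = ToyStatistics(data, "my_dataset", variables_names=["y"], name="y_stats")
print(subset.names, subset.name)  # ['y'] y_stats
```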

### Capabilities

A Statistics object returns standard descriptive and statistical measures for the different variables:

class gemseo.uncertainty.statistics.statistics.Statistics(dataset, variables_names=None, name=None)

A toolbox to compute statistics.

Unless otherwise stated, the statistics are computed variable-wise and component-wise, i.e. variable-by-variable and component-by-component. So, for the sake of readability, the methods named as compute_statistic() return dict[str, ndarray] objects whose keys are the names of the variables and whose values are the statistic estimated for the different components.
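
To make this return convention concrete, here is a plain-Python sketch of it. Lists stand in for ndarray so that no third-party package is needed, and `component_wise` is a hypothetical helper, not part of GEMSEO:

```python
import statistics


def component_wise(data, statistic):
    """Apply `statistic` to each component of each variable.

    data maps a variable name to its samples, one list of components per sample.
    """
    result = {}
    for name, samples in data.items():
        columns = zip(*samples)  # transpose: one tuple per component
        result[name] = [statistic(column) for column in columns]
    return result


data = {
    "x": [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]],  # 3 samples of a 2-component variable
    "y": [[5.0], [7.0], [9.0]],  # 3 samples of a 1-component variable
}
print(component_wise(data, statistics.fmean))  # {'x': [2.0, 20.0], 'y': [7.0]}
```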

Parameters:
• dataset (Dataset) – A dataset.

• variables_names (Iterable[str] | None) – The variables of interest. Default: consider all the variables available in the dataset.

• name (str) – A name for the object. Default: use the concatenation of the class and dataset names.

compute_a_value()

Compute the A-value $$\text{Aval}[X]$$.

The A-value is the lower bound of the lower-sided tolerance interval associated with a coverage level equal to 99% and a confidence level equal to 95%.
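
How such a bound is estimated depends on the concrete subclass. As one distribution-free illustration, not necessarily what GEMSEO does, the $$r$$-th smallest sample $$X_{(r)}$$ is a lower tolerance bound with coverage $$p$$ and confidence $$1-\alpha$$ as soon as $$\mathbb{P}[\text{Binomial}(n,1-p)\geq r]\geq 1-\alpha$$; the sketch below picks the largest such rank. The function name `lower_tolerance_bound` is hypothetical.

```python
import math
import random


def lower_tolerance_bound(samples, coverage, confidence=0.95):
    """Distribution-free lower bound of a (coverage, confidence) one-sided TI.

    Returns the r-th smallest sample, with r the largest rank satisfying
    P[Binomial(n, 1 - coverage) >= r] >= confidence.
    """
    n = len(samples)
    q = 1.0 - coverage  # probability that a sample falls below the (1-coverage)-quantile
    rank = 0
    cdf = 0.0
    while rank < n:
        # After this update, cdf = P[Binomial(n, q) <= rank].
        cdf += math.comb(n, rank) * q**rank * (1.0 - q) ** (n - rank)
        if 1.0 - cdf < confidence:  # P[Binomial(n, q) >= rank + 1] is too small
            break
        rank += 1
    if rank == 0:
        raise ValueError("Not enough samples for this coverage and confidence.")
    return sorted(samples)[rank - 1]


random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(300)]
# With 300 samples, only the smallest one qualifies for a 99%/95% bound.
a_value = lower_tolerance_bound(samples, coverage=0.99)
```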

Returns:

The component-wise A-value of the different variables.

Return type:

dict[str, ndarray]

compute_b_value()

Compute the B-value $$\text{Bval}[X]$$.

The B-value is the lower bound of the lower-sided tolerance interval associated with a coverage level equal to 90% and a confidence level equal to 95%.
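
If the sample minimum is used as the bound (a distribution-free choice assumed here, not stated by GEMSEO), it is valid only when $$1-p^n\geq 1-\alpha$$. Solving for $$n$$ gives the minimum sample size; `min_samples_for_minimum_bound` is a hypothetical name.

```python
import math


def min_samples_for_minimum_bound(coverage, confidence=0.95):
    """Smallest n such that the sample minimum is a (coverage, confidence) lower bound.

    Solves coverage**n <= 1 - confidence for the integer n.
    """
    return math.ceil(math.log(1.0 - confidence) / math.log(coverage))


print(min_samples_for_minimum_bound(0.90))  # B-value: 29
print(min_samples_for_minimum_bound(0.99))  # A-value: 299
```

So, with fewer than 29 samples, the sample minimum cannot serve as a distribution-free B-value.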

Returns:

The component-wise B-value of the different variables.

Return type:

dict[str, ndarray]

classmethod compute_expression(variable_name, statistic_name, show_name=False, **options)

Return the expression of a statistical function applied to a variable.

E.g. “P[X >= 1.0]” for the probability that X exceeds 1.0.

Parameters:
• variable_name (str) – The name of the variable, e.g. "X".

• statistic_name (str) – The name of the statistic, e.g. "probability".

• show_name (bool) –

If True, show option names. Otherwise, only show option values.

By default it is set to False.

• **options (bool | float | int) – The options passed to the statistical function, e.g. {"greater": True, "thresh": 1.0}.

Returns:

The expression of the statistical function applied to the variable.

Return type:

str

abstract compute_joint_probability(thresh, greater=True)

Compute the joint probability related to a threshold.

Either $$\mathbb{P}[X \geq x]$$ or $$\mathbb{P}[X \leq x]$$.

Parameters:
• thresh (Mapping[str, float | ndarray]) – A threshold $$x$$ per variable.

• greater (bool) –

The type of probability. If True, compute the probability of exceeding the threshold. Otherwise, compute the opposite.

By default it is set to True.

Returns:

The joint probability of the different variables (by definition of the joint probability, this statistic is not computed component-wise).

Return type:

dict[str, float]
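
For an empirical estimate, the joint probability of a variable can be sketched as the fraction of samples whose components all satisfy the inequality at once. The function below is a hypothetical plain-Python illustration (lists instead of ndarray), not the GEMSEO implementation; it accepts either a scalar threshold or one threshold per component.

```python
def joint_probability(data, thresh, greater=True):
    """Empirical P[X >= x] (or P[X <= x]) jointly over all components of each variable."""
    result = {}
    for name, samples in data.items():
        x = thresh[name]
        if isinstance(x, (int, float)):
            x = [x] * len(samples[0])  # broadcast a scalar threshold to all components
        if greater:
            hits = [all(v >= t for v, t in zip(sample, x)) for sample in samples]
        else:
            hits = [all(v <= t for v, t in zip(sample, x)) for sample in samples]
        result[name] = sum(hits) / len(samples)
    return result


data = {"x": [[0.5, 2.0], [1.5, 0.5], [2.0, 3.0], [1.2, 1.1]]}
print(joint_probability(data, {"x": 1.0}))  # {'x': 0.5}
```

Each component alone reaches 1.0 in three samples out of four, but both components do so simultaneously in only two, hence 0.5.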

compute_margin(std_factor)

Compute a margin $$\text{Margin}[X]=\mathbb{E}[X]+\kappa\mathbb{S}[X]$$.
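
A margin shifts the mean by $$\kappa$$ standard deviations, above the mean for $$\kappa>0$$ and below it for $$\kappa<0$$. The plain-Python sketch below computes it component-wise; `margin` is a hypothetical helper, and the unbiased sample standard deviation (`statistics.stdev`) is an assumption, since the estimator actually used depends on the subclass.

```python
import statistics


def margin(data, std_factor):
    """Component-wise E[X] + std_factor * S[X] for each variable."""
    result = {}
    for name, samples in data.items():
        result[name] = [
            statistics.fmean(column) + std_factor * statistics.stdev(column)
            for column in zip(*samples)
        ]
    return result


data = {"x": [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]}
print(margin(data, std_factor=2.0))  # {'x': [4.0, 10.0]}
```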

Parameters:

std_factor (float) – The weight $$\kappa$$ of the standard deviation.

Returns:

The component-wise margin for the different variables.

Return type:

dict[str, ndarray]

abstract compute_maximum()

Compute the maximum $$\text{Max}[X]$$.

Returns:

The component-wise maximum of the different variables.

Return type:

dict[str, ndarray]

abstract compute_mean()

Compute the mean $$\mathbb{E}[X]$$.

Returns:

The component-wise mean of the different variables.

Return type:

dict[str, ndarray]

compute_mean_std(std_factor)

Compute a margin $$\text{Margin}[X]=\mathbb{E}[X]+\kappa\mathbb{S}[X]$$.

Parameters:

std_factor (float) – The weight $$\kappa$$ of the standard deviation.

Returns:

The component-wise margin for the different variables.

Return type:

dict[str, ndarray]

compute_median()

Compute the median $$\text{Med}[X]$$.

Returns:

The component-wise median of the different variables.

Return type:

dict[str, ndarray]

abstract compute_minimum()

Compute the minimum $$\text{Min}[X]$$.

Returns:

The component-wise minimum of the different variables.

Return type:

dict[str, ndarray]

abstract compute_moment(order)

Compute the n-th moment $$M[X; n]$$.

Parameters:

order (int) – The order $$n$$ of the moment.

Returns:

The component-wise moment of the different variables.

Return type:

dict[str, ndarray]

compute_percentile(order)

Compute the n-th percentile $$\text{p}[X; n]$$.
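
A percentile is the quantile at probability $$n/100$$. The sketch below is a hypothetical stand-alone function for a single component: it validates the order as documented and relies on the standard-library `statistics.quantiles` with `method="inclusive"`, i.e. linear interpolation between order statistics. Whether GEMSEO interpolates the same way is an assumption.

```python
import statistics


def percentile(samples, order):
    """n-th percentile of one component, with linear interpolation."""
    if not isinstance(order, int) or not 0 <= order <= 100:
        raise ValueError("The order of the percentile must be an integer in {0, ..., 100}.")
    data = sorted(samples)
    if order == 0:
        return data[0]
    if order == 100:
        return data[-1]
    # quantiles(..., n=100) returns the 99 cut points for the orders 1, ..., 99.
    return statistics.quantiles(data, n=100, method="inclusive")[order - 1]


samples = [float(i) for i in range(101)]  # 0.0, 1.0, ..., 100.0
print(percentile(samples, 37))  # 37.0
```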

Parameters:

order (int) – The order $$n\in\{0,1,2,\ldots,100\}$$ of the percentile.

Returns:

The component-wise percentile of the different variables.

Raises:

ValueError – When $$n\notin\{0,1,2,\ldots,100\}$$.

Return type:

dict[str, ndarray]

abstract compute_probability(thresh, greater=True)

Compute the probability related to a threshold.

Either $$\mathbb{P}[X \geq x]$$ or $$\mathbb{P}[X \leq x]$$.

Parameters:
• thresh (Mapping[str, float | ndarray]) – A threshold $$x$$ per variable.

• greater (bool) –

The type of probability. If True, compute the probability of exceeding the threshold. Otherwise, compute the opposite.

By default it is set to True.

Returns:

The component-wise probability of the different variables.

Return type:

dict[str, ndarray]
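
An empirical, component-wise estimate simply counts, for each component, the fraction of samples on the requested side of the threshold. The function below is a hypothetical plain-Python sketch (lists instead of ndarray), not the GEMSEO implementation. Unlike compute_joint_probability(), each component is treated separately here.

```python
def probability(data, thresh, greater=True):
    """Empirical component-wise P[X_i >= x_i] (or P[X_i <= x_i]) for each variable."""
    result = {}
    for name, samples in data.items():
        x = thresh[name]
        if isinstance(x, (int, float)):
            x = [x] * len(samples[0])  # broadcast a scalar threshold to all components
        n = len(samples)
        columns = zip(*samples)  # one tuple of values per component
        if greater:
            result[name] = [sum(v >= t for v in col) / n for col, t in zip(columns, x)]
        else:
            result[name] = [sum(v <= t for v in col) / n for col, t in zip(columns, x)]
    return result


data = {"x": [[0.5, 2.0], [1.5, 0.5], [2.0, 3.0], [1.2, 1.1]]}
print(probability(data, {"x": 1.0}))  # {'x': [0.75, 0.75]}
```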

abstract compute_quantile(prob)

Compute the quantile $$\mathbb{Q}[X; \alpha]$$ related to a probability.
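
For an empirical estimate, one common convention (the default `linear` method of `numpy.quantile`) interpolates linearly between the sorted samples at the fractional position $$\alpha(n-1)$$. The sketch below applies it to a single component; `quantile` is a hypothetical helper, and whether a given subclass uses this convention is an assumption (a parametric subclass would rather rely on the fitted distribution).

```python
import math


def quantile(samples, prob):
    """Quantile of one component at probability prob, linearly interpolated."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("The probability must be between 0 and 1.")
    data = sorted(samples)
    position = prob * (len(data) - 1)  # fractional index into the sorted samples
    lower = math.floor(position)
    upper = min(lower + 1, len(data) - 1)
    weight = position - lower
    return data[lower] + weight * (data[upper] - data[lower])


print(quantile([3.0, 1.0, 4.0, 2.0], 0.5))  # 2.5
```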

Parameters:

prob (float) – A probability $$\alpha$$ between 0 and 1.

Returns:

The component-wise quantile of the different variables.

Return type:

dict[str, ndarray]

compute_quartile(order)

Compute the n-th quartile $$q[X; n]$$.
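
By definition, the quartiles are the quantiles at probabilities 1/4, 1/2 and 3/4, so they coincide with particular percentiles, the second quartile being the median:

```latex
q[X; n] = \mathbb{Q}\left[X; \tfrac{n}{4}\right] = \text{p}[X; 25n], \quad n\in\{1,2,3\},
\qquad \text{e.g.} \quad q[X; 2] = \mathbb{Q}[X; 0.5] = \text{p}[X; 50] = \text{Med}[X].
```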

Parameters:

order (int) – The order $$n\in\{1,2,3\}$$ of the quartile.

Returns:

The component-wise quartile of the different variables.

Raises:

ValueError – When $$n\notin\{1,2,3\}$$.

Return type:

dict[str, ndarray]

abstract compute_range()

Compute the range $$R[X]$$.

Returns:

The component-wise range of the different variables.

Return type:

dict[str, ndarray]

abstract compute_standard_deviation()

Compute the standard deviation $$\mathbb{S}[X]$$.

Returns:

The component-wise standard deviation of the different variables.

Return type:

dict[str, ndarray]

compute_tolerance_interval(coverage, confidence=0.95, side=ToleranceIntervalSide.BOTH)

Compute a $$(p,1-\alpha)$$ tolerance interval $$\text{TI}[X]$$.

The tolerance interval $$\text{TI}[X]$$ is defined to contain at least a proportion $$p$$ of the values of $$X$$ with a level of confidence $$1-\alpha$$. The proportion $$p$$ is also called the coverage level of the TI.

Typically, $$\alpha=0.05$$ or equivalently $$1-\alpha=0.95$$.

The tolerance interval can be either

• lower-sided (side="LOWER": $$[L, +\infty[$$),

• upper-sided (side="UPPER": $$]-\infty, U]$$) or

• both-sided (side="BOTH": $$[L, U]$$).
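
To illustrate the three sides, here is a distribution-free sketch based on the sample extremes, which is not necessarily the method GEMSEO uses. The sample minimum (resp. maximum) bounds a one-sided TI with coverage $$p$$ and confidence $$1-\alpha$$ when $$1-p^n\geq 1-\alpha$$, and $$[\min, \max]$$ is a two-sided TI when $$1-np^{n-1}+(n-1)p^n\geq 1-\alpha$$ (Wilks' formula). `Side`, `extreme_tolerance_interval` and `tolerance_intervals` are hypothetical names; `Side` only stands in for ToleranceIntervalSide, and the output mimics the documented `{variable_name: [(lower_bound, upper_bound), ...]}` structure.

```python
import math
from enum import Enum


class Side(Enum):
    """Hypothetical stand-in for ToleranceIntervalSide."""

    LOWER = "LOWER"
    UPPER = "UPPER"
    BOTH = "BOTH"


def extreme_tolerance_interval(samples, coverage, confidence=0.95, side=Side.BOTH):
    """Distribution-free (coverage, confidence) TI built from the sample extremes."""
    n = len(samples)
    p = coverage
    if side is Side.BOTH:
        achieved = 1.0 - n * p ** (n - 1) + (n - 1) * p**n  # Wilks' two-sided formula
    else:
        achieved = 1.0 - p**n  # one-sided: the minimum or the maximum alone
    if achieved < confidence:
        raise ValueError("Not enough samples for this coverage and confidence.")
    lower = -math.inf if side is Side.UPPER else min(samples)
    upper = math.inf if side is Side.LOWER else max(samples)
    return lower, upper


def tolerance_intervals(data, coverage, confidence=0.95, side=Side.BOTH):
    """Component-wise TIs: {variable_name: [(lower_bound, upper_bound), ...]}."""
    return {
        name: [
            extreme_tolerance_interval(list(column), coverage, confidence, side)
            for column in zip(*samples)
        ]
        for name, samples in data.items()
    }


data = {"x": [[float(i), float(-i)] for i in range(46)]}  # 46 samples, 2 components
print(tolerance_intervals(data, coverage=0.90))  # {'x': [(0.0, 45.0), (-45.0, 0.0)]}
print(tolerance_intervals(data, coverage=0.90, side=Side.LOWER))  # {'x': [(0.0, inf), (-45.0, inf)]}
```

With 45 samples instead of 46, the two-sided (90%, 95%) interval is rejected, since Wilks' formula then gives a confidence slightly below 0.95.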

Parameters:
• coverage (float) – The minimum proportion $$p\in[0,1]$$ of the values of $$X$$ belonging to the TI.

• confidence (float) –

A level of confidence $$1-\alpha\in[0,1]$$.

By default it is set to 0.95.

• side (ToleranceIntervalSide) –

The type of the tolerance interval.

By default it is set to BOTH.

Returns:

The component-wise tolerance intervals of the different variables, expressed as {variable_name: [(lower_bound, upper_bound), ...], ... } where [(lower_bound, upper_bound), ...] are the lower and upper bounds of the tolerance interval of the different components of variable_name.

abstract compute_variance()

Compute the variance $$\mathbb{V}[X]$$.

Returns:

The component-wise variance of the different variables.

Return type:

dict[str, ndarray]

compute_variation_coefficient()

Compute the coefficient of variation $$CoV[X]$$.

This is the standard deviation normalized by the expectation: $$CoV[X]=\mathbb{S}[X]/\mathbb{E}[X]$$.
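
For instance, samples with mean 10 and standard deviation 2 have a CoV of 0.2; the ratio is undefined when the mean is zero. A minimal sketch for one component follows; `variation_coefficient` is a hypothetical helper, and the unbiased sample standard deviation is an assumption, since the estimator actually used depends on the subclass.

```python
import statistics


def variation_coefficient(samples):
    """CoV of one component: sample standard deviation divided by the mean."""
    mean = statistics.fmean(samples)
    if mean == 0.0:
        raise ZeroDivisionError("The coefficient of variation is undefined for a zero mean.")
    return statistics.stdev(samples) / mean


print(variation_coefficient([8.0, 10.0, 12.0]))  # 0.2
```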

Returns:

The component-wise coefficient of variation of the different variables.

Return type:

dict[str, ndarray]

dataset: Dataset

The dataset.

n_samples: int

The number of samples.

n_variables: int

The number of variables.

name: str

The name of the object.

## Examples

See the examples about sensitivity analysis.