statistics module

Estimation of statistics from a dataset

Overview

The abstract Statistics class implements the concept of a statistics library. Concrete classes such as EmpiricalStatistics and ParametricStatistics extend it.

Construction

A Statistics is built from a Dataset and, optionally, a list of variable names. When this list is given, statistics are computed only for these variables; otherwise, they are computed for all variables. Lastly, the user can name the Statistics. By default, the name is the concatenation of the name of the class overloading Statistics and the name of the Dataset.
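
The construction rules above can be pictured with a small stand-in class. This is a hypothetical sketch, not GEMSEO code: ToyDataset, ToyStatistics and the underscore used as separator in the default name are assumptions made for illustration only.

```python
class ToyDataset:
    """Hypothetical stand-in for a Dataset: a name plus named columns."""

    def __init__(self, name, data):
        self.name = name
        self.data = data  # dict mapping a variable name to its values


class ToyStatistics:
    """Hypothetical stand-in mirroring the construction rules."""

    def __init__(self, dataset, variables_names=None, name=None):
        self.dataset = dataset
        # Without an explicit selection, consider every variable.
        self.names = list(variables_names or dataset.data)
        # Default name: class name joined with the dataset name
        # (the "_" separator is an assumption).
        self.name = name or f"{type(self).__name__}_{dataset.name}"


dataset = ToyDataset("measurements", {"x": [1.0, 2.0], "y": [3.0, 4.0]})
all_stats = ToyStatistics(dataset)
x_stats = ToyStatistics(dataset, variables_names=["x"], name="x_only")
```

Here all_stats covers both variables and receives a default name, while x_stats is restricted to x and carries the name chosen by the user.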

Capabilities

A Statistics returns standard descriptive and statistical measures for the different variables:

  • Statistics.minimum(): the minimum value,

  • Statistics.maximum(): the maximum value,

  • Statistics.range(): the difference between minimum and maximum values,

  • Statistics.mean(): the expectation, a.k.a. mean value,

  • Statistics.moment(): the central moment, which is the expected value of a specified integer power of the deviation from the mean,

  • Statistics.variance(): the variance, which is the mean squared variation around the mean value,

  • Statistics.standard_deviation(): the standard deviation, which is the square root of the variance,

  • Statistics.quantile(): the quantile associated with a probability, which is the cut point dividing the range into a first continuous interval with this given probability and a second continuous interval with the complementary probability; common q-quantiles dividing the range into q continuous intervals with equal probabilities are also implemented, namely Statistics.quartile(), Statistics.percentile() and Statistics.median(),

  • Statistics.probability(): the probability that the random variable is larger or smaller than a certain threshold,

  • Statistics.tolerance_interval(): the left-sided, right-sided or both-sided tolerance interval associated with a given coverage level and a given confidence level, i.e. a statistical interval within which a specified proportion of the random variable realizations (the coverage level) falls with a given confidence level; two common lower bounds are also provided:

    • Statistics.a_value(): the A-value, which is the lower bound of the left-sided tolerance interval associated with a coverage level equal to 99% and a confidence level equal to 95%,

    • Statistics.b_value(): the B-value, which is the lower bound of the left-sided tolerance interval associated with a coverage level equal to 90% and a confidence level equal to 95%.
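
The simplest measures in this list can be illustrated with the Python standard library. This is a minimal sketch on a hand-made sample, not the GEMSEO implementation; in particular, the use of the population (biased) variance is an assumption, since the text does not say which estimator the concrete classes use.

```python
import statistics

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

minimum = min(sample)
maximum = max(sample)
# Range: difference between the maximum and minimum values.
value_range = maximum - minimum
mean = statistics.fmean(sample)
# Population variance: mean squared deviation from the mean.
variance = statistics.pvariance(sample)
# Standard deviation: square root of the variance.
standard_deviation = statistics.pstdev(sample)
```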

class gemseo.uncertainty.statistics.statistics.Statistics(dataset, variables_names=None, name=None)[source]

Bases: object

Abstract class for Statistics library interface.

Constructor

Parameters
  • dataset (Dataset) – dataset from which the statistics are estimated.

  • variables_names (list(str)) – list of variable names. If None, all variables of the dataset are considered. Default: None.

  • name (str) – name of the object. If None, use the concatenation of class and dataset names. Default: None.

a_value()[source]

Compute the A-value.

Returns

A-value

b_value()[source]

Compute the B-value.

Returns

B-value

maximum()[source]

Compute the maximum.

Returns

maximum

mean()[source]

Compute the mean.

Returns

mean

median()[source]

Compute the median.

Parameters

options – options

Returns

median

minimum()[source]

Compute the minimum.

Returns

minimum

moment(order)[source]

Compute the central moment for a given order.

Parameters

order (int) – order of the moment

Returns

moment
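
As an illustration of the definition given in the overview (expected value of an integer power of the deviation from the mean), here is a minimal empirical sketch. The helper central_moment is hypothetical and is not part of the documented interface.

```python
def central_moment(values, order):
    """Empirical central moment: average of (x - mean) ** order."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)


sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
first = central_moment(sample, 1)   # deviations from the mean cancel out
second = central_moment(sample, 2)  # coincides with the population variance
```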

percentile(order)[source]

Compute the percentile.

Parameters

order (int) – percentile order, e.g. 4.

Returns

percentile

probability(thresh, greater)[source]

Compute a probability associated with a threshold.

Parameters
  • thresh (float) – threshold

  • greater (bool) – if True, compute the probability of exceeding the threshold; if False, compute the probability of not exceeding it.

Returns

probability
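
An empirical version of this computation can be sketched as a simple frequency. The helper below is hypothetical; treating "exceeding" as a strict inequality is an assumption, chosen so that the two cases are complementary.

```python
def empirical_probability(values, thresh, greater=True):
    """Fraction of values above the threshold, or its complement."""
    above = sum(1 for x in values if x > thresh)
    if greater:
        return above / len(values)
    # Complement: values that do not exceed the threshold.
    return (len(values) - above) / len(values)


sample = [1.0, 2.0, 3.0, 4.0, 5.0]
p_above = empirical_probability(sample, 3.0, greater=True)
p_below = empirical_probability(sample, 3.0, greater=False)
```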

quantile(prob)[source]

Compute a quantile associated with a probability.

Parameters

prob (float) – probability between 0 and 1

Returns

quantile
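
One common empirical definition of the quantile is the inverse of the empirical distribution function: the smallest sample value below which at least a fraction prob of the sample lies. The sketch below uses this definition as an assumption; the concrete classes may rely on a different rule (for instance, interpolation or a fitted distribution), and empirical_quantile is a hypothetical helper.

```python
import math


def empirical_quantile(values, prob):
    """Smallest sample value whose empirical CDF is at least prob."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must lie between 0 and 1")
    ordered = sorted(values)
    # ceil(prob * n) is the 1-based rank; clamp it so that
    # prob = 0 maps to the sample minimum.
    rank = max(math.ceil(prob * len(ordered)), 1)
    return ordered[rank - 1]


sample = [7.0, 1.0, 4.0, 10.0, 2.0, 9.0, 3.0, 6.0, 5.0, 8.0]
q_half = empirical_quantile(sample, 0.5)
q_quarter = empirical_quantile(sample, 0.25)
```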

quartile(order)[source]

Compute a quartile.

Parameters

order (int) – quartile order in [1,2,3]

Returns

quartile
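
Quartiles, percentiles and the median are all q-quantiles: cut points dividing the range into 4, 100 and 2 intervals of equal probability. The sketch below illustrates this relationship with the standard library function statistics.quantiles; its default interpolation rule is not necessarily the one used by the concrete classes.

```python
import statistics

sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

# Three cut points dividing the sample into 4 equiprobable intervals.
quartiles = statistics.quantiles(sample, n=4)
# Ninety-nine cut points dividing it into 100 equiprobable intervals.
percentiles = statistics.quantiles(sample, n=100)

second_quartile = quartiles[1]         # quartile of order 2
fiftieth_percentile = percentiles[49]  # percentile of order 50
median = statistics.median(sample)
```

The second quartile, the 50th percentile and the median are the same cut point.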

range()[source]

Compute the range.

Returns

range

standard_deviation()[source]

Compute the standard deviation.

Returns

standard deviation

tolerance_interval(coverage, confidence=0.95, side='both')[source]

Compute the tolerance interval (TI) for a given minimum percentage of the population and a given confidence level.

Parameters
  • coverage (float) – minimum proportion of the population belonging to the TI.

  • confidence (float) – level of confidence in [0,1]. Default: 0.95.

  • side (str) – kind of interval: ‘lower’ for lower-sided TI, ‘upper’ for upper-sided TI and ‘both’ for both-sided TI. Default: ‘both’.

Returns

tolerance limits
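
To make the roles of coverage and confidence concrete, here is a sketch based on a distribution-free argument, which is not necessarily the method used by the concrete classes. For n independent observations of a continuous random variable, the sample minimum is a lower tolerance bound with the given coverage at confidence 1 - coverage ** n. The helper names are hypothetical.

```python
def confidence_of_minimum(n, coverage):
    """Confidence that at least `coverage` of the population
    lies above the minimum of n independent observations."""
    return 1.0 - coverage ** n


def smallest_sample_size(coverage, confidence):
    """Smallest n for which the sample minimum is a lower
    tolerance bound at the requested coverage and confidence."""
    n = 1
    while confidence_of_minimum(n, coverage) < confidence:
        n += 1
    return n


# Sample sizes for which the minimum can serve as a B-value
# (coverage 90%, confidence 95%) or an A-value (99%, 95%).
n_b_value = smallest_sample_size(0.90, 0.95)
n_a_value = smallest_sample_size(0.99, 0.95)
```

With fewer observations, a bound with the same guarantees requires an assumption on the underlying distribution.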

variance()[source]

Compute the variance.

Returns

variance