statistics module
Estimation of statistics from a dataset
Overview
The abstract Statistics class defines the interface of a statistics library. Concrete classes such as EmpiricalStatistics and ParametricStatistics derive from it.
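One way to picture the split between an abstract interface and its concrete classes is the minimal sketch below, which uses Python's abc module. The names BaseStatistics and SampleStatistics are hypothetical, and this is not GEMSEO's implementation. It only shows how derived measures, such as the range or the standard deviation, can be written once in the abstract class on top of primitives that each concrete class provides.

```python
from abc import ABC, abstractmethod
import math
import statistics


class BaseStatistics(ABC):
    """Hypothetical abstract interface (not GEMSEO's class)."""

    # Primitives: every concrete class must provide them.
    @abstractmethod
    def minimum(self) -> float: ...

    @abstractmethod
    def maximum(self) -> float: ...

    @abstractmethod
    def variance(self) -> float: ...

    # Derived measures: written once, in terms of the primitives.
    def range(self) -> float:
        return self.maximum() - self.minimum()

    def standard_deviation(self) -> float:
        return math.sqrt(self.variance())


class SampleStatistics(BaseStatistics):
    """Hypothetical concrete class estimating the primitives from raw samples."""

    def __init__(self, samples: list[float]) -> None:
        self.samples = list(samples)

    def minimum(self) -> float:
        return min(self.samples)

    def maximum(self) -> float:
        return max(self.samples)

    def variance(self) -> float:
        # Population (divide-by-n) estimator: an assumption made for this sketch.
        return statistics.pvariance(self.samples)
```

With this design, SampleStatistics([1, 2, 3, 4, 5]).range() returns 4 without SampleStatistics defining range() itself, and instantiating BaseStatistics directly raises a TypeError.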
Construction
A Statistics is built from a Dataset and, optionally, a list of variable names. In that case, statistics are computed only for these variables; otherwise, they are computed for all the variables.
Lastly, the user can name the Statistics. By default, the name is the concatenation of the name of the class deriving from Statistics and the name of the Dataset.
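The construction rules just described (an optional subset of variables and a default name built from the class and dataset names) can be sketched as follows. This is a hypothetical stand-in, not the library's constructor: the dataset is modeled as a plain dict mapping variable names to samples, and the "_" separator in the default name is an assumption.

```python
from typing import Optional


class ToyStatistics:
    """Hypothetical stand-in for the construction logic described above."""

    def __init__(
        self,
        data: dict[str, list[float]],
        dataset_name: str,
        variables_names: Optional[list[str]] = None,
        name: Optional[str] = None,
    ) -> None:
        # Keep all the variables unless a subset is requested.
        if variables_names is None:
            variables_names = list(data)
        self.variables_names = list(variables_names)
        self.data = {var: data[var] for var in self.variables_names}
        # Default name: class name + dataset name (the "_" separator is an assumption).
        if name is None:
            name = f"{type(self).__name__}_{dataset_name}"
        self.name = name


class ToyEmpiricalStatistics(ToyStatistics):
    """Hypothetical subclass: its own class name appears in the default name."""
```

Because the default name uses type(self), a subclass such as ToyEmpiricalStatistics automatically gets its own class name in the default, which mirrors the "class deriving from Statistics" rule.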
Capabilities
A Statistics returns standard descriptive and statistical measures for the different variables:
- Statistics.minimum(): the minimum value,
- Statistics.maximum(): the maximum value,
- Statistics.range(): the difference between the maximum and minimum values,
- Statistics.mean(): the expectation, a.k.a. mean value,
- Statistics.moment(): the central moment, which is the expected value of a specified integer power of the deviation from the mean,
- Statistics.variance(): the variance, which is the mean squared deviation from the mean value,
- Statistics.standard_deviation(): the standard deviation, which is the square root of the variance,
- Statistics.quantile(): the quantile associated with a probability, which is the cut point dividing the range into a first continuous interval with this given probability and a second continuous interval with the complementary probability; common q-quantiles dividing the range into q continuous intervals with equal probabilities are also implemented:
  - Statistics.median(), which implements the 2-quantile (50%),
  - Statistics.quartile(), whose order (1, 2 or 3) implements the 4-quantiles (respectively 25%, 50% and 75%),
  - Statistics.percentile(), whose order (1, 2, …, 99) implements the 100-quantiles (1%, 2%, …, 99%),
- Statistics.probability(): the probability that the random variable is larger or smaller than a certain threshold,
- Statistics.tolerance_interval(): the left-sided, right-sided or both-sided tolerance interval associated with a given coverage level and a given confidence level, which is a statistical interval within which, with some confidence level, a specified proportion of the random variable realizations falls (this proportion is the coverage level),
- Statistics.a_value(): the A-value, which is the lower bound of the left-sided tolerance interval associated with a coverage level equal to 99% and a confidence level equal to 95%,
- Statistics.b_value(): the B-value, which is the lower bound of the left-sided tolerance interval associated with a coverage level equal to 90% and a confidence level equal to 95%.
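Several of the simpler measures listed above can be estimated from raw samples with Python's standard statistics module. The sketch below is illustrative only: the describe helper is hypothetical, and using the population (divide-by-n) variance rather than the unbiased (divide-by-n-1) one is an assumption; a concrete Statistics class may use either convention.

```python
import statistics


def describe(values: list[float]) -> dict[str, float]:
    """Hypothetical helper: empirical estimates of some measures listed above."""
    return {
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
        "mean": statistics.fmean(values),
        # Population (biased, divide-by-n) estimators: an assumption of this sketch.
        "variance": statistics.pvariance(values),
        "standard_deviation": statistics.pstdev(values),
        "median": statistics.median(values),
    }
```

For instance, describe([1, 2, 3, 4, 5]) gives a range of 4, a mean of 3.0, a population variance of 2 and a median of 3.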
- class gemseo.uncertainty.statistics.statistics.Statistics(dataset, variables_names=None, name=None)
Bases: object
Abstract class defining the interface of a statistics library.
Constructor
- Parameters
dataset (Dataset) – dataset
variables_names (list(str)) – list of variable names. If None, the method considers all the variables of the dataset. Default: None.
name (str) – name of the object. If None, use the concatenation of the class and dataset names. Default: None.
- moment(order)
Compute the moment of a given order.
- Parameters
order (int) – order of the moment
- Returns
moment
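As a worked formula, the central moment of order k described in the overview, together with its usual plug-in estimate from n samples x_1, …, x_n, reads as follows. The estimator is an assumption: a concrete class may compute the moment differently, for instance from a fitted distribution.

```latex
\mu_k = \mathbb{E}\left[\left(X - \mathbb{E}[X]\right)^k\right],
\qquad
\hat{\mu}_k = \frac{1}{n} \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^k,
\qquad
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i .
```

For k = 2, \mu_2 is the variance and \hat{\mu}_2 is its population (divide-by-n) estimate.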
- percentile(order)
Compute the percentile of a given order.
- Parameters
order (int) – percentile order in {1, 2, …, 99}, e.g. 4.
- Returns
percentile
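The percentile of order k is the quantile of probability k/100. A minimal sketch uses the standard statistics.quantiles function, which returns the 99 interior cut points when n=100. The percentile helper is hypothetical, and the "inclusive" method (linear interpolation between sorted samples) is an assumed convention; other interpolation rules give slightly different values on small samples.

```python
import statistics


def percentile(values: list[float], order: int) -> float:
    """Hypothetical helper: percentile of order k = quantile of probability k / 100."""
    if not 1 <= order <= 99:
        raise ValueError("order must be an integer between 1 and 99")
    # 99 cut points splitting the data into 100 groups of equal probability.
    cut_points = statistics.quantiles(values, n=100, method="inclusive")
    return cut_points[order - 1]
```

On the 101 values 0, 1, …, 100, percentile(values, 25) returns 25.0.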
- probability(thresh, greater)
Compute a probability associated with a threshold.
- Parameters
thresh (float) – threshold
greater (bool) – if True, compute the probability of exceeding the threshold; if False, compute the probability of being smaller than the threshold.
- Returns
probability
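An empirical estimate of this probability is simply the fraction of samples on the requested side of the threshold. The sketch below is hypothetical: treating a sample equal to the threshold as "not exceeding" is an assumption that only matters for ties, and it makes the two options complementary.

```python
def probability(values: list[float], thresh: float, greater: bool = True) -> float:
    """Hypothetical empirical estimate of P(X > thresh) or of its complement P(X <= thresh)."""
    # Strict '>' (rather than '>=') is an assumption; it only matters for ties with thresh.
    exceed = sum(1 for v in values if v > thresh)
    p_exceed = exceed / len(values)
    return p_exceed if greater else 1.0 - p_exceed
```

For the samples [1, 2, 3, 4] and a threshold of 3, the estimated probability of exceeding is 0.25 and the complementary one is 0.75.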
- quantile(prob)
Compute the quantile associated with a probability.
- Parameters
prob (float) – probability between 0 and 1
- Returns
quantile
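For an empirical estimate, one common convention interpolates linearly between sorted samples at the fractional position prob × (n − 1); this is NumPy's default "linear" method (type 7 in the Hyndman and Fan classification). The sketch below is hypothetical and this convention is an assumption; a parametric class would instead invert the fitted distribution function.

```python
import math


def quantile(values: list[float], prob: float) -> float:
    """Hypothetical empirical quantile with linear interpolation between sorted samples."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must be between 0 and 1")
    data = sorted(values)
    position = prob * (len(data) - 1)  # fractional index into the sorted samples
    lower = math.floor(position)
    upper = min(lower + 1, len(data) - 1)
    weight = position - lower
    return data[lower] + weight * (data[upper] - data[lower])
```

For example, the 0.5-quantile of [1, 2, 3, 4] falls halfway between 2 and 3, that is 2.5.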
- quartile(order)
Compute a quartile.
- Parameters
order (int) – quartile order in [1,2,3]
- Returns
quartile
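The quartile of order k is the quantile of probability k/4, so the second quartile coincides with the median. A minimal sketch with statistics.quantiles follows; the helper name and the "inclusive" interpolation convention are assumptions.

```python
import statistics


def quartile(values: list[float], order: int) -> float:
    """Hypothetical helper: quartile of order k = quantile of probability k / 4."""
    if order not in (1, 2, 3):
        raise ValueError("order must be 1, 2 or 3")
    # The 3 cut points split the data into 4 groups of equal probability.
    return statistics.quantiles(values, n=4, method="inclusive")[order - 1]
```

On [1, 2, 3, 4, 5], the three quartiles are 2.0, 3.0 and 4.0, and the second one equals statistics.median.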
- tolerance_interval(coverage, confidence=0.95, side='both')
Compute the tolerance interval (TI) for a given minimum proportion of the population and a given confidence level.
- Parameters
coverage (float) – minimum proportion of the population belonging to the TI.
confidence (float) – confidence level in [0,1]. Default: 0.95.
side (str) – kind of interval: ‘lower’ for a lower-sided TI, ‘upper’ for an upper-sided TI and ‘both’ for a both-sided TI. Default: ‘both’.
- Returns
tolerance limits
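A lower-sided tolerance bound, and hence the A- and B-values, can be illustrated with a distribution-free (order-statistic) technique. This is a minimal sketch assuming independent samples from a continuous distribution; it is only one possible technique, not necessarily the one used by the library, and a parametric approach based on a fitted distribution gives different bounds. The function names are hypothetical and only the lower-sided case is covered. The key fact is that the r-th smallest sample lies below the (1 − coverage)-quantile exactly when at least r samples do, which happens with probability P(Binomial(n, 1 − coverage) ≥ r).

```python
import math
from typing import Optional, Sequence


def binomial_sf(k: int, n: int, q: float) -> float:
    """P(B >= k) for B ~ Binomial(n, q)."""
    return sum(math.comb(n, j) * q**j * (1.0 - q) ** (n - j) for j in range(k, n + 1))


def lower_tolerance_bound(
    samples: Sequence[float], coverage: float, confidence: float = 0.95
) -> Optional[float]:
    """Hypothetical distribution-free lower bound L such that, with probability
    >= confidence, at least a proportion `coverage` of the population lies above L.

    Returns None when the sample is too small to reach the requested confidence.
    """
    data = sorted(samples)
    n = len(data)
    # The r-th smallest sample is a valid bound when
    # P(Binomial(n, 1 - coverage) >= r) >= confidence; keep the largest such r.
    best_rank = None
    for rank in range(1, n + 1):
        if binomial_sf(rank, n, 1.0 - coverage) >= confidence:
            best_rank = rank
        else:
            break  # this probability only decreases as the rank grows
    return None if best_rank is None else data[best_rank - 1]
```

With this sketch, the B-value setting (coverage 0.90, confidence 0.95) needs at least 29 samples for the smallest one to qualify, and the A-value setting (coverage 0.99, confidence 0.95) needs at least 299; with fewer samples the function returns None.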