Statistics
The base class Statistics
Abstract class for the estimation of statistics from a dataset.
Overview
The abstract Statistics class implements the concept of a statistics library. It is extended by the EmpiricalStatistics and ParametricStatistics classes.
Construction
A Statistics object is built from a Dataset and, optionally, variable names. When variable names are given, statistics are computed only for these variables. Otherwise, statistics are computed for all the variables available in the dataset. Lastly, the user can give a name to the Statistics object. By default, this name is the concatenation of the name of the class overloading Statistics and the name of the Dataset.
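The construction rules above can be sketched with a small stand-in class. This is not the GEMSEO implementation: StatisticsSketch, its plain-dictionary data argument, the dataset_name argument and the underscore used to join the two names are all assumptions made for illustration only.

```python
from typing import Dict, Iterable, List, Mapping, Optional, Sequence


class StatisticsSketch:
    """Hypothetical stand-in mimicking the construction rules described above."""

    def __init__(
        self,
        data: Mapping[str, Sequence[float]],
        dataset_name: str,
        variables_names: Optional[Iterable[str]] = None,
        name: Optional[str] = None,
    ) -> None:
        # Keep only the requested variables, or all of them by default.
        if variables_names is None:
            self.names: List[str] = list(data)
        else:
            self.names = list(variables_names)
        self.data: Dict[str, List[float]] = {key: list(data[key]) for key in self.names}
        self.n_variables = len(self.names)
        self.n_samples = len(self.data[self.names[0]]) if self.names else 0
        # Default name: class name joined with the dataset name.
        # The "_" separator is an assumption, not taken from the documentation.
        self.name = name or f"{type(self).__name__}_{dataset_name}"


samples = {"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]}
all_stats = StatisticsSketch(samples, "my_dataset")
x_stats = StatisticsSketch(samples, "my_dataset", variables_names=["x"], name="foo")
```

Here all_stats covers both variables and receives a default name, whereas x_stats is restricted to x and keeps the name given by the user.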
Capabilities
A Statistics object returns standard descriptive and statistical measures for the different variables:
Statistics.compute_minimum(): the minimum value,
Statistics.compute_maximum(): the maximum value,
Statistics.compute_range(): the difference between the maximum and minimum values,
Statistics.compute_mean(): the expectation (a.k.a. mean value),
Statistics.compute_moment(): a central moment, which is the expected value of a specified integer power of the deviation from the mean,
Statistics.compute_variance(): the variance, which is the mean squared deviation from the mean value,
Statistics.compute_standard_deviation(): the standard deviation, which is the square root of the variance,
Statistics.compute_quantile(): the quantile associated with a probability, which is the cut point dividing the range into a first continuous interval with this given probability and a second continuous interval with the complementary probability; common q-quantiles dividing the range into q continuous intervals with equal probabilities are also implemented:
    Statistics.compute_median(), which implements the 2-quantile (50%),
    Statistics.compute_quartile(), whose order (1, 2 or 3) implements the 4-quantiles (25%, 50% and 75%),
    Statistics.compute_percentile(), whose order (1, 2, …, 99) implements the 100-quantiles (1%, 2%, …, 99%),
Statistics.compute_probability(): the probability that the random variable is larger or smaller than a certain threshold,
Statistics.compute_tolerance_interval(): the left-sided, right-sided or both-sided tolerance interval associated with a given coverage level and a given confidence level, which is a statistical interval within which, with some confidence level, a specified proportion of the random variable realizations falls (this proportion is the coverage level),
Statistics.compute_a_value(): the A-value, which is the lower bound of the left-sided tolerance interval associated with a coverage level equal to 99% and a confidence level equal to 95%,
Statistics.compute_b_value(): the B-value, which is the lower bound of the left-sided tolerance interval associated with a coverage level equal to 90% and a confidence level equal to 95%.
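The relationship between the quantile and the median, quartiles and percentiles can be illustrated with the standard library alone. The sketch below is not the GEMSEO code: empirical_quantile is a hypothetical helper based on the inverse empirical distribution function, one convention among several, so other libraries that interpolate between samples may return slightly different values.

```python
import math
from typing import Dict, Sequence


def empirical_quantile(samples: Sequence[float], prob: float) -> float:
    """Return the smallest sample whose empirical CDF value is at least prob."""
    if not 0.0 < prob < 1.0:
        raise ValueError("prob must lie strictly between 0 and 1.")
    ordered = sorted(samples)
    index = max(math.ceil(len(ordered) * prob) - 1, 0)
    return ordered[index]


def median(samples: Sequence[float]) -> float:
    """The 2-quantile, i.e. the quantile of probability 50%."""
    return empirical_quantile(samples, 0.5)


def quartile(samples: Sequence[float], order: int) -> float:
    """One of the 4-quantiles; order is 1, 2 or 3."""
    if order not in (1, 2, 3):
        raise ValueError("order must be 1, 2 or 3.")
    return empirical_quantile(samples, order / 4)


def percentile(samples: Sequence[float], order: int) -> float:
    """One of the 100-quantiles; order is 1, 2, ..., 99."""
    if order not in range(1, 100):
        raise ValueError("order must be an integer between 1 and 99.")
    return empirical_quantile(samples, order / 100)


# Results are stored per variable name, mirroring the dictionary-based
# return type documented for the compute_* methods.
data: Dict[str, Sequence[float]] = {"x": [8.0, 1.0, 5.0, 3.0, 2.0, 7.0, 4.0, 6.0]}
medians = {name: median(values) for name, values in data.items()}
first_quartiles = {name: quartile(values, 1) for name, values in data.items()}
third_quartiles = {name: quartile(values, 3) for name, values in data.items()}
```

The second quartile and the 50th percentile both coincide with the median, since the three are the same quantile of probability 0.5.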
Classes:
Statistics: Abstract class to interface a statistics library.
 class gemseo.uncertainty.statistics.statistics.Statistics(dataset, variables_names=None, name=None)[source]
Abstract class to interface a statistics library.
 Attributes
dataset (Dataset) – The dataset.
n_samples (int) – The number of samples.
n_variables (int) – The number of variables.
name (str) – The name of the object.
 Parameters
dataset (Dataset) – A dataset.
variables_names (Optional[Iterable[str]]) – The variables of interest. Default: consider all the variables available in the dataset.
name (Optional[str]) – A name for the object. Default: use the concatenation of the class and dataset names.
 Return type
None
Methods:
compute_a_value(): Compute the A-value.
compute_b_value(): Compute the B-value.
compute_expression(variable, function[, …]): Return the expression of a statistical function applied to a variable.
compute_maximum(): Compute the maximum.
compute_mean(): Compute the mean.
compute_mean_std(std_factor): Compute mean + std_factor * std.
compute_median(): Compute the median.
compute_minimum(): Compute the minimum.
compute_moment(order): Compute the nth moment.
compute_percentile(order): Compute the nth percentile.
compute_probability(thresh[, greater]): Compute the probability related to a threshold.
compute_quantile(prob): Compute the quantile related to a probability.
compute_quartile(order): Compute the nth quartile.
compute_range(): Compute the range.
compute_standard_deviation(): Compute the standard deviation.
compute_tolerance_interval(coverage[, …]): Compute a tolerance interval (TI) for a given coverage level.
compute_variance(): Compute the variance.
 compute_a_value()[source]
Compute the A-value.
 Returns
The A-value of the different variables.
 Return type
Dict[str, numpy.ndarray]
 compute_b_value()[source]
Compute the B-value.
 Returns
The B-value of the different variables.
 Return type
Dict[str, numpy.ndarray]
 classmethod compute_expression(variable, function, show_name=False, **options)[source]
Return the expression of a statistical function applied to a variable.
 Parameters
variable (str) – The name of the variable.
function (str) – The name of the function.
show_name (bool) – If True, show name. Otherwise, only show value.
**options – The options passed to the statistical function.
 Returns
The expression of the statistical function applied to the variable.
 Return type
str
 compute_maximum()[source]
Compute the maximum.
 Returns
The maximum of the different variables.
 Return type
Dict[str, numpy.ndarray]
 compute_mean()[source]
Compute the mean.
 Returns
The mean of the different variables.
 Return type
Dict[str, numpy.ndarray]
 compute_mean_std(std_factor)[source]
Compute mean + std_factor * std.
 Parameters
std_factor (float) – The factor multiplying the standard deviation.
 Returns
mean + std_factor * std for the different variables.
 Return type
Dict[str, numpy.ndarray]
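As an illustration of the quantity mean + std_factor * std, here is a minimal standard-library sketch. It is not the GEMSEO code: mean_plus_std is a hypothetical helper, and the use of the population standard deviation (statistics.pstdev) rather than the sample one is an assumption.

```python
import statistics
from typing import Dict, Sequence


def mean_plus_std(
    data: Dict[str, Sequence[float]], std_factor: float
) -> Dict[str, float]:
    """Return mean + std_factor * std for each variable."""
    return {
        # Assumption: population standard deviation (division by n).
        name: statistics.fmean(values) + std_factor * statistics.pstdev(values)
        for name, values in data.items()
    }


# For this sample, the mean is 5 and the population standard deviation is 2.
data = {"x": [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]}
upper_margin = mean_plus_std(data, 3.0)
lower_margin = mean_plus_std(data, -3.0)
```

A positive std_factor gives a margin above the mean and a negative one a margin below it.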
 compute_median()[source]
Compute the median.
 Returns
The median of the different variables.
 Return type
Dict[str, numpy.ndarray]
 compute_minimum()[source]
Compute the minimum.
 Returns
The minimum of the different variables.
 Return type
Dict[str, numpy.ndarray]
 compute_moment(order)[source]
Compute the nth central moment.
 Parameters
order (int) – The order of the moment.
 Returns
The moment of the different variables.
 Return type
Dict[str, numpy.ndarray]
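The central moment of a given order, defined in the overview as the expected value of an integer power of the deviation from the mean, can be estimated from samples as follows. This is a minimal sketch with a hypothetical central_moment helper, not the GEMSEO implementation; it uses the plain average over the samples, without any bias correction.

```python
import statistics
from typing import Dict, Sequence


def central_moment(samples: Sequence[float], order: int) -> float:
    """Estimate the central moment of the given order by a plain average."""
    mean = statistics.fmean(samples)
    return statistics.fmean([(value - mean) ** order for value in samples])


data: Dict[str, Sequence[float]] = {"x": [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]}
# One value per variable name, as in the documented return type.
second_moments = {name: central_moment(values, 2) for name, values in data.items()}
third_moments = {name: central_moment(values, 3) for name, values in data.items()}
```

With this estimator, the first central moment is zero and the second one is the variance computed with a division by the number of samples.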
 compute_percentile(order)[source]
Compute the nth percentile.
 Parameters
order (int) – The order of the percentile. Either 0, 1, 2, … or 100.
 Returns
The percentile of the different variables.
 Return type
Dict[str, numpy.ndarray]
 compute_probability(thresh, greater=True)[source]
Compute the probability related to a threshold.
 Parameters
thresh (float) – A threshold.
greater (bool) – The type of probability. If True, compute the probability of exceeding the threshold. Otherwise, compute the probability of not exceeding it.
 Returns
The probability of the different variables.
 Return type
Dict[str, numpy.ndarray]
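The probability related to a threshold can be estimated empirically as a proportion of samples. The sketch below uses a hypothetical empirical_probability helper and is not the GEMSEO code; in particular, counting a sample equal to the threshold as "not exceeding" is an assumption.

```python
from typing import Dict, Sequence


def empirical_probability(
    samples: Sequence[float], thresh: float, greater: bool = True
) -> float:
    """Estimate P(X > thresh) if greater is True, else P(X <= thresh)."""
    n_exceeding = sum(1 for value in samples if value > thresh)
    if greater:
        return n_exceeding / len(samples)
    # Assumption: ties with the threshold are counted on this side.
    return (len(samples) - n_exceeding) / len(samples)


data: Dict[str, Sequence[float]] = {"x": [1.0, 2.0, 3.0, 4.0]}
exceedance = {name: empirical_probability(v, 3.5) for name, v in data.items()}
non_exceedance = {
    name: empirical_probability(v, 3.5, greater=False) for name, v in data.items()
}
```

By construction, the two probabilities sum to one for any threshold.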
 compute_quantile(prob)[source]
Compute the quantile related to a probability.
 Parameters
prob (float) – A probability between 0 and 1.
 Returns
The quantile of the different variables.
 Return type
Dict[str, numpy.ndarray]
 compute_quartile(order)[source]
Compute the nth quartile.
 Parameters
order (int) – The order of the quartile. Either 1, 2 or 3.
 Returns
The quartile of the different variables.
 Return type
Dict[str, numpy.ndarray]
 compute_range()[source]
Compute the range.
 Returns
The range of the different variables.
 Return type
Dict[str, numpy.ndarray]
 compute_standard_deviation()[source]
Compute the standard deviation.
 Returns
The standard deviation of the different variables.
 Return type
Dict[str, numpy.ndarray]
 compute_tolerance_interval(coverage, confidence=0.95, side=<ToleranceIntervalSide.BOTH: 3>)[source]
Compute a tolerance interval (TI) for a given coverage level.
This coverage level is the minimum proportion of the random variable realizations belonging to the TI. The tolerance interval is computed with a confidence level and can be either lower-sided, upper-sided or both-sided.
 Parameters
coverage (float) – The minimum proportion of realizations belonging to the TI.
confidence (float) – A level of confidence in [0,1].
side (gemseo.uncertainty.statistics.tolerance_interval.distribution.ToleranceIntervalSide) – The type of the tolerance interval characterized by its sides of interest, either a lower-sided tolerance interval \([a, +\infty[\), an upper-sided tolerance interval \(]-\infty, b]\), or a two-sided tolerance interval \([c, d]\).
 Returns
The tolerance limits of the different variables.
 Return type
Dict[str, Tuple[numpy.ndarray, numpy.ndarray]]
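To make the interplay between coverage and confidence concrete, the sketch below uses a classical distribution-free result, which is not necessarily the method used by this class: for n independent samples of a continuous random variable, the interval starting at the sample minimum covers at least a proportion `coverage` of the realizations with confidence 1 - coverage**n. The helper names are hypothetical.

```python
import math


def confidence_of_minimum(n_samples: int, coverage: float) -> float:
    """Confidence that [min(samples), +inf[ covers at least `coverage`."""
    return 1.0 - coverage**n_samples


def minimum_sample_size(coverage: float, confidence: float = 0.95) -> int:
    """Smallest n for which the sample minimum is a valid lower tolerance bound."""
    if not 0.0 < coverage < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("coverage and confidence must lie strictly in ]0, 1[.")
    # Solve 1 - coverage**n >= confidence for the integer n.
    return math.ceil(math.log(1.0 - confidence) / math.log(coverage))


# Coverage and confidence levels of the B-value (90%, 95%) and A-value (99%, 95%).
n_for_b_value = minimum_sample_size(0.90, 0.95)
n_for_a_value = minimum_sample_size(0.99, 0.95)
```

Under these assumptions, a higher coverage level requires many more samples for the same confidence level, which is why an A-value is more demanding than a B-value.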
 compute_variance()[source]
Compute the variance.
 Returns
The variance of the different variables.
 Return type
Dict[str, numpy.ndarray]