The base class Statistics
Abstract class for the estimation of statistics from a dataset.
The abstract Statistics class implements the concept of a statistics library.
It is enriched by the EmpiricalStatistics and ParametricStatistics classes.
A Statistics object is built from a Dataset and, optionally, variable names.
In this case, statistics are only computed for these variables.
Otherwise, statistics are computed for all the variables available in the dataset.
The user can also give a name to the Statistics object.
By default, this name is the concatenation of the name of the class overloading Statistics and the name of the Dataset.
A Statistics object returns standard descriptive and statistical measures for the different variables:
- Statistics.compute_minimum(): the minimum value,
- Statistics.compute_maximum(): the maximum value,
- Statistics.compute_range(): the difference between minimum and maximum values,
- Statistics.compute_mean(): the expectation (a.k.a. mean value),
- Statistics.compute_moment(): a central moment, which is the expected value of a specified integer power of the deviation from the mean,
- Statistics.compute_variance(): the variance, which is the mean squared variation around the mean value,
- Statistics.compute_standard_deviation(): the standard deviation, which is the square root of the variance,
- Statistics.compute_quantile(): the quantile associated with a probability, which is the cut point dividing the range into a first continuous interval with this given probability and a second continuous interval with the complementary probability; common q-quantiles dividing the range into q continuous intervals with equal probabilities are also implemented:
  - Statistics.compute_median(), which implements the 2-quantile (50%),
  - Statistics.compute_quartile(), whose order (1, 2 or 3) implements the 4-quantiles (25%, 50% and 75%),
  - Statistics.compute_percentile(), whose order (1, 2, …, 99) implements the 100-quantiles (1%, 2%, …, 99%),
- Statistics.compute_probability(): the probability that the random variable is larger or smaller than a certain threshold,
- Statistics.compute_tolerance_interval(): the left-sided, right-sided or both-sided tolerance interval associated with a given coverage level and a given confidence level, which is a statistical interval within which, with some confidence level, a specified proportion of the random variable realizations falls (this proportion is the coverage level),
- Statistics.compute_a_value(): the A-value, which is the lower bound of the left-sided tolerance interval associated with a coverage level equal to 99% and a confidence level equal to 95%,
- Statistics.compute_b_value(): the B-value, which is the lower bound of the left-sided tolerance interval associated with a coverage level equal to 90% and a confidence level equal to 95%.
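To make the construction rules and the per-variable results concrete, here is a minimal sketch that does not use the library itself. `ToyStatistics` is a hypothetical stand-in for a concrete subclass, the dataset is a plain dict of lists instead of a Dataset, the underscore used to join the class and dataset names is an assumption, and the variance is the population variance (division by the number of samples).

```python
import math


class ToyStatistics:
    """Hypothetical stand-in for a concrete Statistics subclass."""

    def __init__(self, dataset, dataset_name, variables_names=None, name=None):
        # dataset: plain dict mapping each variable name to a list of floats.
        names = list(variables_names) if variables_names else list(dataset)
        self.samples = {key: list(dataset[key]) for key in names}
        # Assumption: class and dataset names are joined with an underscore.
        self.name = name or f"{type(self).__name__}_{dataset_name}"

    def compute_minimum(self):
        return {key: min(v) for key, v in self.samples.items()}

    def compute_maximum(self):
        return {key: max(v) for key, v in self.samples.items()}

    def compute_range(self):
        return {key: max(v) - min(v) for key, v in self.samples.items()}

    def compute_mean(self):
        return {key: sum(v) / len(v) for key, v in self.samples.items()}

    def compute_variance(self):
        # Population variance: mean squared deviation from the mean.
        means = self.compute_mean()
        return {
            key: sum((x - means[key]) ** 2 for x in v) / len(v)
            for key, v in self.samples.items()
        }

    def compute_standard_deviation(self):
        variances = self.compute_variance()
        return {key: math.sqrt(var) for key, var in variances.items()}


data = {"x": [1.0, 2.0, 3.0, 4.0, 5.0], "y": [10.0, 20.0, 30.0, 40.0, 50.0]}
# Only "x" is of interest, so "y" is ignored by every method.
stats = ToyStatistics(data, "my_dataset", variables_names=["x"])
print(stats.name)
print(stats.compute_range())
print(stats.compute_variance())
```

Each method returns a dictionary indexed by variable name, which mirrors the `Dict[str, numpy.ndarray]` return type documented for the real methods, with floats in place of arrays.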
- class gemseo.uncertainty.statistics.statistics.Statistics(dataset, variables_names=None, name=None)
Abstract class to interface a statistics library.
- dataset
The dataset.
- Type
- n_samples
The number of samples.
- Type
- n_variables
The number of variables.
- Type
- name
The name of the object.
- Type
- Parameters
dataset (Dataset) – A dataset.
variables_names (Optional[Iterable[str]]) –
The variables of interest. Default: consider all the variables available in the dataset.
By default it is set to None.
name (Optional[str]) –
A name for the object. Default: use the concatenation of the class and dataset names.
By default it is set to None.
- Return type
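compute_a_value(): Compute the A-value.
compute_b_value(): Compute the B-value.
compute_expression(variable, function[, ...]): Return the expression of a statistical function applied to a variable.
compute_maximum(): Compute the maximum.
compute_mean(): Compute the mean.
compute_mean_std(std_factor): Compute mean + std_factor * std.
compute_median(): Compute the median.
compute_minimum(): Compute the minimum.
compute_moment(order): Compute the n-th moment.
compute_percentile(order): Compute the n-th percentile.
compute_probability(thresh[, greater]): Compute the probability related to a threshold.
compute_quantile(prob): Compute the quantile related to a probability.
compute_quartile(order): Compute the n-th quartile.
compute_range(): Compute the range.
compute_standard_deviation(): Compute the standard deviation.
compute_tolerance_interval(coverage[, ...]): Compute a tolerance interval (TI) for a given coverage level.
compute_variance(): Compute the variance.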
- compute_a_value()
Compute the A-value.
- Returns
The A-value of the different variables.
- Return type
Dict[str, numpy.ndarray]
- compute_b_value()
Compute the B-value.
- Returns
The B-value of the different variables.
- Return type
Dict[str, numpy.ndarray]
- classmethod compute_expression(variable, function, show_name=False, **options)
Return the expression of a statistical function applied to a variable.
- Parameters
variable (str) – The name of the variable.
function (str) – The name of the function.
show_name (bool) –
If True, show name. Otherwise, only show value.
By default it is set to False.
**options – The options passed to the statistical function.
- Returns
The expression of the statistical function applied to the variable.
- Return type
- compute_maximum()
Compute the maximum.
- Returns
The maximum of the different variables.
- Return type
Dict[str, numpy.ndarray]
- compute_mean()
Compute the mean.
- Returns
The mean of the different variables.
- Return type
Dict[str, numpy.ndarray]
- compute_mean_std(std_factor)
Compute mean + std_factor * std.
- Parameters
std_factor (float) – The factor multiplying the standard deviation.
- Returns
mean + std_factor * std for the different variables.
- Return type
Dict[str, numpy.ndarray]
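As an illustration of the quantity mean + std_factor * std for a single variable, the following standard-library sketch may help. The helper name `mean_plus_std` is hypothetical, and using the population standard deviation (division by the number of samples) is an assumption; a concrete subclass may use a different estimator.

```python
import statistics


def mean_plus_std(samples, std_factor):
    """Return mean + std_factor * std for one variable (illustrative only)."""
    # Assumption: population standard deviation, i.e. division by n.
    return statistics.fmean(samples) + std_factor * statistics.pstdev(samples)


samples = [1.0, 2.0, 3.0, 4.0, 5.0]
# A positive factor gives an upper margin, a negative one a lower margin.
upper_margin = mean_plus_std(samples, 2.0)
lower_margin = mean_plus_std(samples, -2.0)
print(upper_margin, lower_margin)
```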
- compute_median()
Compute the median.
- Returns
The median of the different variables.
- Return type
Dict[str, numpy.ndarray]
- compute_minimum()
Compute the minimum.
- Returns
The minimum of the different variables.
- Return type
Dict[str, numpy.ndarray]
- compute_moment(order)
Compute the n-th central moment.
- Parameters
order (int) – The order of the moment.
- Returns
The moment of the different variables.
- Return type
Dict[str, numpy.ndarray]
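The overview above defines the moment as the expected value of an integer power of the deviation from the mean. A minimal empirical sketch of that definition for one variable could look as follows; `central_moment` is a hypothetical helper, and replacing the expectation by the sample average is an assumption.

```python
def central_moment(samples, order):
    """Empirical central moment: average of (x - mean) ** order."""
    mean = sum(samples) / len(samples)
    return sum((x - mean) ** order for x in samples) / len(samples)


samples = [1.0, 2.0, 3.0, 4.0, 5.0]
second = central_moment(samples, 2)  # equals the population variance
third = central_moment(samples, 3)   # zero for a symmetric sample
print(second, third)
```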
- compute_percentile(order)
Compute the n-th percentile.
- Parameters
order (int) – The order of the percentile. Either 0, 1, 2, … or 100.
- Returns
The percentile of the different variables.
- Return type
Dict[str, numpy.ndarray]
- compute_probability(thresh, greater=True)
Compute the probability related to a threshold.
- Parameters
thresh (float) – A threshold.
greater (bool) –
The type of probability. If True, compute the probability of exceeding the threshold. Otherwise, compute the opposite.
By default it is set to True.
- Returns
The probability of the different variables.
- Return type
Dict[str, numpy.ndarray]
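The role of the `greater` flag can be illustrated with an empirical estimate for one variable. This is only a sketch: `probability` is a hypothetical helper, and the use of non-strict inequalities (`>=` and `<=`) is an assumption, since the text above does not say how ties with the threshold are handled.

```python
def probability(samples, thresh, greater=True):
    """Fraction of samples on the requested side of the threshold."""
    if greater:
        # Assumption: a sample equal to the threshold counts as exceeding it.
        hits = sum(1 for x in samples if x >= thresh)
    else:
        hits = sum(1 for x in samples if x <= thresh)
    return hits / len(samples)


samples = [1.0, 2.0, 3.0, 4.0, 5.0]
p_above = probability(samples, 3.5)
p_below = probability(samples, 3.5, greater=False)
print(p_above, p_below)
```

When no sample equals the threshold, the two probabilities sum to one.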
- compute_quantile(prob)
Compute the quantile related to a probability.
- Parameters
prob (float) – A probability between 0 and 1.
- Returns
The quantile of the different variables.
- Return type
Dict[str, numpy.ndarray]
- compute_quartile(order)
Compute the n-th quartile.
- Parameters
order (int) – The order of the quartile. Either 1, 2 or 3.
- Returns
The quartile of the different variables.
- Return type
Dict[str, numpy.ndarray]
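The median, quartiles and percentiles are all quantiles at particular probabilities. The sketch below shows that relation for one variable. The helper names are hypothetical, and the empirical quantile uses linear interpolation between order statistics, which is one common convention and an assumption here.

```python
import math


def quantile(samples, prob):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(samples)
    position = prob * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight


def median(samples):
    return quantile(samples, 0.5)  # 2-quantile


def quartile(samples, order):
    return quantile(samples, 0.25 * order)  # order in {1, 2, 3}


def percentile(samples, order):
    return quantile(samples, order / 100.0)


samples = [5.0, 1.0, 4.0, 2.0, 3.0]
print(median(samples), quartile(samples, 1), percentile(samples, 90))
```

With this convention, the second quartile and the 50th percentile both coincide with the median.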
- compute_range()
Compute the range.
- Returns
The range of the different variables.
- Return type
Dict[str, numpy.ndarray]
- compute_standard_deviation()
Compute the standard deviation.
- Returns
The standard deviation of the different variables.
- Return type
Dict[str, numpy.ndarray]
- compute_tolerance_interval(coverage, confidence=0.95, side=ToleranceIntervalSide.BOTH)
Compute a tolerance interval (TI) for a given coverage level.
The coverage level is the minimum proportion of realizations of the random variable that belong to the TI. The tolerance interval is computed with a confidence level and can be either lower-sided, upper-sided or both-sided.
- Parameters
coverage (float) – The minimum proportion of realizations belonging to the TI.
confidence (float) –
A level of confidence in [0,1].
By default it is set to 0.95.
side (gemseo.uncertainty.statistics.tolerance_interval.distribution.ToleranceIntervalSide) –
The type of the tolerance interval characterized by its sides of interest, either a lower-sided tolerance interval \([a, +\infty[\), an upper-sided tolerance interval \(]-\infty, b]\), or a two-sided tolerance interval \([c, d]\).
By default it is set to BOTH.
- Returns
The tolerance limits of the different variables.
- Return type
Dict[str, Tuple[numpy.ndarray, numpy.ndarray]]
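To show how coverage and confidence interact, here is a sketch of a lower-sided tolerance bound for a single variable. It assumes normally distributed samples and uses a classical closed-form approximation of the one-sided tolerance factor (often attributed to Natrella), not necessarily the method used by any concrete subclass. The helper names are hypothetical. The A-value and B-value follow from the coverage levels given in the overview (99% and 90%, both at 95% confidence).

```python
import math
from statistics import NormalDist, fmean, stdev


def one_sided_k_factor(n_samples, coverage, confidence):
    """Approximate one-sided tolerance factor for a normal sample."""
    z_p = NormalDist().inv_cdf(coverage)
    z_g = NormalDist().inv_cdf(confidence)
    a = 1.0 - z_g**2 / (2.0 * (n_samples - 1))
    b = z_p**2 - z_g**2 / n_samples
    return (z_p + math.sqrt(z_p**2 - a * b)) / a


def lower_tolerance_bound(samples, coverage, confidence=0.95):
    """Lower limit a of the lower-sided interval [a, +inf[ (normal assumption)."""
    k = one_sided_k_factor(len(samples), coverage, confidence)
    # Sample standard deviation (division by n - 1).
    return fmean(samples) - k * stdev(samples)


samples = [9.8, 10.1, 10.4, 9.6, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0]
a_value = lower_tolerance_bound(samples, 0.99)  # coverage 99%, confidence 95%
b_value = lower_tolerance_bound(samples, 0.90)  # coverage 90%, confidence 95%
print(a_value, b_value)
```

A higher coverage pushes the bound further from the mean, so the A-value is always below the B-value for the same sample.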
- compute_variance()
Compute the variance.
- Returns
The variance of the different variables.
- Return type
Dict[str, numpy.ndarray]