gemseo.uncertainty.statistics.base_parametric_statistics module

Parametric estimation of statistics from a dataset.

The base class BaseParametricStatistics estimates statistics parametrically, using probability distributions fitted to a Dataset at instantiation.

For each variable of this Dataset:

1. the parameters of the distributions are calibrated from this Dataset,
2. the fitted parametric distribution that is optimal with respect to a goodness-of-fit criterion and a selection criterion is selected to estimate the statistics associated with this variable.
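The two steps above can be sketched with the Python standard library alone. This is not the gemseo implementation: it assumes two hypothetical candidate families (normal and uniform), calibrates their parameters from the samples, scores each fit with the Kolmogorov-Smirnov statistic (one possible goodness-of-fit criterion) and keeps the candidate with the smallest statistic, which mimics a "best" selection rule.

```python
from statistics import NormalDist
from typing import Callable

Cdf = Callable[[float], float]


def fit_normal(samples: list[float]) -> Cdf:
    # Calibrate the mean and standard deviation from the samples.
    return NormalDist.from_samples(samples).cdf


def fit_uniform(samples: list[float]) -> Cdf:
    # Calibrate the bounds from the smallest and largest samples.
    low, high = min(samples), max(samples)
    return lambda x: min(max((x - low) / (high - low), 0.0), 1.0)


def ks_statistic(samples: list[float], cdf: Cdf) -> float:
    # Largest gap between the empirical CDF and the fitted CDF,
    # checked on both sides of each jump of the empirical CDF.
    xs = sorted(samples)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))


# Hypothetical candidate families, tried in this order.
CANDIDATES = {"Normal": fit_normal, "Uniform": fit_uniform}


def select_best(samples: list[float]) -> tuple[str, dict[str, float]]:
    # Fit every candidate, then keep the one with the smallest KS statistic.
    scores = {name: ks_statistic(samples, fit(samples)) for name, fit in CANDIDATES.items()}
    return min(scores, key=scores.get), scores


samples = [-2.0, -1.0, -0.5, -0.2, 0.0, 0.0, 0.2, 0.5, 1.0, 2.0]
best, scores = select_best(samples)
print(best, scores)
```

With samples concentrated around zero, the normal fit yields a smaller KS statistic (about 0.127) than the uniform fit (0.175) and is therefore selected.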

Its subclass OTParametricStatistics uses the OpenTURNS distributions through the OTDistribution and OTDistributionFitter classes and its subclass SPParametricStatistics uses the SciPy distributions through the SPDistribution and SPDistributionFitter classes.

class BaseParametricStatistics(dataset, distributions, variable_names=(), fitting_criterion=None, level=0.05, selection_criterion=SelectionCriterion.BEST, name='')

Bases: BaseStatistics, Generic[_DistributionT, _DefaultFittingCriterionT, _DistributionNameT, _FittingCriterionT, _SignificanceTestT]

Base class to compute statistics using probability distribution fitting.

Parameters:

dataset (Dataset) -- A dataset.
distributions (Sequence[_DistributionNameT]) -- The names of the probability distributions.
variable_names (Iterable[str]) --
The names of the variables for which to compute statistics. If empty, consider all the variables of the dataset.

By default it is set to ().
fitting_criterion (_FittingCriterionT | None) -- The name of the fitting criterion to measure the goodness-of-fit of the probability distributions. If None, use the default one. Use get_criteria() to get the available criteria.
level (float) --
A test level, i.e. the risk of committing a Type I error (incorrectly rejecting a true null hypothesis), used by criteria based on hypothesis testing.

By default it is set to 0.05.
selection_criterion (SelectionCriterion) --
The name of the criterion to select a distribution among distributions.

By default it is set to "best".
name (str) --
A name for the toolbox computing statistics. If empty, concatenate the name of the dataset and the name of the class.

By default it is set to "".

class SelectionCriterion(*values)

Bases: StrEnum

The selection criteria.

BEST = 'best': Select the distribution that best satisfies a fitting criterion.

FIRST = 'first': Select the first distribution satisfying a fitting criterion.
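The difference between the two selection criteria can be sketched as follows, assuming the fitting criterion is a hypothesis test that returns one p-value per candidate distribution and that the candidates are kept in the order they were given. The enumeration below is a hypothetical stand-in that mirrors the documented values, and the fallback of "first" to "best" when no candidate passes the test is an assumption, not documented behavior.

```python
from enum import Enum


class SelectionCriterion(str, Enum):
    """Hypothetical stand-in mirroring the documented enumeration values."""

    BEST = "best"
    FIRST = "first"


def select_distribution(
    p_values: dict[str, float],
    level: float = 0.05,
    criterion: SelectionCriterion = SelectionCriterion.BEST,
) -> str:
    """Select a distribution name from goodness-of-fit p-values (in candidate order)."""
    if criterion == SelectionCriterion.FIRST:
        for name, p_value in p_values.items():
            # The fit is acceptable if the test does not reject it at this level.
            if p_value > level:
                return name
    # BEST, or FIRST when no candidate is acceptable (assumed fallback):
    # the highest p-value is the best fit.
    return max(p_values, key=p_values.get)


p_values = {"Uniform": 0.20, "Normal": 0.60, "Exponential": 0.01}
print(select_distribution(p_values))  # Normal
print(select_distribution(p_values, criterion=SelectionCriterion.FIRST))  # Uniform
```

"first" stops at the earliest acceptable candidate, so the order of the distributions matters; "best" compares all of them.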

compute_joint_probability(thresh, greater=True)

Compute the joint probability related to a threshold.

Either \(\mathbb{P}[X \geq x]\) or \(\mathbb{P}[X \leq x]\).

Parameters:

thresh (Mapping[str, float | RealArray]) -- A threshold \(x\) per variable.
greater (bool) --
The type of probability. If True, compute the probability of exceeding the threshold. Otherwise, compute the probability of not exceeding it.

By default it is set to True.

Returns:

The joint probability of the different variables. By definition of the joint probability, this statistic is not computed component-wise.

Return type:

dict[str, float]
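A joint probability couples all the components of a variable into a single number. As a minimal sketch, and only under the hypothetical assumption that the components are independent with fitted normal marginals, \(\mathbb{P}[X \geq x]\) is the product of the per-component exceedance probabilities. The source does not state how dependence between components is handled, so the independence assumption is ours.

```python
from math import prod
from statistics import NormalDist


def joint_probability(
    marginals: dict[str, list[NormalDist]],
    thresh: dict[str, list[float]],
    greater: bool = True,
) -> dict[str, float]:
    # Assumption: the components of each variable are independent, so the
    # joint probability is the product of the per-component probabilities.
    result = {}
    for name, dists in marginals.items():
        per_component = [
            1.0 - dist.cdf(x) if greater else dist.cdf(x)
            for dist, x in zip(dists, thresh[name])
        ]
        result[name] = prod(per_component)
    return result


marginals = {"x": [NormalDist(0.0, 1.0), NormalDist(5.0, 2.0)]}
print(joint_probability(marginals, {"x": [0.0, 5.0]}))  # {'x': 0.25}
```

The result is one float per variable, which matches the dict[str, float] return type above.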

compute_maximum()

Compute the maximum \(\text{Max}[X]\).

Returns: The component-wise maximum of the different variables.
Return type: dict[str, RealArray]

compute_mean()

Compute the mean \(\mathbb{E}[X]\).

Returns: The component-wise mean of the different variables.
Return type: dict[str, RealArray]

compute_minimum()

Compute the minimum \(\text{Min}[X]\).

Returns: The component-wise minimum of the different variables.
Return type: dict[str, RealArray]

compute_moment(order)

Compute the n-th moment \(M[X; n]\).

Parameters: order (int) -- The order \(n\) of the moment.
Returns: The component-wise moment of the different variables.
Return type: dict[str, RealArray]
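For a parametric estimate, moments follow from the parameters of the fitted distribution rather than from sample averages. The sketch below assumes a component fitted with a normal distribution \(\mathcal{N}(\mu, \sigma^2)\) and assumes that \(M[X; n]\) denotes the raw moment \(\mathbb{E}[X^n]\) (the source does not say whether moments are raw or central). It uses the recurrence \(m_n = \mu m_{n-1} + (n-1)\sigma^2 m_{n-2}\), which holds for the normal distribution.

```python
from statistics import NormalDist


def normal_raw_moment(dist: NormalDist, order: int) -> float:
    """Raw moment E[X**order] of a normal distribution, via a recurrence."""
    if order < 0:
        raise ValueError("The order must be non-negative.")
    mu, var = dist.mean, dist.variance
    previous, current = 1.0, mu  # m_0 and m_1
    if order == 0:
        return previous
    for n in range(2, order + 1):
        # m_n = mu * m_{n-1} + (n - 1) * sigma**2 * m_{n-2}
        previous, current = current, mu * current + (n - 1) * var * previous
    return current


dist = NormalDist(mu=1.0, sigma=2.0)
print([normal_raw_moment(dist, n) for n in range(5)])  # [1.0, 1.0, 5.0, 13.0, 73.0]
```

For instance, the second moment is \(\mu^2 + \sigma^2 = 5\) here.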

compute_probability(thresh, greater=True)

Compute the probability related to a threshold.

Either \(\mathbb{P}[X \geq x]\) or \(\mathbb{P}[X \leq x]\).

Parameters:

thresh (Mapping[str, float | RealArray]) -- A threshold \(x\) per variable.
greater (bool) --
The type of probability. If True, compute the probability of exceeding the threshold. Otherwise, compute the probability of not exceeding it.

By default it is set to True.

Returns:

The component-wise probability of the different variables.

Return type:

dict[str, RealArray]
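Unlike the joint probability, this statistic returns one value per component. A minimal sketch, assuming each component has been fitted with a normal distribution: for a continuous distribution with CDF \(F\), \(\mathbb{P}[X \geq x] = 1 - F(x)\) and \(\mathbb{P}[X \leq x] = F(x)\). The helper name componentwise_probability is hypothetical and is not the gemseo method.

```python
from statistics import NormalDist


def componentwise_probability(
    fitted: dict[str, list[NormalDist]],
    thresh: dict[str, list[float]],
    greater: bool = True,
) -> dict[str, list[float]]:
    # One probability per component; for a continuous distribution,
    # P[X >= x] = 1 - F(x) and P[X <= x] = F(x).
    return {
        name: [1.0 - d.cdf(x) if greater else d.cdf(x) for d, x in zip(dists, thresh[name])]
        for name, dists in fitted.items()
    }


fitted = {"x": [NormalDist(0.0, 1.0), NormalDist(10.0, 2.0)]}
print(componentwise_probability(fitted, {"x": [0.0, 10.0]}))  # {'x': [0.5, 0.5]}
```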

compute_quantile(prob)

Compute the quantile \(\mathbb{Q}[X; \alpha]\) related to a probability.

Parameters: prob (float) -- A probability \(\alpha\) between 0 and 1.
Returns: The component-wise quantile of the different variables.
Return type: dict[str, RealArray]
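A parametric quantile is the inverse CDF of the fitted distribution evaluated at \(\alpha\). The sketch below assumes each component has been fitted with a normal distribution and uses NormalDist.inv_cdf from the standard library; the helper name componentwise_quantile is hypothetical.

```python
from statistics import NormalDist


def componentwise_quantile(
    fitted: dict[str, list[NormalDist]], prob: float
) -> dict[str, list[float]]:
    # The quantile is the inverse CDF of each fitted distribution at prob.
    if not 0.0 < prob < 1.0:
        raise ValueError("prob must lie strictly between 0 and 1 for a normal fit.")
    return {name: [d.inv_cdf(prob) for d in dists] for name, dists in fitted.items()}


fitted = {"x": [NormalDist(0.0, 1.0), NormalDist(10.0, 2.0)]}
print(componentwise_quantile(fitted, 0.975))  # approximately {'x': [1.96, 13.92]}
```

The bounds 0 and 1 are excluded because the normal quantile is infinite there.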

compute_range()

Compute the range \(R[X]\).

Returns: The component-wise range of the different variables.
Return type: dict[str, RealArray]
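For a fitted distribution, the minimum and maximum are the bounds of its support, and the sketch below assumes that the range is their difference, \(R[X] = \text{Max}[X] - \text{Min}[X]\). The example uses hypothetical parameter values: a uniform distribution \(\mathcal{U}(2, 5)\) fitted to one component, and an unbounded normal support for comparison. The helper support_statistics is illustrative only.

```python
import math


def support_statistics(lower: float, upper: float) -> dict[str, float]:
    """Min, max and range of a distribution whose support is [lower, upper]."""
    return {"minimum": lower, "maximum": upper, "range": upper - lower}


# Uniform U(2, 5) fitted to one component (hypothetical parameter values).
print(support_statistics(2.0, 5.0))  # {'minimum': 2.0, 'maximum': 5.0, 'range': 3.0}

# A normal distribution has an unbounded support, hence an infinite range.
print(support_statistics(-math.inf, math.inf)["range"])  # inf
```

This is why the parametric minimum or maximum can differ from the smallest or largest sample, and can even be infinite.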

compute_standard_deviation()

Compute the standard deviation \(\mathbb{S}[X]\).

Returns: The component-wise standard deviation of the different variables.
Return type: dict[str, RealArray]
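"Component-wise" means that each component of a multidimensional variable gets its own fitted distribution and its own statistic. The sketch below assumes a hypothetical two-component variable "x", fits a normal distribution to each column with NormalDist.from_samples, and reads the mean and standard deviation off the fitted parameters, producing one list per variable name.

```python
from statistics import NormalDist

# Hypothetical samples of a variable "x" with two components (one row per sample).
samples = {"x": [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]}


def fit_components(rows: list[list[float]]) -> list[NormalDist]:
    """Fit one normal distribution per component (column)."""
    return [NormalDist.from_samples(column) for column in zip(*rows)]


fitted = {name: fit_components(rows) for name, rows in samples.items()}
mean = {name: [d.mean for d in dists] for name, dists in fitted.items()}
std = {name: [d.stdev for d in dists] for name, dists in fitted.items()}
print(mean)  # {'x': [2.0, 12.0]}
print(std)  # {'x': [1.0, 2.0]}
```

For a normal fit the mean and standard deviation coincide with the calibrated parameters; for other families they are derived from the parameters through the family's formulas.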

compute_tolerance_interval(coverage, confidence=0.95, side=ToleranceIntervalSide.BOTH)

Compute a \((p,1-\alpha)\) tolerance interval \(\text{TI}[X]\).

The tolerance interval \(\text{TI}[X]\) is defined to contain at least a proportion \(p\) of the values of \(X\) with a level of confidence \(1-\alpha\). \(p\) is also called the coverage level of the TI.

Typically, \(\alpha=0.05\) or equivalently \(1-\alpha=0.95\).

The tolerance interval can be either

lower-sided (side=ToleranceIntervalSide.LOWER: \([L, +\infty[\)),
upper-sided (side=ToleranceIntervalSide.UPPER: \(]-\infty, U]\)) or
both-sided (side=ToleranceIntervalSide.BOTH: \([L, U]\)).

Parameters:

coverage (float) -- The minimum proportion \(p\in[0,1]\) of values of \(X\) belonging to the TI.
confidence (float) --
A level of confidence \(1-\alpha\in[0,1]\).

By default it is set to 0.95.
side (ToleranceIntervalSide) --
The type of the tolerance interval.

By default it is set to "both".

Returns:

The component-wise tolerance intervals of the different variables, expressed as {variable_name: [(lower_bound, upper_bound), ...], ... } where [(lower_bound, upper_bound), ...] are the lower and upper bounds of the tolerance interval of the different components of variable_name.

Return type:

dict[str, list[Bounds]]
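For a normal fit with estimated mean \(\hat{\mu}\) and standard deviation \(s\) from \(n\) samples, a classical both-sided \((p, 1-\alpha)\) tolerance interval is \([\hat{\mu} - k s, \hat{\mu} + k s]\). The sketch below approximates the factor \(k\) with Howe's formula and approximates the chi-square quantile it requires with the Wilson-Hilferty formula, so that only the standard library is needed. This is an illustration under these assumptions: the source does not say which formula gemseo uses, and exact factors are usually computed numerically or read from tables.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def chi2_quantile_wh(q: float, dof: int) -> float:
    """Wilson-Hilferty approximation of the q-quantile of a chi-square law."""
    z = _STD_NORMAL.inv_cdf(q)
    h = 2.0 / (9.0 * dof)
    return dof * (1.0 - h + z * sqrt(h)) ** 3


def two_sided_normal_factor(n: int, coverage: float, confidence: float = 0.95) -> float:
    """Howe's approximation of the both-sided normal tolerance factor k."""
    dof = n - 1
    z = _STD_NORMAL.inv_cdf((1.0 + coverage) / 2.0)
    # Lower (1 - confidence)-quantile of the chi-square law with n - 1 dof.
    chi2 = chi2_quantile_wh(1.0 - confidence, dof)
    return z * sqrt(dof * (1.0 + 1.0 / n) / chi2)


def two_sided_tolerance_interval(
    mean: float, stdev: float, n: int, coverage: float, confidence: float = 0.95
) -> tuple[float, float]:
    k = two_sided_normal_factor(n, coverage, confidence)
    return mean - k * stdev, mean + k * stdev


# n = 30 samples with hypothetical estimates mean = 10 and stdev = 2.
print(two_sided_normal_factor(30, coverage=0.9))  # about 2.14
print(two_sided_tolerance_interval(10.0, 2.0, 30, coverage=0.9))
```

The factor exceeds the known-parameter value \(z_{(1+p)/2} \approx 1.645\) because the interval must also account for the uncertainty in \(\hat{\mu}\) and \(s\); it tends to that value as \(n\) grows.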
