gemseo.uncertainty.statistics.base_parametric_statistics module
Parametric estimation of statistics from a dataset.
The base class BaseParametricStatistics aims to estimate statistics parametrically,
using probability distributions fitted to a Dataset at instantiation.
For each variable of this Dataset,
the parameters of the candidate distributions are calibrated from this Dataset.
Then, the fitted parametric distribution that is optimal in the sense of a goodness-of-fit criterion and a selection criterion is selected to estimate the statistics associated with this variable.
Its subclass OTParametricStatistics uses the OpenTURNS distributions
through the OTDistribution and OTDistributionFitter classes,
and its subclass SPParametricStatistics uses the SciPy distributions
through the SPDistribution and SPDistributionFitter classes.
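The workflow described above (calibrate each candidate, measure the goodness of fit, keep the best candidate, estimate a statistic from it) can be sketched with the Python standard library alone. This is a minimal illustration and not the GEMSEO implementation: the helper ks_distance, the two candidate distributions and the use of a Kolmogorov-Smirnov distance as the fitting criterion are assumptions made for the example.

```python
import math
import random
from statistics import NormalDist, fmean


def ks_distance(sample, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and a fitted CDF.

    Hypothetical helper standing in for a goodness-of-fit criterion.
    """
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))


random.seed(0)
sample = [random.gauss(10.0, 2.0) for _ in range(500)]

# Step 1: calibrate the parameters of each candidate distribution from the data.
normal = NormalDist.from_samples(sample)
rate = 1.0 / fmean(sample)  # maximum-likelihood rate of an exponential distribution

candidates = {
    "Normal": normal.cdf,
    "Exponential": lambda x: 1.0 - math.exp(-rate * x) if x > 0 else 0.0,
}

# Step 2: measure the goodness of fit of each candidate (smaller is better here).
distances = {name: ks_distance(sample, cdf) for name, cdf in candidates.items()}

# Step 3: select the candidate that best satisfies the criterion.
best_name = min(distances, key=distances.get)

# Step 4: estimate a statistic, here the 0.95-quantile, from the selected distribution.
if best_name == "Normal":
    q95 = normal.inv_cdf(0.95)
else:
    q95 = -math.log(0.05) / rate
```

In this sketch the criterion is a distance to minimize; with a significance test, the criterion would instead be a p-value compared to the test level.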
- class BaseParametricStatistics(dataset, distributions, variable_names=(), fitting_criterion=None, level=0.05, selection_criterion=SelectionCriterion.BEST, name='')
Bases:
BaseStatistics, Generic[_DistributionT, _DefaultFittingCriterionT, _DistributionNameT, _FittingCriterionT, _SignificanceTestT]
Base class to compute statistics using probability distribution fitting.
- Parameters:
dataset (Dataset) -- A dataset.
distributions (Sequence[_DistributionNameT]) -- The names of the probability distributions.
variable_names (Iterable[str]) --
The names of the variables for which to compute statistics. If empty, consider all the variables of the dataset.
By default it is set to ().
fitting_criterion (_FittingCriterionT | None) -- The name of the fitting criterion to measure the goodness-of-fit of the probability distributions. If None, use the default one. Use get_criteria() to get the available criteria.
level (float) --
A test level, i.e. the risk of committing a Type 1 error, that is an incorrect rejection of a true null hypothesis, for criteria based on hypothesis tests.
By default it is set to 0.05.
selection_criterion (SelectionCriterion) --
The name of the criterion to select a distribution among distributions.
By default it is set to "best".
name (str) --
A name for the toolbox computing statistics. If empty, concatenate the names of the dataset and the name of the class.
By default it is set to "".
- class SelectionCriterion(*values)
Bases:
StrEnum
The selection criteria.
- BEST = 'best'
Select the distribution that best satisfies a fitting criterion.
- FIRST = 'first'
Select the first distribution satisfying a fitting criterion.
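The difference between the two selection criteria can be illustrated with a short sketch. It assumes that the fitting criterion is a significance test, that its value is a p-value (larger is better) and that a distribution satisfies the criterion when its p-value is at least the test level. The enumeration below is a local stand-in for the one documented here; the function select_distribution, the p-values and the fallback used when no candidate passes the test are hypothetical.

```python
from enum import Enum


class SelectionCriterion(str, Enum):
    """Local stand-in for the documented enumeration."""

    BEST = "best"
    FIRST = "first"


def select_distribution(p_values, level, criterion):
    """Select a distribution name from {name: p-value} (hypothetical helper)."""
    if criterion == SelectionCriterion.FIRST:
        # Keep the first candidate, in the order given, that is not rejected.
        for name, p_value in p_values.items():
            if p_value >= level:
                return name
    # BEST, or assumed fallback when no candidate passes: highest p-value.
    return max(p_values, key=p_values.get)


# Hypothetical p-values, listed in the order of the candidate distributions.
p_values = {"Exponential": 0.01, "Normal": 0.20, "Uniform": 0.45}

first = select_distribution(p_values, 0.05, SelectionCriterion.FIRST)  # "Normal"
best = select_distribution(p_values, 0.05, SelectionCriterion.BEST)  # "Uniform"
```

With "first", the order of the candidate distributions matters; with "best", it does not.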
- compute_joint_probability(thresh, greater=True)
Compute the joint probability related to a threshold.
Either \(\mathbb{P}[X \geq x]\) or \(\mathbb{P}[X \leq x]\).
- Parameters:
- Returns:
The joint probability of the different variables (by definition of the joint probability, this statistic is not computed component-wise).
- Return type:
- compute_probability(thresh, greater=True)
Compute the probability related to a threshold.
Either \(\mathbb{P}[X \geq x]\) or \(\mathbb{P}[X \leq x]\).
- Parameters:
- Returns:
The component-wise probability of the different variables.
- Return type:
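For a single component, the probability related to a threshold follows from the cumulative distribution function \(F\) of the fitted distribution: \(\mathbb{P}[X \leq x] = F(x)\) and, for a continuous distribution, \(\mathbb{P}[X \geq x] = 1 - F(x)\). The sketch below illustrates this with the standard library; the function probability and the normal distribution with its parameters are assumptions for the example, not the GEMSEO API.

```python
from statistics import NormalDist


def probability(distribution, thresh, greater=True):
    """Return P[X >= thresh] if greater is True, else P[X <= thresh].

    Hypothetical helper; assumes a continuous distribution with a cdf method.
    """
    cdf_value = distribution.cdf(thresh)
    return 1.0 - cdf_value if greater else cdf_value


# Stand-in for a distribution fitted to a variable of the dataset.
fitted = NormalDist(mu=10.0, sigma=2.0)

p_greater = probability(fitted, 12.0)  # P[X >= 12]
p_lower = probability(fitted, 12.0, greater=False)  # P[X <= 12]
```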
- compute_quantile(prob)
Compute the quantile \(\mathbb{Q}[X; \alpha]\) related to a probability.
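For a continuous fitted distribution with cumulative distribution function \(F\), the quantile related to a probability \(\alpha\) is the value \(q\) such that \(F(q) = \alpha\). A minimal sketch with the standard library, where the normal distribution and its parameters are assumptions standing in for a distribution fitted to the dataset:

```python
from statistics import NormalDist

# Stand-in for a distribution fitted to a variable of the dataset.
fitted = NormalDist(mu=10.0, sigma=2.0)

prob = 0.95
quantile = fitted.inv_cdf(prob)  # value q such that P[X <= q] = prob

# Round trip: the CDF evaluated at the quantile gives back the probability.
recovered_prob = fitted.cdf(quantile)
```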
- compute_tolerance_interval(coverage, confidence=0.95, side=ToleranceIntervalSide.BOTH)
Compute a \((p,1-\alpha)\) tolerance interval \(\text{TI}[X]\).
The tolerance interval \(\text{TI}[X]\) is defined to contain at least a proportion \(p\) of the values of \(X\) with a level of confidence \(1-\alpha\). \(p\) is also called the coverage level of the TI.
Typically, \(\alpha=0.05\) or equivalently \(1-\alpha=0.95\).
The tolerance interval can be either
lower-sided (side="LOWER": \([L, +\infty[\)),
upper-sided (side="UPPER": \(]-\infty, U]\)) or
both-sided (side="BOTH": \([L, U]\)).
- Parameters:
- Returns:
The component-wise tolerance intervals of the different variables, expressed as {variable_name: [(lower_bound, upper_bound), ...], ... } where [(lower_bound, upper_bound), ...] are the lower and upper bounds of the tolerance interval of the different components of variable_name.
- Return type:
See also
- get_criteria(variable, index=0)
Get the value of the fitting criterion for the different distributions.
- Parameters:
- Returns:
The value of the fitting criterion for the given variable name and component and for the different distributions, as well as whether this fitting criterion is a statistical test, in which case this value is a p-value.
- Return type:
- get_fitting_matrix()
Get the fitting matrix.
This matrix contains a goodness-of-fit measure for each pair (variable, distribution).
- Returns:
The printable fitting matrix.
- Return type:
- plot(save=False, show=True, directory_path='', file_format='png')
Visualize the cumulative distribution and probability density functions.
- Parameters:
save (bool) --
Whether to save the figures.
By default it is set to False.
show (bool) --
Whether to show the figures.
By default it is set to True.
directory_path (str | Path) --
The path to save the figures.
By default it is set to "".
file_format (str) --
The file extension.
By default it is set to "png".
- Returns:
The cumulative distribution and probability density functions for each variable.
- Return type:
- plot_criteria(variable, title='', save=False, show=True, directory='.', index=0, fig_size=(6.4, 3.2))
Plot criteria for a given variable name.
- Parameters:
variable (str) -- The name of the variable.
title (str) --
The title of the plot, if any.
By default it is set to "".
save (bool) --
If True, save the plot on the disk.
By default it is set to False.
show (bool) --
If True, show the plot.
By default it is set to True.
directory (str | Path) --
The directory path, either absolute or relative.
By default it is set to ".".
index (int) --
The index of the component of the variable.
By default it is set to 0.
fig_size (FigSizeType) --
The width and height of the figure in inches, e.g. (w, h).
By default it is set to (6.4, 3.2).
- Raises:
ValueError -- If the variable is missing from the dataset.
- Return type:
Figure
- DistributionName: ClassVar[StrEnum] = ~_DistributionNameT
- FittingCriterion: ClassVar[StrEnum] = ~_FittingCriterionT
- SignificanceTest: ClassVar[StrEnum] = ~_SignificanceTestT
- distributions: dict[str, _DistributionType | list[_DistributionType]]
The probability distributions of the random variables.
When a random variable is a random vector, its probability distribution is expressed as a list of marginal distributions. Otherwise, its probability distribution is expressed as the unique marginal distribution.
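The shape of this mapping can be illustrated as follows. The sketch uses statistics.NormalDist as a stand-in for the actual distribution type, and the variable names, the parameters and the helper marginals are hypothetical.

```python
from statistics import NormalDist

# "x" is a scalar random variable: a single marginal distribution.
# "y" is a random vector with two components: a list of marginal distributions.
distributions = {
    "x": NormalDist(mu=0.0, sigma=1.0),
    "y": [NormalDist(mu=1.0, sigma=0.5), NormalDist(mu=-1.0, sigma=2.0)],
}


def marginals(distribution):
    """Return the marginal distributions as a list in both cases."""
    return distribution if isinstance(distribution, list) else [distribution]


# Component-wise statistic computed uniformly over scalars and vectors.
means = {
    name: [marginal.mean for marginal in marginals(distribution)]
    for name, distribution in distributions.items()
}
```

Normalizing both cases to a list, as marginals does, is one way to write component-wise computations without branching on the variable size.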
- fitting_criterion: _FittingCriterionT
The goodness-of-fit criterion, measuring how the distribution fits the data.
- level: float
The test level used by the selection criteria that are significance tests.
In statistical hypothesis testing, the test level corresponds to the risk of committing a Type 1 error, that is an incorrect rejection of a true null hypothesis.
- selection_criterion: SelectionCriterion
The selection criterion to select a distribution from a list of candidates.