parametric module

Class for the parametric estimation of statistics from a dataset.

Overview

The ParametricStatistics class inherits from the abstract Statistics class and aims to estimate statistics from a Dataset, based on candidate parametric distributions calibrated from this Dataset.

For each variable,

the parameters of these distributions are calibrated from the Dataset,
the fitted parametric Distribution that is optimal with respect to a goodness-of-fit criterion and a selection criterion is then used to estimate the statistics related to this variable.
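
The two steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the OpenTURNS-based fitting used by ParametricStatistics: the candidates are limited to a normal and a uniform distribution, the parameters are maximum-likelihood estimates, the criterion is the textbook BIC (lower is better), and the helper names (normal_log_likelihood, uniform_log_likelihood, bic, select_best) are hypothetical.

```python
import math
import random
import statistics


def normal_log_likelihood(samples):
    """Maximised log-likelihood of a normal distribution (MLE mean and std)."""
    n = len(samples)
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples, mu)  # the MLE uses the 1/n variance
    return -0.5 * n * (math.log(2 * math.pi * sigma**2) + 1)


def uniform_log_likelihood(samples):
    """Maximised log-likelihood of a uniform distribution on [min, max]."""
    width = max(samples) - min(samples)
    return -len(samples) * math.log(width)


def bic(log_likelihood, n_parameters, n_samples):
    """Textbook Bayesian information criterion; lower means a better fit."""
    return n_parameters * math.log(n_samples) - 2 * log_likelihood


def select_best(samples):
    """Return the candidate name with the lowest BIC and all BIC values."""
    n = len(samples)
    criteria = {
        "Normal": bic(normal_log_likelihood(samples), 2, n),
        "Uniform": bic(uniform_log_likelihood(samples), 2, n),
    }
    return min(criteria, key=criteria.get), criteria


# Hypothetical dataset: one uniform and one normal variable.
rng = random.Random(0)
dataset = {
    "x1": [rng.uniform(-1.0, 1.0) for _ in range(500)],
    "x2": [rng.gauss(0.5, 2.0) for _ in range(500)],
}

# Calibrate and select variable by variable.
selected = {name: select_best(values)[0] for name, values in dataset.items()}
```

The selection is done independently for each variable, which mirrors the variable-wise logic described above; the actual criterion values computed by the library may be scaled differently.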

The ParametricStatistics relies on the OpenTURNS library through the OTDistribution and OTDistributionFitter classes.

Construction

The ParametricStatistics is built from two mandatory arguments:

a dataset,
a list of distribution names,

and can consider optional arguments:

a subset of variable names (by default, statistics are computed for all variables),
a fitting criterion name (by default, BIC is used; see AVAILABLE_CRITERIA and AVAILABLE_SIGNIFICANCE_TESTS for more information),
a level associated with the fitting criterion,
a selection criterion:
- ‘best’: select the distribution minimizing (or maximizing, depending on the criterion) the criterion,
- ‘first’: select the first distribution for which the criterion is greater (or lower, depending on the criterion) than the level,
a name for the ParametricStatistics object (by default, the name is the concatenation of ‘ParametricStatistics’ and the name of the Dataset).
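
The difference between the two selection criteria can be sketched as follows. This is a minimal illustration, not the library's implementation: the function select_distribution, its maximize flag and the fallback to None are assumptions introduced here, and the criterion values are made up.

```python
def select_distribution(criteria, selection_criterion="best", level=0.05, maximize=False):
    """Pick a distribution name from criterion values given in candidate order.

    criteria: dict mapping candidate names to criterion values.
    maximize: True for criteria where higher is better (e.g. p-values),
        False for criteria where lower is better (e.g. BIC).
    """
    if selection_criterion == "best":
        pick = max if maximize else min
        return pick(criteria, key=criteria.get)
    if selection_criterion == "first":
        # Walk the candidates in order and stop at the first acceptable one.
        for name, value in criteria.items():
            acceptable = value > level if maximize else value < level
            if acceptable:
                return name
        return None  # assumption: no candidate passes the level
    raise ValueError(f"Unknown selection criterion: {selection_criterion!r}")


# Hypothetical p-values of a goodness-of-fit test (higher is better).
p_values = {"Normal": 0.20, "Uniform": 0.01, "Triangular": 0.45}
best = select_distribution(p_values, "best", maximize=True)
first = select_distribution(p_values, "first", level=0.05, maximize=True)

# Hypothetical BIC values (lower is better).
bic_values = {"Normal": 3.1, "Uniform": 2.4}
best_bic = select_distribution(bic_values, "best")
```

With 'first', the order of the candidate list matters: here "Normal" is returned because it is the first candidate whose p-value exceeds the level, even though "Triangular" has a higher p-value.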

Capabilities

By inheritance, a ParametricStatistics object has the same capabilities as Statistics. Additional ones are:

get_fitting_matrix(): this method displays the values of the fitting criterion for the different variables and candidate probability distributions, as well as the selected probability distribution,
plot_criteria(): this method plots the criterion values for a given variable.

class gemseo.uncertainty.statistics.parametric.ParametricStatistics(dataset, distributions, variables_names=None, fitting_criterion='BIC', level=0.05, selection_criterion='best', name=None)

Bases: Statistics

A toolbox to compute statistics based on probability distribution-fitting.

Unless otherwise stated, the statistics are computed variable-wise and component-wise, i.e. variable-by-variable and component-by-component. So, for the sake of readability, the methods named as compute_statistic() return dict[str, ndarray] objects whose keys are the names of the variables and whose values are the statistics estimated for the different components.

Examples

>>> from gemseo.api import (
...     create_discipline,
...     create_parameter_space,
...     create_scenario
... )
>>> from gemseo.uncertainty.statistics.parametric import ParametricStatistics
>>>
>>> expressions = {"y1": "x1+2*x2", "y2": "x1-3*x2"}
>>> discipline = create_discipline(
...     "AnalyticDiscipline", expressions=expressions
... )
>>>
>>> parameter_space = create_parameter_space()
>>> parameter_space.add_random_variable(
...     "x1", "OTUniformDistribution", minimum=-1, maximum=1
... )
>>> parameter_space.add_random_variable(
...     "x2", "OTNormalDistribution", mu=0.5, sigma=2
... )
>>>
>>> scenario = create_scenario(
...     [discipline],
...     "DisciplinaryOpt",
...     "y1", parameter_space, scenario_type="DOE"
... )
>>> scenario.execute({'algo': 'OT_MONTE_CARLO', 'n_samples': 100})
>>>
>>> dataset = scenario.export_to_dataset(opt_naming=False)
>>>
>>> statistics = ParametricStatistics(
...     dataset, ['Normal', 'Uniform', 'Triangular']
... )
>>> fitting_matrix = statistics.get_fitting_matrix()
>>> mean = statistics.compute_mean()

Parameters:

dataset (Dataset) – A dataset.
distributions (Sequence[str]) – The names of the distributions.
variables_names (Iterable[str] | None) – The variables of interest. Default: consider all the variables available in the dataset.
fitting_criterion (str) –
The name of the goodness-of-fit criterion, measuring how well the distribution fits the data. Use ParametricStatistics.get_criteria() to get the available criteria.

By default it is set to “BIC”.
level (float) –
A test level, i.e. the risk of committing a Type I error, that is an incorrect rejection of a true null hypothesis, for criteria based on hypothesis tests.

By default it is set to 0.05.
selection_criterion (str) –
The name of the selection criterion to select a distribution from a list of candidates. Either ‘first’ or ‘best’.

By default it is set to “best”.
name (str | None) – A name for the object. Default: use the concatenation of the class and dataset names.

compute_a_value()

Compute the A-value \(\text{Aval}[X]\).

The A-value is the lower bound of the left-sided tolerance interval associated with a coverage level equal to 99% and a confidence level equal to 95%.

Returns: The component-wise A-value of the different variables.
Return type: dict[str, numpy.ndarray]
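
To make the definition concrete, the sketch below estimates an A-value for a single component under the assumption that the fitted distribution is normal. It uses a classical closed-form approximation of the one-sided tolerance factor rather than the exact noncentral-t value, and it is not the computation performed by compute_a_value(), which depends on the selected distribution. The function names one_sided_tolerance_factor and approximate_a_value are hypothetical.

```python
import math
import random
import statistics


def one_sided_tolerance_factor(n_samples, coverage=0.99, confidence=0.95):
    """Approximate one-sided tolerance factor k for normally distributed data."""
    z_p = statistics.NormalDist().inv_cdf(coverage)
    z_g = statistics.NormalDist().inv_cdf(confidence)
    a = 1.0 - z_g**2 / (2.0 * (n_samples - 1))
    b = z_p**2 - z_g**2 / n_samples
    return (z_p + math.sqrt(z_p**2 - a * b)) / a


def approximate_a_value(samples):
    """Lower tolerance bound: sample mean minus k times the sample std."""
    mean = statistics.fmean(samples)
    std = statistics.stdev(samples)  # unbiased (n - 1) estimator
    return mean - one_sided_tolerance_factor(len(samples)) * std


# Hypothetical sample of one scalar variable.
rng = random.Random(1)
samples = [rng.gauss(10.0, 1.0) for _ in range(100)]
a_value = approximate_a_value(samples)
k_for_10_samples = one_sided_tolerance_factor(10)
```

The factor k shrinks as the sample size grows, so the bound moves closer to the estimated 1% quantile when more data are available.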

Examples using ParametricStatistics

Parametric estimation of statistics