parametric module

Parametric estimation of statistics from a dataset

Overview

The ParametricStatistics class inherits from the abstract Statistics class and estimates statistics from a Dataset, based on a collection of candidate parametric distributions calibrated from this Dataset. For each variable, the parameters of these distributions are calibrated from the Dataset, and the fitted parametric Distribution that is optimal with respect to a goodness-of-fit criterion and a selection criterion is used to estimate the statistics associated with this variable. ParametricStatistics relies on the OpenTURNS library through the OTDistribution and OTDistributionFitter classes.
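The fit-then-select idea described above can be illustrated without the library. The following is a minimal sketch using only the Python standard library and assuming two candidate models (normal and exponential) compared with the BIC; the helper names are hypothetical and this is not how ParametricStatistics or OpenTURNS implement the fitting.

```python
import math
from statistics import fmean, pstdev


def normal_log_likelihood(sample):
    """Maximised log-likelihood of a normal model (MLE mean and std)."""
    n = len(sample)
    sigma = pstdev(sample)  # MLE of the standard deviation (divides by n)
    # At the MLE, the sum of squared residuals equals n * sigma**2.
    return -0.5 * n * (math.log(2.0 * math.pi * sigma**2) + 1.0)


def exponential_log_likelihood(sample):
    """Maximised log-likelihood of an exponential model (positive data only)."""
    n = len(sample)
    rate = 1.0 / fmean(sample)  # MLE of the rate parameter
    return n * math.log(rate) - rate * sum(sample)


def bic(log_likelihood, n_parameters, n_samples):
    """Bayesian information criterion: lower is better."""
    return n_parameters * math.log(n_samples) - 2.0 * log_likelihood


def select_by_bic(sample):
    """Return the candidate name with the smallest BIC and all BIC values."""
    n = len(sample)
    scores = {
        "Normal": bic(normal_log_likelihood(sample), 2, n),
        "Exponential": bic(exponential_log_likelihood(sample), 1, n),
    }
    return min(scores, key=scores.get), scores


sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
best, scores = select_by_bic(sample)
print(best, scores)
```

For this tightly clustered sample, the normal model receives the lower BIC and is therefore the one retained.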

Construction

A ParametricStatistics object is built from two mandatory arguments:

  • a dataset,

  • a list of distribution names,

and can consider optional arguments:

  • a subset of variable names (by default, statistics are computed for all variables),

  • a fitting criterion name (by default, BIC is used; see ParametricStatistics.get_available_criteria() and ParametricStatistics.get_significance_tests() for more information),

  • a level associated with the fitting criterion,

  • a selection criterion:

    • ‘best’: select the distribution minimizing (or maximizing, depending on the criterion) the criterion,

    • ‘first’: select the first distribution for which the criterion is greater (or lower, depending on the criterion) than the level,

  • a name for the ParametricStatistics object (by default, the name is the concatenation of ‘ParametricStatistics’ and the name of the Dataset).
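The difference between the two selection criteria can be sketched as follows. This is an illustrative stand-alone function, not the library's code: the name select_distribution, the maximize flag and the behaviour when no candidate passes the level (returning None) are assumptions made for the example.

```python
def select_distribution(criteria, selection_criterion="best", level=0.05,
                        maximize=True):
    """Pick a distribution name from {name: criterion value}.

    criteria is read in insertion order, i.e. the order of the candidates.
    maximize=True suits p-value-like criteria, maximize=False suits BIC-like ones.
    """
    if selection_criterion == "best":
        pick = max if maximize else min
        return pick(criteria, key=criteria.get)
    if selection_criterion == "first":
        for name, value in criteria.items():
            accepted = value > level if maximize else value < level
            if accepted:
                return name
        return None  # assumption: no candidate satisfies the level
    raise ValueError(f"Unknown selection criterion: {selection_criterion!r}")


# Hypothetical p-values of a goodness-of-fit test for three candidates.
p_values = {"Normal": 0.20, "Uniform": 0.01, "Exponential": 0.45}
print(select_distribution(p_values, "best"))   # highest p-value
print(select_distribution(p_values, "first"))  # first one above the level
```

With ‘best’, every candidate is compared; with ‘first’, the order of the distribution names matters, since the search stops at the first acceptable candidate.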

Capabilities

By inheritance, a ParametricStatistics object has the same capabilities as Statistics. Additional ones are:

  • get_fitting_matrix(): this method shows the values of the fitting criterion for the different variables and candidate probability distributions, as well as the selected probability distribution,

  • plot_criteria(): this method plots the criterion values for a given variable.

class gemseo.uncertainty.statistics.parametric.ParametricStatistics(dataset, distributions, variables_names=None, fitting_criterion='BIC', level=0.05, selection_criterion='best', name=None)[source]

Bases: gemseo.uncertainty.statistics.statistics.Statistics

Parametric estimation of statistics.

Constructor

Parameters
  • dataset (Dataset) – dataset

  • distributions (list(str)) – list of distribution names

  • variables_names (list(str)) – list of variable names. If None, the method considers all variables from the loaded dataset. Default: None.

  • fitting_criterion (str) – goodness-of-fit criterion. Default: ‘BIC’.

  • level (float) – risk of committing a Type I error, that is, an incorrect rejection of a true null hypothesis, for criteria based on hypothesis tests. Default: 0.05.

  • selection_criterion (str) – selection criterion, either ‘best’ or ‘first’. Default: ‘best’.

  • name (str) – name of the object. If None, use the concatenation of class and dataset names. Default: None.

build_distributions(distributions)[source]

Build distributions from a list of distribution names, a test level and the stored dataset.

Parameters

distributions (list(str)) – list of distribution names.

classmethod get_available_criteria()[source]

Get available goodness-of-fit criteria.

classmethod get_available_distributions()[source]

Get available distributions.

get_criteria(varname)[source]

Get criteria for a given variable name.

Parameters

varname (str) – variable name.

get_fitting_matrix()[source]

Get the fitting matrix. This matrix contains the goodness-of-fit measure for each (variable, distribution) pair.

classmethod get_significance_tests()[source]

Get significance tests.

maximum()[source]

Get the maximum.

Returns

maximum

mean()[source]

Get the mean.

Returns

mean

minimum()[source]

Get the minimum.

Returns

minimum

moment(order)[source]

Compute the moment for a given order, either centered or not.

Parameters

order (int) – order of the moment

Returns

moment

Return type

float or list(float)
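As a reminder of what a moment of a given order is, here is a minimal sample-based sketch of raw and centered moments. The function below and its centered flag are hypothetical and only illustrate the definitions; the method documented above works from the fitted distribution and takes only order as an argument.

```python
from statistics import fmean


def sample_moment(sample, order, centered=False):
    """Empirical moment of a given order, about 0 or about the mean."""
    center = fmean(sample) if centered else 0.0
    return fmean([(x - center) ** order for x in sample])


data = [1.0, 2.0, 3.0, 4.0]
print(sample_moment(data, 1))                 # raw moment of order 1: the mean
print(sample_moment(data, 2, centered=True))  # centered moment of order 2
```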

plot_criteria(varname, title=None, save=False, show=True, n_legend_cols=4, directory='.')[source]

Plot criteria for a given variable name.

Parameters
  • varname (str) – name of the variable

  • title (str) – title. Default: None.

  • save (bool) – save the plot into a file. Default: False.

  • show (bool) – show the plot. Default: True.

  • n_legend_cols (int) – number of text columns in the upper legend. Default: 4.

  • directory (str) – directory absolute or relative path. Default: ‘.’.

probability(thresh, greater=False)[source]

Compute a probability associated with a threshold.

Parameters
  • thresh (float) – threshold

  • greater (bool) – if True, compute the probability of exceeding the threshold; if False, compute the probability of not exceeding it. Default: False.

Returns

probability
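The role of the greater flag can be sketched with a fitted distribution from the Python standard library. This example assumes a normal distribution standing in for the selected one; the helper threshold_probability is hypothetical and does not call gemseo.

```python
from statistics import NormalDist


def threshold_probability(dist, thresh, greater=False):
    """P(X > thresh) if greater is True, otherwise P(X <= thresh)."""
    below = dist.cdf(thresh)
    return 1.0 - below if greater else below


fitted = NormalDist(mu=0.0, sigma=1.0)  # stands in for the selected distribution
print(threshold_probability(fitted, 1.96))                # about 0.975
print(threshold_probability(fitted, 1.96, greater=True))  # about 0.025
```

The two results always sum to one for a given threshold.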

quantile(prob)[source]

Get the quantile associated with a given probability.

Parameters

prob (float) – probability

Returns

quantile

Return type

float or list(float)
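A quantile is the inverse of the cumulative distribution function of the fitted distribution. The short sketch below shows this with a normal distribution from the Python standard library, assumed here in place of the selected distribution; it does not call gemseo.

```python
from statistics import NormalDist

fitted = NormalDist(mu=10.0, sigma=2.0)  # stands in for the selected distribution

median = fitted.inv_cdf(0.5)  # value below which 50% of the population lies
q95 = fitted.inv_cdf(0.95)    # value below which 95% of the population lies

print(median, q95)
print(fitted.cdf(q95))  # applying the CDF recovers the probability
```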

range()[source]

Get the range of variables.

Returns

range of variables

standard_deviation()[source]

Get the standard deviation.

Returns

standard deviation

Return type

float or list(float)

tolerance_interval(coverage, confidence=0.95, side='both')[source]

Compute the tolerance interval (TI) for a given minimum percentage of the population and a given confidence level.

Parameters
  • coverage (float) – minimum percentage of the population belonging to the TI.

  • confidence (float) – level of confidence in [0,1]. Default: 0.95.

  • side (str) – kind of interval: ‘lower’ for lower-sided TI, ‘upper’ for upper-sided TI and ‘both’ for both-sided TI. Default: ‘both’.

Returns

tolerance limits
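To make the roles of coverage and confidence concrete, here is a minimal sketch of an upper one-sided tolerance limit for normally distributed data, using the classical Natrella approximation of the tolerance factor. It is a textbook illustration under a normality assumption, with hypothetical helper names; the method documented above depends on the fitted distribution and may compute its limits differently.

```python
import math
from statistics import NormalDist, fmean, stdev


def one_sided_k_factor(n, coverage, confidence=0.95):
    """Approximate one-sided normal tolerance factor (Natrella approximation).

    Requires n large enough for the term a to stay positive.
    """
    z_p = NormalDist().inv_cdf(coverage)    # quantile for the covered proportion
    z_c = NormalDist().inv_cdf(confidence)  # quantile for the confidence level
    a = 1.0 - z_c**2 / (2.0 * (n - 1))
    b = z_p**2 - z_c**2 / n
    return (z_p + math.sqrt(z_p**2 - a * b)) / a


def upper_tolerance_limit(sample, coverage, confidence=0.95):
    """Limit exceeding at least `coverage` of the population with `confidence`."""
    k = one_sided_k_factor(len(sample), coverage, confidence)
    return fmean(sample) + k * stdev(sample)


sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.4, 9.6]
print(upper_tolerance_limit(sample, coverage=0.90, confidence=0.95))
```

Raising either coverage or confidence increases the factor and pushes the limit further from the sample mean.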

variance()[source]

Get the variance.

Returns

variance

Return type

float or list(float)