parametric module¶
Parametric estimation of statistics from a dataset¶
Overview¶
The ParametricStatistics class inherits from the abstract Statistics class and aims to estimate statistics from a Dataset, based on a collection of candidate parametric distributions calibrated from this Dataset.
For each variable, the parameters of these distributions are calibrated from the Dataset, and the fitted parametric Distribution that is optimal in the sense of a goodness-of-fit criterion and a selection criterion is selected to estimate the statistics associated with this variable.
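The calibrate-then-select procedure described above can be illustrated with a minimal sketch based only on the Python standard library. It does not use GEMSEO or OpenTURNS: the three fitters, the `bic` helper and `select_by_bic` are hypothetical names introduced for illustration, and calibrating the parameters by maximum likelihood is an assumption about the fitting method.

```python
import math
import random
from statistics import fmean, pstdev


def fit_normal(sample):
    """Maximum-likelihood normal fit; return (log-likelihood, number of parameters)."""
    mu = fmean(sample)
    sigma = pstdev(sample, mu)
    loglik = sum(
        -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
        for x in sample
    )
    return loglik, 2


def fit_exponential(sample):
    """Maximum-likelihood exponential fit; impossible if the sample has negative values."""
    if min(sample) < 0:
        return -math.inf, 1
    rate = 1.0 / fmean(sample)
    return sum(math.log(rate) - rate * x for x in sample), 1


def fit_uniform(sample):
    """Maximum-likelihood uniform fit on [min(sample), max(sample)]."""
    return -len(sample) * math.log(max(sample) - min(sample)), 2


def bic(loglik, n_params, n_samples):
    """Bayesian information criterion: the lower, the better."""
    return n_params * math.log(n_samples) - 2 * loglik


# Hypothetical registry of candidate distributions (illustration only).
FITTERS = {"Normal": fit_normal, "Exponential": fit_exponential, "Uniform": fit_uniform}


def select_by_bic(sample, names):
    """Fit each candidate to the sample and return (best name, all BIC values)."""
    n = len(sample)
    criteria = {name: bic(*FITTERS[name](sample), n) for name in names}
    return min(criteria, key=criteria.get), criteria


random.seed(0)
sample = [random.gauss(1.0, 2.0) for _ in range(500)]
best, criteria = select_by_bic(sample, ["Normal", "Exponential", "Uniform"])
print(best, criteria)
```

In this sketch, the statistics of the variable would then be computed from the selected fitted distribution rather than directly from the sample.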
The ParametricStatistics relies on the OpenTURNS library through the OTDistribution and OTDistributionFitter classes.
Construction¶
The ParametricStatistics is built from two mandatory arguments:
a dataset,
a list of distribution names,
and can consider optional arguments:
a subset of variable names (by default, statistics are computed for all variables),
a fitting criterion name (by default, BIC is used; see ParametricStatistics.get_available_criteria() and ParametricStatistics.get_significance_tests() for more information),
a level associated with the fitting criterion,
a selection criterion:
    ‘best’: select the distribution minimizing (or maximizing, depending on the criterion) the criterion,
    ‘first’: select the first distribution for which the criterion is greater (or lower, depending on the criterion) than the level,
a name for the ParametricStatistics object (by default, the name is the concatenation of ‘ParametricStatistics’ and the name of the Dataset).
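The difference between the ‘best’ and ‘first’ selection criteria can be sketched in plain Python. This is not the GEMSEO implementation: `select_distribution` and its `maximize` flag are hypothetical, the p-values are made up for illustration, and falling back to the best candidate when none passes the level is an assumption.

```python
def select_distribution(criteria, selection_criterion="best", level=0.05, maximize=False):
    """Pick a distribution name from a dict {name: criterion value}.

    The dict order is the candidate order. Set ``maximize`` to True for
    criteria where larger is better (e.g. p-values of a significance test).
    """
    chooser = max if maximize else min
    if selection_criterion == "best":
        return chooser(criteria, key=criteria.get)
    if selection_criterion == "first":
        for name, value in criteria.items():
            passed = value > level if maximize else value < level
            if passed:
                return name
        # Assumption: if no candidate passes the level, fall back to the best one.
        return chooser(criteria, key=criteria.get)
    raise ValueError(f"Unknown selection criterion: {selection_criterion!r}")


# Made-up p-values, listed in the order the candidates were given.
p_values = {"Normal": 0.02, "Uniform": 0.30, "Exponential": 0.75}

best = select_distribution(p_values, "best", maximize=True)
first = select_distribution(p_values, "first", level=0.05, maximize=True)
print(best, first)
```

With ‘first’, the order of the list of distribution names matters: the first acceptable candidate wins even if a later one fits better.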
Capabilities¶
By inheritance, a ParametricStatistics object has the same capabilities as Statistics. Additional ones are:
get_fitting_matrix(): this method shows the values of the fitting criterion for the different variables and candidate probability distributions, as well as the selected probability distribution,
plot_criteria(): this method plots the criterion values for a given variable.
class gemseo.uncertainty.statistics.parametric.ParametricStatistics(dataset, distributions, variables_names=None, fitting_criterion='BIC', level=0.05, selection_criterion='best', name=None)
Bases: gemseo.uncertainty.statistics.statistics.Statistics
Parametric estimation of statistics.
Constructor
- Parameters
dataset (Dataset) – dataset
distributions (list(str)) – list of distributions names
variables_names (list(str)) – list of variable names. If None, the method considers all variables from the loaded dataset. Default: None.
fitting_criterion (str) – goodness-of-fit criterion. Default: ‘BIC’.
level (float) – risk of committing a Type 1 error, that is, an incorrect rejection of a true null hypothesis, for criteria based on hypothesis tests. Default: 0.05.
selection_criterion (str) – selection criterion. Default: ‘best’
name (str) – name of the object. If None, use the concatenation of class and dataset names. Default: None.
build_distributions(distributions)
Build distributions from a list of distribution names, a test level and the stored dataset.
- Parameters
distributions (list(str)) – list of distributions names.
get_criteria(varname)
Get criteria for a given variable name.
- Parameters
varname (str) – variable name.
get_fitting_matrix()
Get the fitting matrix. This matrix contains goodness-of-fit measures for each (variable, distribution) pair.
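To make the layout of such a matrix concrete, here is a minimal standard-library sketch that computes a BIC value for each (variable, distribution) pair and records the selected distribution per variable. It is not the output format of get_fitting_matrix(): the helper names, the two candidate distributions and the "selection" key are assumptions made for illustration.

```python
import math
import random
from statistics import pstdev


def bic_normal(sample):
    """BIC of a maximum-likelihood normal fit (2 parameters)."""
    n = len(sample)
    sigma = pstdev(sample)
    loglik = -0.5 * n * (math.log(2 * math.pi * sigma**2) + 1)
    return 2 * math.log(n) - 2 * loglik


def bic_uniform(sample):
    """BIC of a maximum-likelihood uniform fit on [min, max] (2 parameters)."""
    n = len(sample)
    loglik = -n * math.log(max(sample) - min(sample))
    return 2 * math.log(n) - 2 * loglik


# Hypothetical candidate distributions (illustration only).
CANDIDATES = {"Normal": bic_normal, "Uniform": bic_uniform}


def fitting_matrix(data):
    """Return {variable: {distribution: BIC, ..., "selection": best name}}."""
    matrix = {}
    for variable, sample in data.items():
        row = {name: criterion(sample) for name, criterion in CANDIDATES.items()}
        row["selection"] = min(CANDIDATES, key=row.get)  # lowest BIC wins
        matrix[variable] = row
    return matrix


random.seed(1)
data = {
    "x": [random.gauss(0.0, 1.0) for _ in range(300)],
    "y": [random.uniform(0.0, 1.0) for _ in range(300)],
}
matrix = fitting_matrix(data)
for variable, row in matrix.items():
    print(variable, row)
```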
moment(order)
Compute the moment for a given order, either centered or not.
- Parameters
order (int) – moment index
- Returns
moment
- Return type
float or list(float)
plot_criteria(varname, title=None, save=False, show=True, n_legend_cols=4, directory='.')
Plot criteria for a given variable name.
- Parameters
varname (str) – name of the variable
title (str) – title. Default: None.
save (bool) – save the plot into a file. Default: False.
show (bool) – show the plot. Default: True.
n_legend_cols (int) – number of text columns in the upper legend. Default: 4.
directory (str) – directory absolute or relative path. Default: ‘.’.
probability(thresh, greater=False)
Compute a probability associated with a threshold.
- Parameters
thresh (float) – threshold
greater (bool) – if True, compute the probability of exceeding the threshold; if False, compute the complementary probability of not exceeding it. Default: False.
- Returns
probability
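As an illustration of the `greater` flag, the sketch below evaluates both probabilities on a fitted normal distribution using the standard library's `statistics.NormalDist`. It is not the GEMSEO code path: the free function `probability` and the distribution parameters are assumptions chosen for the example.

```python
from statistics import NormalDist


def probability(distribution, thresh, greater=False):
    """Return P(X > thresh) if ``greater`` is True, else P(X <= thresh)."""
    cdf_value = distribution.cdf(thresh)
    return 1.0 - cdf_value if greater else cdf_value


# Hypothetical fitted distribution for one variable.
fitted = NormalDist(mu=0.0, sigma=1.0)

p_below = probability(fitted, 1.0)
p_above = probability(fitted, 1.0, greater=True)
print(p_below, p_above)
```

The two values always sum to one for a continuous distribution, so either can be derived from the other.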
quantile(prob)
Get the quantile associated with a given probability.
- Parameters
prob (float) – probability
- Returns
quantile
- Return type
float or list(float)
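A quantile of a fitted distribution is the inverse of its cumulative distribution function. The following standard-library sketch shows this on a normal distribution; the free function `quantile` and the parameters of `fitted` are hypothetical and only stand in for a distribution selected by the fitting step.

```python
from statistics import NormalDist


def quantile(distribution, prob):
    """Return the value x such that P(X <= x) = prob, for 0 < prob < 1."""
    if not 0.0 < prob < 1.0:
        raise ValueError("prob must lie strictly between 0 and 1")
    return distribution.inv_cdf(prob)


# Hypothetical fitted distribution for one variable.
fitted = NormalDist(mu=10.0, sigma=2.0)

median = quantile(fitted, 0.5)
q95 = quantile(fitted, 0.95)
print(median, q95)
```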
standard_deviation()
Get the standard deviation.
- Returns
standard deviation
- Return type
float or list(float)
tolerance_interval(coverage, confidence=0.95, side='both')
Compute the tolerance interval (TI) for a given minimum percentage of the population and a given confidence level.
- Parameters
coverage (float) – minimum percentage of the population belonging to the TI.
confidence (float) – level of confidence in [0,1]. Default: 0.95.
side (str) – kind of interval: ‘lower’ for lower-sided TI, ‘upper’ for upper-sided TI and ‘both’ for both-sided TI. Default: ‘both’.
- Returns
tolerance limits