parametric module
Class for the parametric estimation of statistics from a dataset.
Overview
The ParametricStatistics class inherits from the abstract Statistics class and aims to estimate statistics from a Dataset, based on candidate parametric distributions calibrated from this Dataset.
For each variable, the parameters of these distributions are calibrated from the Dataset, and the fitted parametric Distribution that is optimal with respect to a goodness-of-fit criterion and a selection criterion is selected to estimate the statistics related to this variable.
The ParametricStatistics class relies on the OpenTURNS library through the OTDistribution and OTDistributionFitter classes.
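The approach can be illustrated without OpenTURNS. The sketch below is a stdlib-only illustration under simplifying assumptions, not the implementation of ParametricStatistics. It fits two candidates (Normal and Uniform) by maximum likelihood, scores them with the textbook BIC, -2 log L + k log n (OpenTURNS may scale it differently), keeps the smallest value, and reads the mean off the fitted distribution. The helpers fit_normal, fit_uniform, bic and select_by_bic are hypothetical names.

```python
import math
from statistics import NormalDist, fmean

# Hypothetical helpers: "fit candidate distributions, score them with BIC,
# keep the best, then compute statistics from the fitted distribution".
# This is NOT the OpenTURNS-based code used by ParametricStatistics.


def fit_normal(samples):
    """Maximum-likelihood normal fit; returns (log-likelihood, n_params, summary)."""
    n = len(samples)
    mu = fmean(samples)
    var = sum((x - mu) ** 2 for x in samples) / n  # MLE (biased) variance
    log_likelihood = -0.5 * n * math.log(2.0 * math.pi * var) - 0.5 * n
    return log_likelihood, 2, {"name": "Normal", "mean": mu, "std": math.sqrt(var)}


def fit_uniform(samples):
    """Maximum-likelihood uniform fit on [min, max]."""
    a, b = min(samples), max(samples)
    log_likelihood = -len(samples) * math.log(b - a)
    summary = {"name": "Uniform", "mean": 0.5 * (a + b), "std": (b - a) / math.sqrt(12.0)}
    return log_likelihood, 2, summary


def bic(log_likelihood, n_parameters, n_samples):
    """Textbook Bayesian information criterion: lower is better."""
    return -2.0 * log_likelihood + n_parameters * math.log(n_samples)


def select_by_bic(samples, fitters=(fit_normal, fit_uniform)):
    """Return (criterion value, summary) of the candidate with the smallest BIC."""
    n = len(samples)
    scored = []
    for fitter in fitters:
        log_likelihood, n_parameters, summary = fitter(samples)
        scored.append((bic(log_likelihood, n_parameters, n), summary))
    return min(scored, key=lambda item: item[0])


samples = NormalDist(mu=0.5, sigma=2.0).samples(500, seed=1)
criterion, selected = select_by_bic(samples)
print(selected["name"], selected["mean"], criterion)
```

Once a candidate is selected, every statistic (mean, quantiles, probabilities) is computed from the fitted distribution rather than from the raw samples.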
Construction
A ParametricStatistics object is built from two mandatory arguments:
- a dataset,
- a list of distribution names,
and can consider optional arguments:
- a subset of variable names (by default, statistics are computed for all variables),
- a fitting criterion name (by default, BIC is used; see AVAILABLE_CRITERIA and AVAILABLE_SIGNIFICANCE_TESTS for more information),
- a level associated with the fitting criterion,
- a selection criterion:
  - ‘best’: select the distribution minimizing (or maximizing, depending on the criterion) the criterion,
  - ‘first’: select the first distribution for which the criterion is greater (or lower, depending on the criterion) than the level,
- a name for the ParametricStatistics object (by default, the name is the concatenation of ‘ParametricStatistics’ and the name of the Dataset).
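The two selection criteria described above can be sketched as follows. The helper select_distribution and its arguments are hypothetical; it only mirrors the rules stated in this section (a p-value must exceed the level, other criteria must fall below it). Falling back to ‘best’ when no candidate satisfies ‘first’ is an assumption of this sketch, and the criterion values are illustrative.

```python
def select_distribution(criteria, is_p_value, selection_criterion="best", level=0.05):
    """Return the name of the selected distribution (hypothetical helper).

    criteria: mapping from distribution name to criterion value, in the order
        in which the candidates were given.
    is_p_value: True for significance tests (a higher p-value means a better
        fit), False for criteria such as BIC (a lower value is better).
    """
    if selection_criterion not in ("best", "first"):
        raise ValueError(f"Unknown selection criterion: {selection_criterion!r}")

    if selection_criterion == "first":
        for name, value in criteria.items():
            # A p-value must exceed the level; other criteria must fall below it.
            if (value > level) if is_p_value else (value < level):
                return name
        # Assumption of this sketch: fall back to 'best' if no candidate qualifies.

    # 'best': maximize a p-value, minimize any other criterion.
    chooser = max if is_p_value else min
    return chooser(criteria, key=criteria.get)


p_values = {"Normal": 0.03, "Uniform": 0.40, "Triangular": 0.85}  # illustrative
print(select_distribution(p_values, is_p_value=True, selection_criterion="first"))  # Uniform
print(select_distribution(p_values, is_p_value=True))  # Triangular
```

‘first’ depends on the order of the candidate list, whereas ‘best’ does not; this is why the order of the distribution names passed to the constructor can matter.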
Capabilities
By inheritance, a ParametricStatistics object has the same capabilities as Statistics. Additional ones are:
- get_fitting_matrix(): this method returns a printable matrix of the fitting criterion values for the different variables and candidate probability distributions, as well as the selected probability distribution,
- plot_criteria(): this method plots the criterion values for a given variable.
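A fitting matrix is essentially a table with one row per variable, one column per candidate distribution, and a column for the selected distribution. A hypothetical formatter could look like the sketch below; format_fitting_matrix is not part of GEMSEO, the layout actually returned by get_fitting_matrix() may differ, and the criterion values are illustrative.

```python
def format_fitting_matrix(criteria, selected):
    """Return a printable table of criterion values (hypothetical layout).

    criteria: {variable: {distribution: criterion value}}
    selected: {variable: name of the selected distribution}
    """
    # Column order follows the candidates of the first variable.
    distributions = list(next(iter(criteria.values())))
    header = ["Variable", *distributions, "Selection"]
    rows = [header]
    for variable, values in criteria.items():
        cells = [f"{values[name]:.4g}" for name in distributions]
        rows.append([variable, *cells, selected[variable]])
    # Pad every column to its widest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    lines = [" | ".join(cell.ljust(w) for cell, w in zip(row, widths)) for row in rows]
    lines.insert(1, "-+-".join("-" * w for w in widths))
    return "\n".join(lines)


criteria = {  # illustrative p-values
    "y1": {"Normal": 0.62, "Uniform": 0.004, "Triangular": 0.31},
    "y2": {"Normal": 0.48, "Uniform": 0.012, "Triangular": 0.55},
}
print(format_fitting_matrix(criteria, {"y1": "Normal", "y2": "Triangular"}))
```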
Classes:
- ParametricStatistics(dataset, distributions[, ...]): Parametric estimation of statistics.
- class gemseo.uncertainty.statistics.parametric.ParametricStatistics(dataset, distributions, variables_names=None, fitting_criterion='BIC', level=0.05, selection_criterion='best', name=None)
Bases:
gemseo.uncertainty.statistics.statistics.Statistics
Parametric estimation of statistics.
- n_samples
The number of samples.
- Type
int
- n_variables
The number of variables.
- Type
int
- name
The name of the object.
- Type
str
- fitting_criterion
The name of the goodness-of-fit criterion, measuring how the distribution fits the data.
- Type
str
- level
The test level, i.e. the risk of committing a Type I error, that is, an incorrect rejection of a true null hypothesis, for criteria based on hypothesis tests.
- Type
float
- selection_criterion
The name of the selection criterion to select a distribution from a list of candidates.
- Type
str
- distributions
The probability distributions of the random variables.
- Type
dict(str, dict(str, OTDistribution))
Examples
>>> from gemseo.api import (
...     create_discipline,
...     create_parameter_space,
...     create_scenario
... )
>>> from gemseo.uncertainty.statistics.parametric import ParametricStatistics
>>>
>>> expressions = {"y1": "x1+2*x2", "y2": "x1-3*x2"}
>>> discipline = create_discipline(
...     "AnalyticDiscipline", expressions_dict=expressions
... )
>>> discipline.set_cache_policy(discipline.MEMORY_FULL_CACHE)
>>>
>>> parameter_space = create_parameter_space()
>>> parameter_space.add_random_variable(
...     "x1", "OTUniformDistribution", minimum=-1, maximum=1
... )
>>> parameter_space.add_random_variable(
...     "x2", "OTNormalDistribution", mu=0.5, sigma=2
... )
>>>
>>> scenario = create_scenario(
...     [discipline],
...     "DisciplinaryOpt",
...     "y1", parameter_space, scenario_type="DOE"
... )
>>> scenario.execute({'algo': 'OT_MONTE_CARLO', 'n_samples': 100})
>>>
>>> dataset = discipline.cache.export_to_dataset()
>>>
>>> statistics = ParametricStatistics(
...     dataset, ['Normal', 'Uniform', 'Triangular']
... )
>>> fitting_matrix = statistics.get_fitting_matrix()
>>> mean = statistics.compute_mean()
- Parameters
dataset (Dataset) – A dataset.
distributions (Sequence[str]) – The names of the distributions.
variables_names (Optional[Iterable[str]]) –
The variables of interest. Default: consider all the variables available in the dataset.
By default it is set to None.
fitting_criterion (str) –
The name of the goodness-of-fit criterion, measuring how the distribution fits the data. The available criteria are listed in ParametricStatistics.AVAILABLE_CRITERIA.
By default it is set to BIC.
level (float) –
A test level, i.e. the risk of committing a Type I error, that is, an incorrect rejection of a true null hypothesis, for criteria based on hypothesis tests.
By default it is set to 0.05.
selection_criterion (str) –
The name of the selection criterion to select a distribution from a list of candidates. Either ‘first’ or ‘best’.
By default it is set to best.
name (Optional[str]) –
A name for the object. Default: use the concatenation of the class and dataset names.
By default it is set to None.
- Return type
None
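For criteria based on hypothesis tests, the level is compared to a p-value: a candidate is rejected when its p-value falls below the level. As a hedged illustration, the sketch below computes a Kolmogorov-Smirnov statistic against a fitted normal distribution and an asymptotic p-value from the Kolmogorov limiting distribution with Stephens' small-sample correction. All names are hypothetical, and this is not how OpenTURNS computes the test. When the parameters are estimated from the same data, as here, these p-values are conservative (the Lilliefors issue).

```python
import math
from statistics import NormalDist, fmean, stdev


def ks_statistic(samples, cdf):
    """Two-sided Kolmogorov-Smirnov statistic D = sup_x |F_n(x) - F(x)|."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def kolmogorov_p_value(statistic, n_samples, n_terms=100):
    """Asymptotic p-value with Stephens' correction of the argument."""
    root_n = math.sqrt(n_samples)
    lam = (root_n + 0.12 + 0.11 / root_n) * statistic
    if lam < 0.2:
        # For small arguments the p-value is 1 to about 12 digits,
        # and the alternating series converges slowly.
        return 1.0
    total = sum(
        2.0 * (-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
        for k in range(1, n_terms + 1)
    )
    return min(max(total, 0.0), 1.0)


def ks_accepts(samples, distribution, level=0.05):
    """Return (accepted, p_value): the fit is kept if the p-value exceeds the level."""
    p_value = kolmogorov_p_value(ks_statistic(samples, distribution.cdf), len(samples))
    return p_value > level, p_value


samples = NormalDist(mu=0.5, sigma=2.0).samples(500, seed=2)
fitted = NormalDist(mu=fmean(samples), sigma=stdev(samples))
accepted, p_value = ks_accepts(samples, fitted, level=0.05)
```

Lowering the level makes rejection harder, so more candidates pass the test.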
Methods:
- compute_a_value(): Compute the A-value.
- compute_b_value(): Compute the B-value.
- compute_expression(variable, function[, ...]): Return the expression of a statistical function applied to a variable.
- compute_maximum(): Compute the maximum.
- compute_mean(): Compute the mean.
- compute_mean_std(std_factor): Compute mean + std_factor * std.
- compute_median(): Compute the median.
- compute_minimum(): Compute the minimum.
- compute_moment(order): Compute the n-th moment.
- compute_percentile(order): Compute the n-th percentile.
- compute_probability(thresh[, greater]): Compute the probability related to a threshold.
- compute_quantile(prob): Compute the quantile related to a probability.
- compute_quartile(order): Compute the n-th quartile.
- compute_range(): Compute the range.
- compute_standard_deviation(): Compute the standard deviation.
- compute_tolerance_interval(coverage[, ...]): Compute a tolerance interval (TI) for a given coverage level.
- compute_variance(): Compute the variance.
- get_criteria(variable): Get criteria for a given variable name and the different distributions.
- get_fitting_matrix(): Get the fitting matrix.
- plot_criteria(variable[, title, save, show, ...]): Plot criteria for a given variable name.
- AVAILABLE_CRITERIA = ['BIC', 'ChiSquared', 'Kolmogorov']
- AVAILABLE_DISTRIBUTIONS = ['Arcsine', 'Beta', 'Burr', 'Chi', 'ChiSquare', 'Dirichlet', 'Exponential', 'FisherSnedecor', 'Frechet', 'Gamma', 'GeneralizedPareto', 'Gumbel', 'Histogram', 'InverseNormal', 'Laplace', 'LogNormal', 'LogUniform', 'Logistic', 'MeixnerDistribution', 'Normal', 'Pareto', 'Rayleigh', 'Rice', 'Student', 'Trapezoidal', 'Triangular', 'TruncatedNormal', 'Uniform', 'VonMises', 'WeibullMax', 'WeibullMin']
- AVAILABLE_SIGNIFICANCE_TESTS = ['ChiSquared', 'Kolmogorov']
- SYMBOLS = {'a_value': 'Aval', 'b_value': 'Bval', 'maximum': 'Max', 'mean': 'E', 'mean_std': 'E_StD', 'median': 'Med', 'minimum': 'Min', 'moment': 'M', 'percentile': 'p', 'probability': 'P', 'quantile': 'Q', 'quartile': 'q', 'range': 'R', 'standard_deviation': 'StD', 'tolerance_interval': 'TI', 'variance': 'V'}
- compute_a_value()
Compute the A-value.
- Returns
The A-value of the different variables.
- Return type
Dict[str, numpy.ndarray]
- compute_b_value()
Compute the B-value.
- Returns
The B-value of the different variables.
- Return type
Dict[str, numpy.ndarray]
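Assuming the A-value and B-value follow their usual definitions in material-allowables practice (lower one-sided tolerance bounds covering 99 % and 90 % of the population respectively, both at 95 % confidence), they can be sketched for a normal sample as below. The helper names are hypothetical, and the k-factor uses Natrella's closed-form approximation rather than whatever method this class actually applies to each fitted distribution.

```python
import math
from statistics import NormalDist, fmean, stdev

_STD_NORMAL = NormalDist()


def lower_tolerance_bound(samples, coverage, confidence=0.95):
    """One-sided lower tolerance bound for a normal sample (Natrella's approximation).

    With the given confidence, at least `coverage` of the population lies
    above the returned value.
    """
    n = len(samples)
    z_p = _STD_NORMAL.inv_cdf(coverage)
    z_g = _STD_NORMAL.inv_cdf(confidence)
    a = 1.0 - z_g ** 2 / (2.0 * (n - 1))
    b = z_p ** 2 - z_g ** 2 / n
    k = (z_p + math.sqrt(z_p ** 2 - a * b)) / a
    return fmean(samples) - k * stdev(samples)


def a_value(samples):
    # A-basis: 99 % coverage, 95 % confidence.
    return lower_tolerance_bound(samples, 0.99)


def b_value(samples):
    # B-basis: 90 % coverage, 95 % confidence.
    return lower_tolerance_bound(samples, 0.90)


samples = NormalDist(mu=10.0, sigma=1.0).samples(100, seed=3)
print(a_value(samples), b_value(samples))
```

The A-value is always below the B-value, since covering 99 % of the population requires a larger k-factor than covering 90 %.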
- classmethod compute_expression(variable, function, show_name=False, **options)
Return the expression of a statistical function applied to a variable.
- Parameters
variable (str) – The name of the variable.
function (str) – The name of the function.
show_name (bool) –
If True, show name. Otherwise, only show value.
By default it is set to False.
**options – The options passed to the statistical function.
- Returns
The expression of the statistical function applied to the variable.
- Return type
str
- compute_maximum()
Compute the maximum.
- Returns
The maximum of the different variables.
- Return type
Dict[str, numpy.ndarray]
- compute_mean()
Compute the mean.
- Returns
The mean of the different variables.
- Return type
Dict[str, numpy.ndarray]
- compute_mean_std(std_factor)
Compute mean + std_factor * std.
- Returns
mean + std_factor * std for the different variables.
- Parameters
std_factor (float) – The factor multiplying the standard deviation.
- Return type
Dict[str, numpy.ndarray]
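For a fitted distribution, this quantity is plain arithmetic on its mean and standard deviation. The sketch below uses Python's statistics.NormalDist as a stand-in for a fitted distribution (mean_std is a hypothetical helper). It also shows why the factor should be interpreted with care: the exceedance probability of mean + 3 * std is about 0.135 % only for a normal distribution.

```python
from statistics import NormalDist


def mean_std(distribution, std_factor):
    """mean + std_factor * std for a fitted distribution (hypothetical helper)."""
    return distribution.mean + std_factor * distribution.stdev


fitted = NormalDist(mu=0.5, sigma=2.0)
margin = mean_std(fitted, 3.0)  # 0.5 + 3 * 2.0 = 6.5
exceedance = 1.0 - fitted.cdf(margin)  # about 0.00135 for a normal distribution
```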
- compute_median()
Compute the median.
- Returns
The median of the different variables.
- Return type
Dict[str, numpy.ndarray]
- compute_minimum()
Compute the minimum.
- Returns
The minimum of the different variables.
- Return type
Dict[str, numpy.ndarray]
- compute_moment(order)
Compute the n-th moment.
- Parameters
order (int) – The order of a moment.
- Returns
The moment of the different variables.
- Return type
Dict[str, numpy.ndarray]
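This page does not state whether the moments are raw, E[X^n], or central, E[(X - E[X])^n]. The hypothetical helpers below compute both in closed form for a uniform distribution on [a, b], which is enough to see that the two conventions differ as soon as the mean is non-zero.

```python
def uniform_raw_moment(a, b, order):
    """E[X**order] for X ~ Uniform(a, b)."""
    n = order
    return (b ** (n + 1) - a ** (n + 1)) / ((n + 1) * (b - a))


def uniform_central_moment(a, b, order):
    """E[(X - E[X])**order] for X ~ Uniform(a, b).

    X - E[X] is uniform on [-h, h] with h = (b - a) / 2, so odd orders vanish
    and even orders equal h**order / (order + 1).
    """
    if order % 2:
        return 0.0
    half_width = (b - a) / 2
    return half_width ** order / (order + 1)


# On [0, 2] the mean is 1, so the second raw moment (4/3) differs from the
# second central moment, i.e. the variance (1/3).
raw_2 = uniform_raw_moment(0.0, 2.0, 2)
central_2 = uniform_central_moment(0.0, 2.0, 2)
```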
- compute_percentile(order)
Compute the n-th percentile.
- Parameters
order (int) – The order of the percentile. Either 0, 1, 2, … or 100.
- Returns
The percentile of the different variables.
- Return type
Dict[str, numpy.ndarray]
- compute_probability(thresh, greater=True)
Compute the probability related to a threshold.
- Parameters
thresh (float) – A threshold.
greater (bool) –
The type of probability. If True, compute the probability of exceeding the threshold. Otherwise, compute the opposite.
By default it is set to True.
- Returns
The probability of the different variables.
- Return type
Dict[str, numpy.ndarray]
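For a continuous fitted distribution, the probability related to a threshold comes straight from its cumulative distribution function F: P[X > t] = 1 - F(t) when greater is True, and F(t) otherwise. A minimal sketch, with statistics.NormalDist standing in for the fitted distribution and probability as a hypothetical helper:

```python
from statistics import NormalDist


def probability(distribution, thresh, greater=True):
    """P[X > thresh] if greater, else P[X <= thresh], from the fitted CDF."""
    below = distribution.cdf(thresh)
    return 1.0 - below if greater else below


fitted = NormalDist(mu=0.5, sigma=2.0)
p_exceed = probability(fitted, 4.5)  # threshold two standard deviations above the mean
p_below = probability(fitted, 4.5, greater=False)
```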
- compute_quantile(prob)
Compute the quantile related to a probability.
- Parameters
prob (float) – A probability between 0 and 1.
- Returns
The quantile of the different variables.
- Return type
Dict[str, numpy.ndarray]
- compute_quartile(order)
Compute the n-th quartile.
- Parameters
order (int) – The order of the quartile. Either 1, 2 or 3.
- Returns
The quartile of the different variables.
- Return type
Dict[str, numpy.ndarray]
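Quantiles, percentiles, quartiles and the median are all values of the inverse CDF of the fitted distribution: the n-th percentile is the quantile at n/100, the n-th quartile is the quantile at n/4, and the median is the second quartile. A hedged sketch with statistics.NormalDist as the fitted distribution (the helpers are hypothetical, not the methods of this class):

```python
from statistics import NormalDist


def quantile(distribution, prob):
    """Quantile of a fitted distribution, i.e. its inverse CDF at prob."""
    return distribution.inv_cdf(prob)


def percentile(distribution, order):
    """n-th percentile = quantile at order / 100, with order in 0..100."""
    if order not in range(101):
        raise ValueError("order must be an integer between 0 and 100")
    return quantile(distribution, order / 100)


def quartile(distribution, order):
    """n-th quartile = quantile at order / 4, with order in 1, 2 or 3."""
    if order not in (1, 2, 3):
        raise ValueError("order must be 1, 2 or 3")
    return quantile(distribution, order / 4)


fitted = NormalDist(mu=0.5, sigma=2.0)
median = quartile(fitted, 2)
q90 = quantile(fitted, 0.9)
```

For an unbounded distribution such as the normal, the 0th and 100th percentiles are infinite, and NormalDist.inv_cdf raises StatisticsError for those orders.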
- compute_range()
Compute the range.
- Returns
The range of the different variables.
- Return type
Dict[str, numpy.ndarray]
- compute_standard_deviation()
Compute the standard deviation.
- Returns
The standard deviation of the different variables.
- Return type
Dict[str, numpy.ndarray]
- compute_tolerance_interval(coverage, confidence=0.95, side=ToleranceIntervalSide.BOTH)
Compute a tolerance interval (TI) for a given coverage level.
The coverage level is the minimum proportion of the population that the TI must contain. The TI is computed at a given confidence level and can be lower-sided, upper-sided or two-sided.
- Parameters
coverage (float) – The coverage level, i.e. the minimum proportion of the population contained in the TI.
confidence (float) –
A level of confidence in [0,1].
By default it is set to 0.95.
side (gemseo.uncertainty.statistics.tolerance_interval.distribution.ToleranceIntervalSide) –
The type of the tolerance interval characterized by its sides of interest, either a lower-sided tolerance interval \([a, +\infty[\), an upper-sided tolerance interval \(]-\infty, b]\), or a two-sided tolerance interval \([c, d]\).
By default it is set to BOTH.
- Returns
The tolerance limits of the different variables.
- Return type
Dict[str, Tuple[numpy.ndarray, numpy.ndarray]]
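For a normal sample, a two-sided tolerance interval has the form mean ± k * std. The sketch below computes k with Howe's approximation and the chi-square quantile with the Wilson-Hilferty approximation, so it needs only the standard library. It is a hypothetical illustration of the BOTH side only; the method used by ParametricStatistics, which depends on the fitted distribution, may differ.

```python
import math
from statistics import NormalDist, fmean, stdev

_STD_NORMAL = NormalDist()


def chi2_quantile(prob, dof):
    """Wilson-Hilferty approximation of the chi-square quantile of order prob."""
    z = _STD_NORMAL.inv_cdf(prob)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * math.sqrt(c)) ** 3


def howe_k_factor(n_samples, coverage, confidence=0.95):
    """Two-sided normal tolerance factor from Howe's approximation."""
    dof = n_samples - 1
    z = _STD_NORMAL.inv_cdf((1.0 + coverage) / 2.0)
    chi2 = chi2_quantile(1.0 - confidence, dof)
    return z * math.sqrt(dof * (1.0 + 1.0 / n_samples) / chi2)


def two_sided_tolerance_interval(samples, coverage, confidence=0.95):
    """Return (lower, upper) containing at least `coverage` of the population
    with the given confidence, assuming the samples are normal."""
    k = howe_k_factor(len(samples), coverage, confidence)
    mean, std = fmean(samples), stdev(samples)
    return mean - k * std, mean + k * std


samples = NormalDist(mu=0.5, sigma=2.0).samples(30, seed=4)
lower, upper = two_sided_tolerance_interval(samples, coverage=0.90)
```

With only 30 samples, k is about 2.14 instead of the 1.645 that a known normal distribution would need for 90 % coverage; the difference pays for the uncertainty on the estimated mean and standard deviation.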
- compute_variance()
Compute the variance.
- Returns
The variance of the different variables.
- Return type
Dict[str, numpy.ndarray]
- get_criteria(variable)
Get criteria for a given variable name and the different distributions.
- Parameters
variable (str) – The name of the variable.
- Returns
The criterion values for the different distributions, and an indicator equal to True if the criterion is a p-value.
- Return type
Tuple[Dict[str, float], bool]
- get_fitting_matrix()
Get the fitting matrix.
This matrix contains goodness-of-fit measures for each pair < variable, distribution >.
- Returns
The printable fitting matrix.
- Return type
str
- plot_criteria(variable, title=None, save=False, show=True, n_legend_cols=4, directory='.')
Plot criteria for a given variable name.
- Parameters
variable (str) – The name of the variable.
title (Optional[str]) –
A plot title.
By default it is set to None.
save (bool) –
If True, save the plot on the disk.
By default it is set to False.
show (bool) –
If True, show the plot.
By default it is set to True.
n_legend_cols (int) –
The number of text columns in the upper legend.
By default it is set to 4.
directory (str) –
The directory path, either absolute or relative.
By default it is set to ".", i.e. the current directory.
- Raises
ValueError – If the variable is missing from the dataset.
- Return type
None