gemseo.uncertainty.statistics.empirical_statistics module
Class for the empirical estimation of statistics from a dataset.
Overview
The EmpiricalStatistics class inherits
from the abstract BaseStatistics class
and aims to estimate statistics from a Dataset,
based on empirical estimators.
Construction
An EmpiricalStatistics object is built from a Dataset
and, optionally, variable names.
If variable names are given,
statistics are only computed for these variables.
Otherwise,
statistics are computed for all the variables available in the dataset.
Lastly,
the user can give a name to their EmpiricalStatistics object.
By default,
this name is the concatenation of 'EmpiricalStatistics'
and the name of the Dataset.
- class EmpiricalStatistics(dataset, variable_names=(), name='')
Bases: BaseStatistics
A toolbox to compute statistics empirically.
Examples
>>> from gemseo import (
...     create_discipline,
...     create_parameter_space,
...     create_scenario,
... )
>>> from gemseo.uncertainty.statistics.empirical_statistics import (
...     EmpiricalStatistics,
... )
>>>
>>> expressions = {"y1": "x1+2*x2", "y2": "x1-3*x2"}
>>> discipline = create_discipline("AnalyticDiscipline", expressions=expressions)
>>>
>>> parameter_space = create_parameter_space()
>>> parameter_space.add_random_variable(
...     "x1", "OTUniformDistribution", minimum=-1, maximum=1
... )
>>> parameter_space.add_random_variable(
...     "x2", "OTUniformDistribution", minimum=-1, maximum=1
... )
>>>
>>> scenario = create_scenario(
...     [discipline],
...     "y1",
...     parameter_space,
...     formulation_name="DisciplinaryOpt",
...     scenario_type="DOE",
... )
>>> scenario.execute(algo_name="OT_MONTE_CARLO", n_samples=100)
>>>
>>> dataset = scenario.to_dataset(opt_naming=False)
>>>
>>> statistics = EmpiricalStatistics(dataset)
>>> mean = statistics.compute_mean()
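Without a GEMSEO installation, the empirical mean returned by compute_mean() in this example can be mimicked with NumPy. This is a minimal sketch, not GEMSEO's implementation: it assumes plain Monte Carlo sampling of x1 and x2 on [-1, 1] with an arbitrary seed, and stores the samples in a hypothetical dictionary instead of a Dataset.

```python
import numpy as np

# Hypothetical stand-in for the DOE above: 100 Monte Carlo samples of x1, x2 ~ U(-1, 1).
rng = np.random.default_rng(seed=1)
x1 = rng.uniform(-1.0, 1.0, size=100)
x2 = rng.uniform(-1.0, 1.0, size=100)

# Evaluate the two analytic expressions of the discipline on every sample.
samples = {"y1": x1 + 2 * x2, "y2": x1 - 3 * x2}

# Empirical mean per variable: the arithmetic average of its samples.
mean = {name: values.mean() for name, values in samples.items()}
print(mean)
```

With uniform inputs centered on zero, both empirical means fluctuate around 0 and get closer to it as n_samples grows.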
- Parameters:
dataset (Dataset) -- A dataset.
variable_names (Iterable[str]) --
The names of the variables for which to compute statistics. If empty, consider all the variables of the dataset.
By default it is set to ().
name (str) --
A name for the toolbox computing statistics. If empty, concatenate the name of the dataset and the name of the class.
By default it is set to "".
- compute_joint_probability(thresh, greater=True)
Compute the joint probability related to a threshold.
Either \(\mathbb{P}[X \geq x]\) or \(\mathbb{P}[X \leq x]\), depending on greater.
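- Parameters:
thresh -- The threshold \(x\) for each variable.
greater -- Whether to compute \(\mathbb{P}[X \geq x]\); otherwise, compute \(\mathbb{P}[X \leq x]\).
By default it is set to True.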
- Returns:
The joint probability of the different variables (by definition of the joint probability, this statistic is not computed component-wise).
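Empirically, the joint probability is the fraction of samples for which the inequality holds for every component of every variable at once. The sketch below estimates it with NumPy; the helper name joint_probability and the layout (one array of shape (n_samples, dimension) per variable, thresholds broadcastable to (dimension,)) are assumptions for illustration, not the GEMSEO API.

```python
import numpy as np


def joint_probability(samples, thresh, greater=True):
    """Estimate P[X >= x] (or P[X <= x]) jointly over all variables and components.

    samples: hypothetical mapping name -> array of shape (n_samples, dimension).
    thresh: mapping name -> scalar or array of shape (dimension,).
    """
    n_samples = len(next(iter(samples.values())))
    # Start from "every sample satisfies the event", then intersect variable by variable.
    satisfied = np.ones(n_samples, dtype=bool)
    for name, values in samples.items():
        hits = values >= thresh[name] if greater else values <= thresh[name]
        # A sample counts only if all components of this variable satisfy the event.
        satisfied &= hits.all(axis=1)
    # The frequency of the joint event estimates its probability.
    return float(satisfied.mean())


samples = {
    "a": np.array([[0.0], [1.0], [2.0], [3.0]]),
    "b": np.array([[5.0, 1.0], [5.0, 1.0], [0.0, 1.0], [5.0, 0.0]]),
}
thresh = {"a": 1.0, "b": np.array([1.0, 1.0])}
print(joint_probability(samples, thresh))  # 0.25: only the second sample passes everywhere
```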
- compute_probability(thresh, greater=True)
Compute the probability related to a threshold.
Either \(\mathbb{P}[X \geq x]\) or \(\mathbb{P}[X \leq x]\), depending on greater.
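- Parameters:
thresh -- The threshold \(x\) for each variable.
greater -- Whether to compute \(\mathbb{P}[X \geq x]\); otherwise, compute \(\mathbb{P}[X \leq x]\).
By default it is set to True.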
- Returns:
The component-wise probability of the different variables.
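In contrast to the joint probability, the component-wise probability is one frequency per component of each variable. A minimal NumPy sketch, assuming the same hypothetical dict-of-arrays layout (n_samples rows, one column per component); the helper name probability is illustrative only.

```python
import numpy as np


def probability(samples, thresh, greater=True):
    """Estimate P[X_i >= x_i] (or P[X_i <= x_i]) separately for each component i."""
    result = {}
    for name, values in samples.items():
        hits = values >= thresh[name] if greater else values <= thresh[name]
        # Averaging booleans over the sample axis gives one frequency per component.
        result[name] = hits.mean(axis=0)
    return result


samples = {"b": np.array([[2.0, 0.0], [3.0, 0.0], [0.0, 4.0], [5.0, 0.0]])}
print(probability(samples, {"b": 1.0}))
```

Here the first component of b exceeds 1 in three samples out of four and the second in one out of four.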
- compute_quantile(prob)
Compute the quantile \(\mathbb{Q}[X; \alpha]\) related to a probability.
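Empirically, \(\mathbb{Q}[X; \alpha]\) is the \(\alpha\)-quantile of the samples, taken component by component. A minimal sketch assuming the same hypothetical dict-of-arrays layout; it relies on the default linear interpolation of numpy.quantile, and GEMSEO may use another quantile definition.

```python
import numpy as np


def quantile(samples, prob):
    """Component-wise empirical quantile of order prob for each variable."""
    # axis=0 reduces over the samples, keeping one value per component.
    return {name: np.quantile(values, prob, axis=0) for name, values in samples.items()}


samples = {"x": np.column_stack([np.arange(5.0), 10 * np.arange(5.0)])}
print(quantile(samples, 0.5))
```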
- compute_tolerance_interval(coverage, confidence=0.95, side=ToleranceIntervalSide.BOTH)
Compute a tolerance interval.
Given a confidence level \(1-\alpha\) and a coverage level \(\beta\), the number of samples \(n\) must verify the requirement:
\(1-(1-\beta)^n \geq 1-\alpha\) for a lower one-sided tolerance interval,
\(1-\beta^n \geq 1-\alpha\) for an upper one-sided tolerance interval,
\((n-1)\beta^n-n\beta^{n-1}+1 \geq 1-\alpha\) for a two-sided tolerance interval.
See [1] and [2] for more information about empirical tolerance intervals.
[1] … Statistics, John Wiley & Sons, 2009.
[2] Meeker W. Q., Hahn G. J., and Escobar L. A. Statistical intervals: a guide for practitioners and researchers, John Wiley & Sons, 2017.
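The upper one-sided and two-sided requirements stated for compute_tolerance_interval can be turned into a minimum sample size by increasing \(n\) until the inequality holds. This is a sketch with hypothetical helper names, not a GEMSEO function; confidence stands for \(1-\alpha\) and coverage for \(\beta\).

```python
def min_samples_upper(coverage, confidence):
    """Smallest n with 1 - coverage**n >= confidence (upper one-sided requirement)."""
    if not (0 < coverage < 1 and 0 < confidence < 1):
        raise ValueError("coverage and confidence must lie in (0, 1).")
    n = 1
    # The left-hand side increases with n, so the first n that passes is the minimum.
    while 1 - coverage**n < confidence:
        n += 1
    return n


def min_samples_two_sided(coverage, confidence):
    """Smallest n with (n-1)*coverage**n - n*coverage**(n-1) + 1 >= confidence."""
    if not (0 < coverage < 1 and 0 < confidence < 1):
        raise ValueError("coverage and confidence must lie in (0, 1).")
    n = 1
    while (n - 1) * coverage**n - n * coverage ** (n - 1) + 1 < confidence:
        n += 1
    return n


print(min_samples_upper(0.95, 0.95))  # 59
print(min_samples_two_sided(0.95, 0.95))  # 93
```

For \(\beta = 0.95\) and \(1-\alpha = 0.95\), this recovers the classical 59 and 93 samples.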
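- Parameters:
coverage -- The coverage level \(\beta\).
confidence -- The confidence level \(1-\alpha\).
By default it is set to 0.95.
side -- The side of the tolerance interval.
By default it is set to ToleranceIntervalSide.BOTH.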
- Returns:
The component-wise tolerance intervals of the different variables, expressed as
{variable_name: [(lower_bound, upper_bound), ...], ...}
where [(lower_bound, upper_bound), ...] are the lower and upper bounds of the tolerance interval of the different components of variable_name.
- Raises:
ValueError -- When there are not enough samples.
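The two-sided requirement \((n-1)\beta^n-n\beta^{n-1}+1 \geq 1-\alpha\) is the probability that the interval between the sample minimum and the sample maximum covers at least a fraction \(\beta\) of the distribution. The sketch below uses these two order statistics and returns the documented {variable_name: [(lower_bound, upper_bound), ...]} layout, raising ValueError when there are not enough samples. The helper name and the dict-of-arrays input are assumptions; GEMSEO may select other order statistics when more samples are available.

```python
import numpy as np


def two_sided_interval(samples, coverage, confidence=0.95):
    """Bounds [min, max] per component, after checking the two-sided requirement."""
    result = {}
    for name, values in samples.items():
        n = len(values)
        # Probability that [min, max] covers at least `coverage` of the distribution.
        achieved = (n - 1) * coverage**n - n * coverage ** (n - 1) + 1
        if achieved < confidence:
            raise ValueError(f"Not enough samples for {name!r}: {n}.")
        lower, upper = values.min(axis=0), values.max(axis=0)
        # One (lower_bound, upper_bound) pair per component.
        result[name] = list(zip(lower.tolist(), upper.tolist()))
    return result


samples = {"y": np.column_stack([np.linspace(0.0, 1.0, 93), np.linspace(-2.0, 2.0, 93)])}
print(two_sided_interval(samples, 0.95))
```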
- plot_boxplot(save=False, show=True, directory_path='', file_format='png', **options)
Visualize the data with a boxplot.
- Parameters:
save (bool) --
Whether to save the figures.
By default it is set to False.
show (bool) --
Whether to show the figures.
By default it is set to True.
directory_path (str | Path) --
The path to save the figures.
By default it is set to "".
file_format (str) --
The file extension.
By default it is set to "png".
**options (Any) -- The options of the Boxplot graphs.
- Returns:
The boxplot of each variable.
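A boxplot summarizes a sample by its quartiles, its whiskers and its outliers. The sketch below computes these quantities with NumPy following Tukey's convention (whiskers at 1.5 times the interquartile range), which is the default of matplotlib's boxplot; whether GEMSEO's Boxplot graph uses the same whisker rule is an assumption, and box_statistics is a hypothetical helper.

```python
import numpy as np


def box_statistics(values, whisker=1.5):
    """Quartiles, whisker ends and outliers of a 1-D sample (Tukey's convention)."""
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    # Whiskers reach the most extreme samples still within whisker * IQR of the box.
    low = values[values >= q1 - whisker * iqr].min()
    high = values[values <= q3 + whisker * iqr].max()
    # Samples beyond the whiskers are drawn individually as outliers.
    outliers = values[(values < low) | (values > high)]
    return {"q1": q1, "median": median, "q3": q3, "whiskers": (low, high), "outliers": outliers}


values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0])
print(box_statistics(values))
```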
- plot_cdf(save=False, show=True, directory_path='', file_format='png', **options)
Visualize the empirical cumulative distribution function.
- Parameters:
save (bool) --
Whether to save the figures.
By default it is set to False.
show (bool) --
Whether to show the figures.
By default it is set to True.
directory_path (str | Path) --
The path to save the figures.
By default it is set to "".
file_format (str) --
The file extension.
By default it is set to "png".
**options (Any) -- The options of the Lines graphs.
- Returns:
The graph of the cumulative distribution function for each variable.
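The empirical cumulative distribution function is \(F_n(x) = \frac{1}{n}\#\{i : x_i \leq x\}\), a step function jumping at each sample. A minimal NumPy sketch of the values such a graph displays; the helper name is hypothetical and this is not GEMSEO's plotting code.

```python
import numpy as np


def empirical_cdf(values):
    """Return the sorted samples and the empirical CDF evaluated at each of them."""
    x = np.sort(values)
    # Count samples <= x_i; side="right" makes tied samples share the same, highest count.
    f = np.searchsorted(x, x, side="right") / len(x)
    return x, f


x, f = empirical_cdf(np.array([3.0, 1.0, 2.0, 2.0]))
print(x, f)
```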
- plot_pdf(save=False, show=True, directory_path='', file_format='png', **options)
Visualize the empirical probability density function.
- Parameters:
save (bool) --
Whether to save the figures.
By default it is set to False.
show (bool) --
Whether to show the figures.
By default it is set to True.
directory_path (str | Path) --
The path to save the figures.
By default it is set to "".
file_format (str) --
The file extension.
By default it is set to "png".
**options (Any) -- The options of the Lines graphs.
- Returns:
The graph of the probability density function for each variable.
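One simple empirical estimator of a probability density function is a normalized histogram, whose bar areas sum to one. The sketch below uses numpy.histogram with density=True; it is a swapped-in illustration, and GEMSEO may rely on a different estimator such as a kernel density estimate. The helper name histogram_density is hypothetical.

```python
import numpy as np


def histogram_density(values, bins=10):
    """Piecewise-constant density estimate whose bar areas sum to one."""
    heights, edges = np.histogram(values, bins=bins, density=True)
    # Bin centers are convenient x-coordinates for drawing the estimate as a line.
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, heights, edges


rng = np.random.default_rng(seed=0)
centers, heights, edges = histogram_density(rng.normal(size=1000))
print(np.sum(heights * np.diff(edges)))
```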