empirical module
Class for the empirical estimation of statistics from a dataset.
Overview
The EmpiricalStatistics class inherits from the abstract Statistics class and aims to estimate statistics from a Dataset, based on empirical estimators.
Construction
An EmpiricalStatistics is built from a Dataset and, optionally, variable names. In this case, statistics are only computed for these variables. Otherwise, statistics are computed for all the variables available in the dataset. Lastly, the user can give a name to their EmpiricalStatistics object. By default, this name is the concatenation of ‘EmpiricalStatistics’ and the name of the Dataset.
- class gemseo.uncertainty.statistics.empirical.EmpiricalStatistics(dataset, variables_names=None, name=None)
Bases:
Statistics
Empirical estimation of statistics.
Examples
>>> from gemseo.api import (
...     create_discipline,
...     create_parameter_space,
...     create_scenario)
>>> from gemseo.uncertainty.statistics.empirical import EmpiricalStatistics
>>>
>>> expressions = {"y1": "x1+2*x2", "y2": "x1-3*x2"}
>>> discipline = create_discipline(
...     "AnalyticDiscipline", expressions=expressions
... )
>>>
>>> parameter_space = create_parameter_space()
>>> parameter_space.add_random_variable(
...     "x1", "OTUniformDistribution", minimum=-1, maximum=1
... )
>>> parameter_space.add_random_variable(
...     "x2", "OTUniformDistribution", minimum=-1, maximum=1
... )
>>>
>>> scenario = create_scenario(
...     [discipline],
...     "DisciplinaryOpt",
...     "y1",
...     parameter_space,
...     scenario_type="DOE"
... )
>>> scenario.execute({'algo': 'OT_MONTE_CARLO', 'n_samples': 100})
>>>
>>> dataset = scenario.export_to_dataset(opt_naming=False)
>>>
>>> statistics = EmpiricalStatistics(dataset)
>>> mean = statistics.compute_mean()
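- Parameters:
dataset (Dataset) – The dataset from which to estimate the statistics.
variables_names – The names of the variables for which to compute statistics. If None, statistics are computed for all the variables available in the dataset.
name – The name of the object. If None, the concatenation of ‘EmpiricalStatistics’ and the name of the dataset.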
- compute_a_value()
Compute the A-value \(\text{Aval}[X]\).
- Returns:
The A-value of the different variables.
- Return type:
- compute_b_value()
Compute the B-value \(\text{Bval}[X]\).
- Returns:
The B-value of the different variables.
- Return type:
- classmethod compute_expression(variable_name, statistic_name, show_name=False, **options)
Return the expression of a statistical function applied to a variable.
E.g. “P[X >= 1.0]” for the probability that X exceeds 1.0.
- Parameters:
variable_name (str) – The name of the variable, e.g. "X".
statistic_name (str) – The name of the statistic, e.g. "probability".
show_name (bool) – If True, show option names. Otherwise, only show option values. By default it is set to False.
**options (bool | float | int) – The options passed to the statistical function, e.g. {"greater": True, "thresh": 1.0}.
- Returns:
The expression of the statistical function applied to the variable.
- Return type:
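To illustrate the idea of building an expression from a symbol table, here is a minimal sketch. It is not GEMSEO's implementation: the helper name `format_expression`, the `"; "` separator and the alphabetical ordering of options are assumptions, chosen to mimic notations such as M[X; n] used on this page. It does not reproduce special renderings such as “P[X >= 1.0]”.

```python
# Hypothetical helper, not part of GEMSEO: it only mimics the
# "symbol[variable; options]" notation used in this reference.
SYMBOLS = {"mean": "E", "moment": "M", "quantile": "Q"}  # subset of SYMBOLS


def format_expression(variable_name, statistic_name, show_name=False, **options):
    """Return a string such as "M[X; 2]" or "M[X; order=2]"."""
    symbol = SYMBOLS[statistic_name]
    items = sorted(options.items())  # assumed ordering: alphabetical by option name
    if show_name:
        parts = [f"{name}={value}" for name, value in items]
    else:
        parts = [str(value) for _, value in items]
    inner = "; ".join([variable_name] + parts)
    return f"{symbol}[{inner}]"


print(format_expression("X", "mean"))
print(format_expression("X", "moment", order=2))
print(format_expression("X", "quantile", show_name=True, prob=0.95))
```

With `show_name=True` the option names appear next to their values; otherwise only the values are shown, as described for the `show_name` parameter.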
- compute_margin(std_factor)
Compute a margin \(\text{Margin}[X]=\mathbb{E}[X]+\kappa\mathbb{S}[X]\).
- Parameters:
std_factor (float) – The weight \(\kappa\) of the standard deviation.
- Returns:
The margin for the different variables.
- Return type:
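The margin formula can be checked by hand on a small sample. The sketch below uses only the Python standard library and assumes the population standard deviation (division by the sample size); whether GEMSEO uses this or the unbiased estimator is not stated here. The name `empirical_margin` is hypothetical.

```python
from statistics import fmean, pstdev


def empirical_margin(samples, std_factor):
    """Return mean + std_factor * standard deviation of the samples.

    Assumption: the standard deviation is the population one (ddof = 0).
    """
    return fmean(samples) + std_factor * pstdev(samples)


# Sample with mean 5 and population standard deviation 2.
samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
margin = empirical_margin(samples, std_factor=3.0)
print(margin)
```

A negative `std_factor` gives a lower margin, i.e. the mean minus a multiple of the standard deviation.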
- compute_maximum()
Compute the maximum \(\text{Max}[X]\).
- Returns:
The maximum of the different variables.
- Return type:
- compute_mean()
Compute the mean \(\mathbb{E}[X]\).
- Returns:
The mean of the different variables.
- Return type:
- compute_mean_std(std_factor)
Compute a margin \(\text{Margin}[X]=\mathbb{E}[X]+\kappa\mathbb{S}[X]\).
- Parameters:
std_factor (float) – The weight \(\kappa\) of the standard deviation.
- Returns:
The margin for the different variables.
- Return type:
- compute_median()
Compute the median \(\text{Med}[X]\).
- Returns:
The median of the different variables.
- Return type:
- compute_minimum()
Compute the minimum \(\text{Min}[X]\).
- Returns:
The minimum of the different variables.
- Return type:
- compute_moment(order)
Compute the n-th moment \(M[X; n]\).
- Parameters:
order (int) – The order \(n\) of the moment.
- Returns:
The moment of the different variables.
- Return type:
- compute_percentile(order)
Compute the n-th percentile \(\text{p}[X; n]\).
- Parameters:
order (int) – The order \(n\) of the percentile. Either 0, 1, 2, … or 100.
- Returns:
The percentile of the different variables.
- Return type:
- compute_probability(thresh, greater=True)
Compute the probability related to a threshold.
Either \(\mathbb{P}[X \geq x]\) or \(\mathbb{P}[X \leq x]\).
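- Parameters:
thresh – The threshold \(x\).
greater – If True, compute \(\mathbb{P}[X \geq x]\). Otherwise, compute \(\mathbb{P}[X \leq x]\). By default it is set to True.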
- Returns:
The probability of the different variables.
- Return type:
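An empirical estimate of such a probability is the fraction of samples on the requested side of the threshold. The following is a minimal standard-library sketch of that idea for a single scalar variable; the function name `empirical_probability` is hypothetical and the code is not taken from GEMSEO.

```python
def empirical_probability(samples, thresh, greater=True):
    """Estimate P[X >= thresh] (greater=True) or P[X <= thresh] (greater=False)."""
    if greater:
        count = sum(1 for x in samples if x >= thresh)
    else:
        count = sum(1 for x in samples if x <= thresh)
    return count / len(samples)


samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
p_greater = empirical_probability(samples, 5.0)               # 4 of 8 samples
p_lower = empirical_probability(samples, 5.0, greater=False)  # 6 of 8 samples
print(p_greater, p_lower)
```

Because both inequalities are non-strict, samples equal to the threshold count on both sides, so the two estimates need not sum to one.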
- compute_quantile(prob)
Compute the quantile \(\mathbb{Q}[X; \alpha]\) related to a probability.
- Parameters:
prob (float) – A probability \(\alpha\) between 0 and 1.
- Returns:
The quantile of the different variables.
- Return type:
- compute_quartile(order)
Compute the n-th quartile \(q[X; n]\).
- Parameters:
order (int) – The order \(n\) of the quartile. Either 1, 2 or 3.
- Returns:
The quartile of the different variables.
- Return type:
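Quantiles, percentiles and quartiles are the same quantity with different parameterizations: the n-th percentile is the quantile of probability n/100 and the n-th quartile is the quantile of probability n/4. The sketch below illustrates this with a linearly interpolated empirical quantile. The interpolation rule is an assumption (several conventions exist and the one used by GEMSEO is not stated here), and all function names are hypothetical.

```python
import math


def empirical_quantile(samples, prob):
    """Return the quantile of probability prob, with linear interpolation.

    Assumption: the quantile sits at position prob * (n - 1) in the sorted sample.
    """
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must be between 0 and 1.")
    ordered = sorted(samples)
    position = prob * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight


def empirical_percentile(samples, order):
    """Return the percentile of order 0, 1, ..., 100."""
    return empirical_quantile(samples, order / 100)


def empirical_quartile(samples, order):
    """Return the quartile of order 1, 2 or 3."""
    return empirical_quantile(samples, order / 4)


samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
median = empirical_quantile(samples, 0.5)
print(median, empirical_percentile(samples, 50), empirical_quartile(samples, 2))
```

The three printed values coincide, since the median is the quantile of probability 0.5, the 50th percentile and the second quartile.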
- compute_range()
Compute the range \(R[X]\).
- Returns:
The range of the different variables.
- Return type:
- compute_standard_deviation()
Compute the standard deviation \(\mathbb{S}[X]\).
- Returns:
The standard deviation of the different variables.
- Return type:
- compute_tolerance_interval(coverage, confidence=0.95, side=ToleranceIntervalSide.BOTH)
Compute a tolerance interval \(\text{TI}[X]\).
The coverage level is the minimum percentage of the population belonging to the TI. The tolerance interval is computed with a confidence level and can be either lower-sided, upper-sided or two-sided.
- Parameters:
coverage (float) – The minimum percentage of the population belonging to the TI.
confidence (float) – A level of confidence in [0,1]. By default it is set to 0.95.
side (ToleranceIntervalSide) – The type of the tolerance interval characterized by its sides of interest, either a lower-sided tolerance interval \([a, +\infty[\), an upper-sided tolerance interval \(]-\infty, b]\), or a two-sided tolerance interval \([c, d]\). By default it is set to BOTH.
- Returns:
The tolerance limits of the different variables.
- Return type:
- compute_variance()
Compute the variance \(\mathbb{V}[X]\).
- Returns:
The variance of the different variables.
- Return type:
- compute_variation_coefficient()
Compute the coefficient of variation \(CoV[X]\).
This is the standard deviation normalized by the expectation: \(CoV[X]=\mathbb{S}[X]/\mathbb{E}[X]\).
- Returns:
The coefficient of variation of the different variables.
- Return type:
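As a quick numerical illustration, the coefficient of variation is the standard deviation divided by the mean. The sketch below uses the Python standard library and assumes the population standard deviation; the function name `variation_coefficient` is hypothetical and the code is not GEMSEO's implementation.

```python
from statistics import fmean, pstdev


def variation_coefficient(samples):
    """Return the standard deviation divided by the mean.

    Assumption: the standard deviation is the population one (ddof = 0).
    """
    mean = fmean(samples)
    if mean == 0.0:
        raise ZeroDivisionError("The coefficient of variation is undefined for a zero mean.")
    return pstdev(samples) / mean


# Sample with mean 5 and population standard deviation 2.
samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
cov = variation_coefficient(samples)
print(cov)
```

The ratio is dimensionless, which makes it convenient for comparing the dispersion of variables with different units, but it is undefined when the mean is zero.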
- SYMBOLS = {'a_value': 'Aval', 'b_value': 'Bval', 'margin': 'Margin', 'maximum': 'Max', 'mean': 'E', 'mean_std': 'E_StD', 'median': 'Med', 'minimum': 'Min', 'moment': 'M', 'percentile': 'p', 'probability': 'P', 'quantile': 'Q', 'quartile': 'q', 'range': 'R', 'standard_deviation': 'StD', 'tolerance_interval': 'TI', 'variance': 'V', 'variation_coefficient': 'CoV'}