empirical module¶

Class for the empirical estimation of statistics from a dataset.

Overview¶

The EmpiricalStatistics class inherits from the abstract Statistics class and estimates statistics from a Dataset using empirical estimators.

Construction¶

An EmpiricalStatistics is built from a Dataset and, optionally, variable names. When variable names are given, statistics are computed only for these variables. Otherwise, statistics are computed for all the variables available in the dataset. Lastly, the user can give a name to the EmpiricalStatistics object. By default, this name is the concatenation of ‘EmpiricalStatistics’ and the name of the Dataset.

Classes:

EmpiricalStatistics(dataset[, ...])

Empirical estimation of statistics.

class gemseo.uncertainty.statistics.empirical.EmpiricalStatistics(dataset, variables_names=None, name=None)[source]¶

Bases: gemseo.uncertainty.statistics.statistics.Statistics

Empirical estimation of statistics.

dataset¶

The dataset.

Type: Dataset

n_samples¶

The number of samples.

Type: int

n_variables¶

The number of variables.

Type: int

name¶

The name of the object.

Type: str

Examples

>>> from gemseo.api import (
...     create_discipline,
...     create_parameter_space,
...     create_scenario)
>>> from gemseo.uncertainty.statistics.empirical import EmpiricalStatistics
>>>
>>> expressions = {"y1": "x1+2*x2", "y2": "x1-3*x2"}
>>> discipline = create_discipline(
...     "AnalyticDiscipline", expressions_dict=expressions
... )
>>> discipline.set_cache_policy(discipline.MEMORY_FULL_CACHE)
>>>
>>> parameter_space = create_parameter_space()
>>> parameter_space.add_random_variable(
...     "x1", "OTUniformDistribution", minimum=-1, maximum=1
... )
>>> parameter_space.add_random_variable(
...     "x2", "OTUniformDistribution", minimum=-1, maximum=1
... )
>>>
>>> scenario = create_scenario(
...     [discipline],
...     "DisciplinaryOpt",
...     "y1",
...     parameter_space,
...     scenario_type="DOE"
... )
>>> scenario.execute({'algo': 'OT_MONTE_CARLO', 'n_samples': 100})
>>>
>>> dataset = discipline.cache.export_to_dataset()
>>>
>>> statistics = EmpiricalStatistics(dataset)
>>> mean = statistics.compute_mean()

Parameters

dataset (Dataset) – A dataset.
variables_names (Optional[Iterable[str]]) –
The variables of interest. Default: consider all the variables available in the dataset.

By default it is set to None.
name (Optional[str]) –
A name for the object. Default: use the concatenation of the class and dataset names.

By default it is set to None.

Return type

None

Attributes:

SYMBOLS

Methods:

`compute_a_value`()	Compute the A-value.
`compute_b_value`()	Compute the B-value.
`compute_expression`(variable, function[, ...])	Return the expression of a statistical function applied to a variable.
`compute_maximum`()	Compute the maximum.
`compute_mean`()	Compute the mean.
`compute_mean_std`(std_factor)	Compute mean + std_factor * std.
`compute_median`()	Compute the median.
`compute_minimum`()	Compute the minimum.
`compute_moment`(order)	Compute the n-th moment.
`compute_percentile`(order)	Compute the n-th percentile.
`compute_probability`(thresh[, greater])	Compute the probability related to a threshold.
`compute_quantile`(prob)	Compute the quantile related to a probability.
`compute_quartile`(order)	Compute the n-th quartile.
`compute_range`()	Compute the range.
`compute_standard_deviation`()	Compute the standard deviation.
`compute_tolerance_interval`(coverage[, ...])	Compute a tolerance interval (TI) for a given coverage level.
`compute_variance`()	Compute the variance.

SYMBOLS = {'a_value': 'Aval', 'b_value': 'Bval', 'maximum': 'Max', 'mean': 'E', 'mean_std': 'E_StD', 'median': 'Med', 'minimum': 'Min', 'moment': 'M', 'percentile': 'p', 'probability': 'P', 'quantile': 'Q', 'quartile': 'q', 'range': 'R', 'standard_deviation': 'StD', 'tolerance_interval': 'TI', 'variance': 'V'}¶

compute_a_value()¶

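Compute the A-value, i.e. the lower bound of the lower-sided tolerance interval with a coverage level of 99% and a confidence level of 95%.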

Returns: The A-value of the different variables.
Return type: Dict[str, numpy.ndarray]

compute_b_value()¶

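Compute the B-value, i.e. the lower bound of the lower-sided tolerance interval with a coverage level of 90% and a confidence level of 95%.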

Returns: The B-value of the different variables.
Return type: Dict[str, numpy.ndarray]

classmethod compute_expression(variable, function, show_name=False, **options)¶

Return the expression of a statistical function applied to a variable.

Parameters

variable (str) – The name of the variable.
function (str) – The name of the function.
show_name (bool) –
If True, show name. Otherwise, only show value.

By default it is set to False.
**options – The options passed to the statistical function.

Returns

The expression of the statistical function applied to the variable.

Return type

str
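As a rough illustration of what such an expression might look like, the sketch below combines a subset of the SYMBOLS mapping with a variable name. The build_expression helper and the bracketed output format are assumptions made for this example; they are not taken from the library's implementation.

```python
# Subset of the SYMBOLS mapping documented for EmpiricalStatistics.
SYMBOLS = {"mean": "E", "variance": "V", "probability": "P", "quantile": "Q"}


def build_expression(variable, function, show_name=False, **options):
    """Hypothetical helper: format 'symbol[variable; options]'."""
    if show_name:
        # Show both the option names and their values.
        args = [f"{key}={value}" for key, value in sorted(options.items())]
    else:
        # Show the option values only.
        args = [str(value) for _, value in sorted(options.items())]
    suffix = f"; {', '.join(args)}" if args else ""
    return f"{SYMBOLS[function]}[{variable}{suffix}]"


mean_expr = build_expression("y1", "mean")
prob_expr = build_expression("y1", "probability", show_name=True, thresh=0.5)
```

Under these assumptions, show_name only changes how the options are rendered; the symbol and the variable name are always present.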

compute_maximum()[source]¶

Compute the maximum.

Returns: The maximum of the different variables.
Return type: Dict[str, numpy.ndarray]

compute_mean()[source]¶

Compute the mean.

Returns: The mean of the different variables.
Return type: Dict[str, numpy.ndarray]
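The returned dictionary is indexed by variable name. A minimal standard-library sketch of that layout is given below; the samples mapping and the empirical_mean helper are hypothetical, and plain floats stand in for the NumPy arrays of the documented return type.

```python
from statistics import fmean

# Hypothetical samples: one list of observations per variable.
samples = {"x1": [0.0, 0.5, 1.0], "y1": [1.0, 2.0, 6.0]}


def empirical_mean(samples):
    """Return the sample mean of each variable, keyed by variable name."""
    return {name: fmean(values) for name, values in samples.items()}


means = empirical_mean(samples)
```

The other compute_* methods documented here follow the same per-variable layout according to their return types.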

compute_mean_std(std_factor)¶

Compute mean + std_factor * std.

Parameters: std_factor (float) – The factor multiplying the standard deviation.
Returns: mean + std_factor * std for the different variables.
Return type: Dict[str, numpy.ndarray]
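The quantity mean + std_factor * std can be sketched for a single variable with the standard library. The mean_plus_std helper is hypothetical, and the use of the population standard deviation (divisor n) is an assumption; the documentation does not state which normalization is used.

```python
from statistics import fmean, pstdev


def mean_plus_std(values, std_factor):
    """Return mean + std_factor * std for one variable.

    Assumption: the population standard deviation (divisor n) is used.
    """
    return fmean(values) + std_factor * pstdev(values)


values = [1.0, 2.0, 3.0, 4.0, 5.0]
upper = mean_plus_std(values, 2.0)   # margin above the mean
lower = mean_plus_std(values, -2.0)  # margin below the mean
```

A negative std_factor gives a margin below the mean, so the two calls above bracket the mean symmetrically.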

compute_median()¶

Compute the median.

Returns: The median of the different variables.
Return type: Dict[str, numpy.ndarray]

compute_minimum()[source]¶

Compute the minimum.

Returns: The minimum of the different variables.
Return type: Dict[str, numpy.ndarray]

compute_moment(order)[source]¶

Compute the n-th moment.

Parameters: order (int) – The order of a moment.
Returns: The moment of the different variables.
Return type: Dict[str, numpy.ndarray]
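The documentation does not say whether the moment is raw or central. The sketch below assumes a central moment, i.e. the average of (x - mean)**order over the samples; central_moment is a hypothetical helper for one variable.

```python
from statistics import fmean


def central_moment(values, order):
    """Return the empirical central moment of the given order (assumed convention)."""
    mean = fmean(values)
    return fmean([(value - mean) ** order for value in values])


values = [1.0, 2.0, 3.0, 4.0, 5.0]
second = central_moment(values, 2)  # population variance under this convention
third = central_moment(values, 3)   # zero for a symmetric sample
```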

compute_percentile(order)¶

Compute the n-th percentile.

Parameters: order (int) – The order of the percentile. Either 0, 1, 2, … or 100.
Returns: The percentile of the different variables.
Return type: Dict[str, numpy.ndarray]

compute_probability(thresh, greater=True)[source]¶

Compute the probability related to a threshold.

Parameters

thresh (float) – A threshold.
greater (bool) –
The type of probability. If True, compute the probability of exceeding the threshold. Otherwise, compute the probability of not exceeding it.

By default it is set to True.

Returns

The probability of the different variables.

Return type

Dict[str, numpy.ndarray]
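An empirical probability is typically the fraction of samples that satisfy the condition. The sketch below shows this for one variable; empirical_probability is a hypothetical helper, and treating samples equal to the threshold as satisfying either condition is an assumption.

```python
def empirical_probability(values, thresh, greater=True):
    """Fraction of samples at or above (greater) or at or below the threshold."""
    if greater:
        hits = sum(1 for value in values if value >= thresh)
    else:
        hits = sum(1 for value in values if value <= thresh)
    return hits / len(values)


values = [0.1, 0.4, 0.6, 0.9, 1.2]
p_above = empirical_probability(values, 0.5)
p_below = empirical_probability(values, 0.5, greater=False)
```

When no sample equals the threshold, the two probabilities sum to one.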

compute_quantile(prob)[source]¶

Compute the quantile related to a probability.

Parameters: prob (float) – A probability between 0 and 1.
Returns: The quantile of the different variables.
Return type: Dict[str, numpy.ndarray]

compute_quartile(order)¶

Compute the n-th quartile.

Parameters: order (int) – The order of the quartile. Either 1, 2 or 3.
Returns: The quartile of the different variables.
Return type: Dict[str, numpy.ndarray]
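Quantiles, percentiles and quartiles describe the same quantity on different scales: the percentile of order k corresponds to the probability k/100 and the quartile of order k to the probability k/4. The sketch below makes this explicit for one variable. The helper names are hypothetical, and the linear interpolation between order statistics is an assumption; other interpolation rules give slightly different values.

```python
def quantile(values, prob):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = prob * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def percentile(values, order):
    """Percentile of order 0, 1, ..., 100, as a quantile of probability order/100."""
    return quantile(values, order / 100)


def quartile(values, order):
    """Quartile of order 1, 2 or 3, as a quantile of probability order/4."""
    return quantile(values, order * 0.25)


values = [4.0, 1.0, 3.0, 2.0, 5.0]
median = quartile(values, 2)
```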

compute_range()[source]¶

Compute the range, i.e. the difference between the maximum and the minimum.

Returns: The range of the different variables.
Return type: Dict[str, numpy.ndarray]

compute_standard_deviation()[source]¶

Compute the standard deviation.

Returns: The standard deviation of the different variables.
Return type: Dict[str, numpy.ndarray]

compute_tolerance_interval(coverage, confidence=0.95, side=ToleranceIntervalSide.BOTH)¶

Compute a tolerance interval (TI) for a given coverage level.

The coverage level is the minimum percentage of the population that belongs to the TI. The tolerance interval is computed with a given confidence level and can be lower-sided, upper-sided or two-sided.

Parameters

coverage (float) – The minimum percentage of the population that belongs to the TI.
confidence (float) –
A level of confidence in [0,1].

By default it is set to 0.95.
side (gemseo.uncertainty.statistics.tolerance_interval.distribution.ToleranceIntervalSide) –
The type of the tolerance interval characterized by its sides of interest, either a lower-sided tolerance interval \([a, +\infty[\), an upper-sided tolerance interval \(]-\infty, b]\), or a two-sided tolerance interval \([c, d]\).

By default it is set to BOTH.

Returns

The tolerance limits of the different variables.

Return type

Dict[str, Tuple[numpy.ndarray, numpy.ndarray]]
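To make the roles of coverage and confidence concrete, here is a distribution-free sketch based on a classical order-statistics result: for n independent samples of a continuous variable, the sample minimum is a lower tolerance limit with coverage p at confidence 1 - p**n. This is not necessarily the estimator used by EmpiricalStatistics; the helper names are hypothetical, and coverage is expressed as a fraction in [0, 1] for this example.

```python
import math


def confidence_of_minimum(n_samples, coverage):
    """Confidence that at least `coverage` of the population exceeds the sample minimum."""
    return 1.0 - coverage ** n_samples


def min_samples_for_lower_limit(coverage, confidence=0.95):
    """Smallest n for which the sample minimum reaches the requested confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(coverage))


n_90 = min_samples_for_lower_limit(0.90)  # coverage 90%, confidence 95%
n_99 = min_samples_for_lower_limit(0.99)  # coverage 99%, confidence 95%
```

Raising the coverage from 90% to 99% at the same confidence increases the required sample size roughly tenfold, which shows why high-coverage tolerance limits are demanding to estimate from data alone.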

compute_variance()[source]¶

Compute the variance.

Returns: The variance of the different variables.
Return type: Dict[str, numpy.ndarray]