Probability distribution

The package distributions

Capabilities to create and manipulate probability distributions.

This package contains:

an abstract class BaseDistribution to define the concept of probability distribution,
an abstract class BaseJointDistribution to define the concept of joint probability distribution by composing several instances of BaseDistribution,
a factory DistributionFactory to create instances of BaseDistribution,
concrete classes implementing these abstract concepts by interfacing:
- the OpenTURNS library: OTDistribution and OTJointDistribution,
- the SciPy library: SPDistribution and SPJointDistribution.

Lastly, the class OTDistributionFitter fits an OTDistribution to data using OpenTURNS.

The base class BaseDistribution

Abstract class defining the concept of probability distribution.

Overview

The abstract BaseDistribution class implements the concept of probability distribution, which is a mathematical function giving the probabilities of occurrence of different possible outcomes of a random variable for an experiment. The normal distribution with its famous bell curve is a well-known example of probability distribution.

Construction

The BaseDistribution of a given uncertain variable is built from a recognized distribution name (e.g. ‘Normal’ for OpenTURNS or ‘norm’ for SciPy), a variable dimension, a set of parameters and optionally a standard representation of these parameters.

Capabilities

From a BaseDistribution, we can easily get statistics, such as BaseDistribution.mean and BaseDistribution.standard_deviation. We can also get the numerical BaseDistribution.range and the mathematical BaseDistribution.support.

Note

We call mathematical support the set of values that the random variable can take in theory, e.g. \(]-\infty,+\infty[\) for a Gaussian variable, and numerical range the set of values that it can take in practice, taking into account the values that are rounded to zero in double precision. Both support and range are described in terms of lower and upper bounds.
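The difference between support and range can be illustrated with the standard library alone. The sketch below uses statistics.NormalDist rather than a BaseDistribution, and the thresholds -40, 40 and -5 are arbitrary values chosen for illustration; it does not reproduce how the interfaced libraries compute the numerical range.

```python
from statistics import NormalDist

# Standard Gaussian variable: its mathematical support is ]-inf, +inf[.
gaussian = NormalDist(mu=0.0, sigma=1.0)
math_support = (float("-inf"), float("inf"))

# Far in the tails, the CDF is rounded to exactly 0.0 or 1.0 in double
# precision, so these values are never reached in practice.
cdf_far_left = gaussian.cdf(-40.0)
cdf_far_right = gaussian.cdf(40.0)

# Closer to the mean, the CDF is tiny but still representable.
cdf_moderate = gaussian.cdf(-5.0)

print(math_support)
print(cdf_far_left, cdf_far_right, cdf_moderate)
```

A numerical range is therefore a finite interval even when the mathematical support is unbounded.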

We can also evaluate the cumulative distribution function (BaseDistribution.compute_cdf()) for the different marginals of the random variable, as well as the inverse cumulative distribution function (BaseDistribution.compute_inverse_cdf()). We can plot them, either for a given marginal (BaseDistribution.plot()) or for all marginals (BaseDistribution.plot_all()).
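As a reminder of how the CDF and its inverse relate, here is a minimal sketch for a single Gaussian marginal. It relies on statistics.NormalDist from the standard library, not on BaseDistribution.compute_cdf() or BaseDistribution.compute_inverse_cdf(), and the parameter values are arbitrary.

```python
from statistics import NormalDist

# Gaussian marginal with mean 1 and standard deviation 2 (arbitrary values).
gaussian = NormalDist(mu=1.0, sigma=2.0)

# CDF: probability that the variable is lower than or equal to a value.
# For a Gaussian variable, the mean is also the median.
probability = gaussian.cdf(1.0)

# Inverse CDF (quantile function): value whose CDF equals a probability.
quantile = gaussian.inv_cdf(0.975)

# Applying the CDF to a quantile gives back the probability.
round_trip = gaussian.cdf(quantile)

print(probability, quantile, round_trip)
```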

Lastly, we can compute realizations of the random variable by means of the BaseDistribution.compute_samples() method.

class gemseo.uncertainty.distributions.base_distribution.BaseDistribution(variable, interfaced_distribution, parameters, dimension=1, standard_parameters=None)

Probability distribution related to a random variable.

Parameters:

variable (str) – The name of the random variable.
interfaced_distribution (str) – The name of the probability distribution, typically the name of a class wrapped from an external library, such as "Normal" for OpenTURNS or "norm" for SciPy.
parameters (ParametersType) – The parameters of the probability distribution.
dimension (int) –
The dimension of the random variable. If greater than 1, the probability distribution is applied to all components of the random variable under the hypothesis that these components are stochastically independent. To be removed in a future version; use a BaseJointDistribution instead.

By default it is set to 1.
standard_parameters (StandardParametersType | None) – The parameters of the probability distribution used for string representation only (use parameters for computation). If None, use parameters instead. For instance, let us consider an interfaced distribution named "Dirac" with positional parameters (this is the case of OTDistribution). Then, the string representation of BaseDistribution("x", "Dirac", (1,), 1, {"loc": 1}) is "Dirac(loc=1)" while the string representation of BaseDistribution("x", "Dirac", (1,)) is "Dirac(1)". The same mechanism works for keyword parameters (this is the case of SPDistribution).
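The string representation rule described for standard_parameters can be sketched as follows. The function format_distribution is a hypothetical helper written for this illustration; it is not part of the package and may differ from the actual implementation.

```python
from __future__ import annotations


def format_distribution(
    name: str,
    parameters: tuple | dict,
    standard_parameters: dict | None = None,
) -> str:
    """Hypothetical helper mimicking the documented string representation."""
    # Use the standard parameters when given, the raw parameters otherwise.
    shown = parameters if standard_parameters is None else standard_parameters
    if isinstance(shown, dict):
        # Keyword parameters are displayed as name=value pairs.
        arguments = ", ".join(f"{key}={value}" for key, value in shown.items())
    else:
        # Positional parameters are displayed as values only.
        arguments = ", ".join(str(value) for value in shown)
    return f"{name}({arguments})"


with_standard = format_distribution("Dirac", (1,), {"loc": 1})
without_standard = format_distribution("Dirac", (1,))
print(with_standard)     # Dirac(loc=1)
print(without_standard)  # Dirac(1)
```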

abstract compute_cdf(vector)

Evaluate the cumulative distribution function (CDF).

Evaluate the CDF of the components of the random variable for a given realization of this random variable.

Parameters:

vector (Iterable[float]) – A realization of the random variable.

Returns:

The CDF values of the components of the random variable.

Return type:

ndarray

abstract compute_inverse_cdf(vector)

Evaluate the inverse of the cumulative distribution function (ICDF).

Parameters:

vector (Iterable[float]) – A vector of values between 0 and 1 whose length is equal to the dimension of the random variable.

Returns:

The ICDF values of the components of the random variable.

Return type:

ndarray

abstract compute_samples(n_samples=1)

Sample the random variable.

Parameters:

n_samples (int) –

The number of samples.

By default it is set to 1.

Returns:

The samples of the random variable.

The number of columns is equal to the dimension of the variable and the number of rows is equal to the number of samples.

Return type:

ndarray
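The layout of the samples, one row per sample and one column per component, can be sketched with independent Gaussian components. The function sample_independent_components is a hypothetical helper for this illustration; it returns nested lists instead of an ndarray so that it only needs the standard library.

```python
from statistics import NormalDist


def sample_independent_components(marginals, n_samples=1, seed=0):
    """Hypothetical helper: sample independent components row by row."""
    # One column of samples per component, each drawn with a different seed.
    columns = [
        marginal.samples(n_samples, seed=seed + index)
        for index, marginal in enumerate(marginals)
    ]
    # Transpose the columns to get one row per sample.
    return [list(row) for row in zip(*columns)]


# Random variable of dimension 2 with the same distribution for each component.
marginals = [NormalDist(0.0, 1.0), NormalDist(0.0, 1.0)]
samples = sample_independent_components(marginals, n_samples=5)
print(len(samples), len(samples[0]))  # number of rows, number of columns
```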

plot(index=0, show=True, save=False, file_path='', directory_path='', file_name='', file_extension='')

Plot both the probability density function and the cumulative distribution function for a given component.

Parameters:

index (int) –
The index of a component of the random variable.

By default it is set to 0.
save (bool) –
If True, save the figure.

By default it is set to False.
show (bool) –
If True, display the figure.

By default it is set to True.
file_path (str | Path) –
The path of the file to save the figures. If the extension is missing, use file_extension. If empty, create a file path from directory_path, file_name and file_extension.

By default it is set to “”.
directory_path (str | Path) –
The path of the directory to save the figures. If empty, use the current working directory.

By default it is set to “”.
file_name (str) –
The name of the file to save the figures. If empty, use a default one generated by the post-processing.

By default it is set to “”.
file_extension (str) –
A file extension, e.g. 'png', 'pdf', 'svg', … If empty, use a default file extension.

By default it is set to “”.

Returns:

The figure.

Return type:

Figure
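The rules given above for file_path, directory_path, file_name and file_extension can be read as the following sketch. The function resolve_file_path and the two default values are assumptions made for this illustration; the actual defaults are generated by the post-processing and may differ.

```python
from pathlib import Path

# Hypothetical defaults; the real ones come from the post-processing.
DEFAULT_FILE_NAME = "distribution"
DEFAULT_FILE_EXTENSION = "png"


def resolve_file_path(
    file_path="", directory_path="", file_name="", file_extension=""
):
    """Hypothetical helper following the documented path rules."""
    extension = (file_extension or DEFAULT_FILE_EXTENSION).lstrip(".")
    if file_path != "":
        path = Path(file_path)
        # Add the extension only when the given path has none.
        return path if path.suffix else path.with_suffix(f".{extension}")
    # No file path: build one from the directory, the name and the extension.
    directory = Path.cwd() if directory_path == "" else Path(directory_path)
    return directory / f"{file_name or DEFAULT_FILE_NAME}.{extension}"


print(resolve_file_path(file_path="out/fig", file_extension="pdf"))
print(resolve_file_path(directory_path="out", file_name="cdf"))
```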

plot_all(show=True, save=False, file_path='', directory_path='', file_name='', file_extension='')

Plot both the probability density function and the cumulative distribution function for all components.

Parameters:

save (bool) –
If True, save the figure.

By default it is set to False.
show (bool) –
If True, display the figure.

By default it is set to True.
file_path (str | Path) –
The path of the file to save the figures. If the extension is missing, use file_extension. If empty, create a file path from directory_path, file_name and file_extension.

By default it is set to “”.
directory_path (str | Path) –
The path of the directory to save the figures. If empty, use the current working directory.

By default it is set to “”.
file_name (str) –
The name of the file to save the figures. If empty, use a default one generated by the post-processing.

By default it is set to “”.
file_extension (str) –
A file extension, e.g. 'png', 'pdf', 'svg', … If empty, use a default file extension.

By default it is set to “”.

Returns:

The figures.

Return type:

list[Figure]

DEFAULT_VARIABLE_NAME: Final[str] = 'x': The default name of the variable.

JOINT_DISTRIBUTION_CLASS: ClassVar[type[BaseJointDistribution] | None] = None: The class of the joint distribution associated with this distribution, if any.

dimension: int: The number of dimensions of the random variable.

distribution: type: The probability distribution of the random variable.

distribution_name: str: The name of the probability distribution.

marginals: list[type]: The marginal distributions of the components of the random variable.

math_lower_bound: ndarray: The mathematical lower bound of the random variable.

math_upper_bound: ndarray: The mathematical upper bound of the random variable.

abstract property mean: ndarray: The analytical mean of the random variable.

num_lower_bound: ndarray: The numerical lower bound of the random variable.

num_upper_bound: ndarray: The numerical upper bound of the random variable.

parameters: tuple[Any] | dict[str, Any]: The parameters of the probability distribution.

property range: list[ndarray]

The numerical range.

The numerical range is the interval defined by the lower and upper bounds numerically reachable by the random variable.

Here, the numerical range of the random variable is defined by one array for each component of the random variable, whose first element is the lower bound of this component while the second one is its upper bound.

abstract property standard_deviation: ndarray: The analytical standard deviation of the random variable.

standard_parameters: dict[str, str] | None: The standard representation of the parameters of the distribution, used for its string representation.

property support: list[ndarray]

The mathematical support.

The mathematical support is the interval defined by the theoretical lower and upper bounds of the random variable.

Here, the mathematical support of the random variable is defined by one array for each component of the random variable, whose first element is the lower bound of this component while the second one is its upper bound.
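The per-component layout shared by the range and the support can be sketched as follows. The function pair_bounds and the bound values are assumptions for this illustration, and plain lists stand in for the ndarray objects used by the class.

```python
from math import inf


def pair_bounds(lower_bounds, upper_bounds):
    """Hypothetical helper: one [lower, upper] pair per component."""
    return [[lower, upper] for lower, upper in zip(lower_bounds, upper_bounds)]


# Two components: one bounded on [0, 1], one unbounded (illustrative values).
math_lower_bound = [0.0, -inf]
math_upper_bound = [1.0, inf]

support = pair_bounds(math_lower_bound, math_upper_bound)
print(support)  # [[0.0, 1.0], [-inf, inf]]
```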

transformation: str: The transformation applied to the random variable, e.g. ‘sin(x)’.

variable_name: str: The name of the random variable.

Examples

See the examples about probability distributions.