# Parametric estimation of statistics¶

In this example, we want to estimate statistics from synthetic data. These data are 500 realizations of x_0, x_1, x_2 and x_3 distributed in the following way:

• x_0: standard uniform distribution,

• x_1: standard normal distribution,

• x_2: standard Weibull distribution,

• x_3: standard exponential distribution.

These samples are generated from the NumPy library.

from __future__ import annotations

from gemseo import configure_logger
from gemseo import create_dataset
from gemseo.uncertainty import create_statistics
from numpy import vstack
from numpy.random import exponential
from numpy.random import normal
from numpy.random import rand
from numpy.random import seed
from numpy.random import weibull

configure_logger()

<RootLogger root (INFO)>


## Create synthetic data¶

seed(0)

n_samples = 500

uniform_rand = rand(n_samples)
normal_rand = normal(size=n_samples)
weibull_rand = weibull(1.5, size=n_samples)
exponential_rand = exponential(size=n_samples)

data = vstack((uniform_rand, normal_rand, weibull_rand, exponential_rand)).T

variables = ["x_0", "x_1", "x_2", "x_3"]

data

array([[ 0.5488135 , -0.98551074,  1.37408242,  1.11379656],
[ 0.71518937, -1.47183501,  2.13236167,  0.63548465],
[ 0.60276338,  1.64813493,  0.52518717,  3.2112956 ],
...,
[ 0.40171354, -0.21252304,  0.30225024,  4.00986833],
[ 0.24841347, -0.76211451,  0.364483  ,  0.55896365],
[ 0.50586638, -0.88778014,  0.82654114,  2.12919171]])


## Create a ParametricStatistics object¶

We create a ParametricStatistics object from this data encapsulated in a Dataset:

dataset = create_dataset("Dataset", data, variables)


and a list of names of candidate probability distributions: exponential, normal and uniform distributions (see ParametricStatistics.DistributionName). We do not use the default fitting criterion (‘BIC’) but ‘Kolmogorov’ (see ParametricStatistics.FittingCriterion and ParametricStatistics.SignificanceTest).

tested_distributions = ["Exponential", "Normal", "Uniform"]
analysis = create_statistics(
dataset, tested_distributions=tested_distributions, fitting_criterion="Kolmogorov"
)
analysis

   INFO - 14:46:54: Create ParametricStatistics(Dataset), a ParametricStatistics library.
INFO - 14:46:54: | Set goodness-of-fit criterion: Kolmogorov.
INFO - 14:46:54: | Set significance level of hypothesis test: 0.05.
INFO - 14:46:54: Fit different distributions (Exponential, Normal, Uniform) per variable and compute the goodness-of-fit criterion.
INFO - 14:46:54: | Fit different distributions for x_0.
INFO - 14:46:54: | Fit different distributions for x_1.
INFO - 14:46:54: | Fit different distributions for x_2.
INFO - 14:46:54: | Fit different distributions for x_3.
INFO - 14:46:54: Select the best distribution for each variable.
INFO - 14:46:54: | The best distribution for x_0 is Uniform([0.00271509,1.00083]).
INFO - 14:46:54: | The best distribution for x_1 is Normal([-0.100117,0.985312]).
WARNING - 14:46:54: All criteria values are lower than the significance level 0.05.
INFO - 14:46:54: | The best distribution for x_2 is Normal([0.9783,0.665983]).
INFO - 14:46:54: | The best distribution for x_3 is Exponential([1.02231,7.35553e-05]).

ParametricStatistics(Dataset)
• n_samples: 500
• n_variables: 4
• variables: x_0, x_1, x_2, x_3

## Get statistics¶

From this ParametricStatistics instance, we can easily get statistics for the different variables based on the selected distributions.

### Get minimum¶

Here is the minimum value for the different variables:

analysis.compute_minimum()

{'x_0': array([0.00271509]), 'x_1': array([-inf]), 'x_2': array([-inf]), 'x_3': array([7.3555332e-05])}


### Get maximum¶

Here is the minimum value for the different variables:

analysis.compute_maximum()

{'x_0': array([1.00082739]), 'x_1': array([inf]), 'x_2': array([inf]), 'x_3': array([inf])}


### Get range¶

Here is the range, i.e. the difference between the minimum and maximum values, for the different variables:

analysis.compute_range()

{'x_0': array([0.99811229]), 'x_1': array([inf]), 'x_2': array([inf]), 'x_3': array([inf])}


### Get mean¶

Here is the mean value for the different variables:

analysis.compute_mean()

{'x_0': array([0.50177124]), 'x_1': array([-0.1001173]), 'x_2': array([0.97829969]), 'x_3': array([0.97825244])}


### Get standard deviation¶

Here is the standard deviation for the different variables:

analysis.compute_standard_deviation()

{'x_0': array([0.2881302]), 'x_1': array([0.98531188]), 'x_2': array([0.66598346]), 'x_3': array([0.97817888])}


### Get variance¶

Here is the variance for the different variables:

analysis.compute_variance()

{'x_0': array([0.08301901]), 'x_1': array([0.9708395]), 'x_2': array([0.44353397]), 'x_3': array([0.95683393])}


### Get quantile¶

Here is the quantile with level 80% for the different variables:

analysis.compute_quantile(0.8)

{'x_0': array([0.80120493]), 'x_1': array([0.72914209]), 'x_2': array([1.53880551]), 'x_3': array([1.57439174])}


### Get quartile¶

Here is the second quartile for the different variables:

analysis.compute_quartile(2)

{'x_0': array([0.50177124]), 'x_1': array([-0.1001173]), 'x_2': array([0.97829969]), 'x_3': array([0.67809549])}


### Get percentile¶

Here is the 50th percentile for the different variables:

analysis.compute_percentile(50)

{'x_0': array([0.50177124]), 'x_1': array([-0.1001173]), 'x_2': array([0.97829969]), 'x_3': array([0.67809549])}


### Get median¶

Here is the median for the different variables:

analysis.compute_median()

{'x_0': array([0.50177124]), 'x_1': array([-0.1001173]), 'x_2': array([0.97829969]), 'x_3': array([0.67809549])}


### Get tolerance interval¶

Here is the two-sided tolerance interval with a coverage level equal to 50% with a confidence level of 95% for the different variables:

analysis.compute_tolerance_interval(0.5)

{'x_0': [Bounds(lower=array([0.2522558]), upper=array([0.75684261]))], 'x_1': [Bounds(lower=array([-0.80205726]), upper=array([0.60182265]))], 'x_2': [Bounds(lower=array([0.50385052]), upper=array([1.45274885]))], 'x_3': [Bounds(lower=array([0.23960073]), upper=array([1.3194909]))]}


### Get B-value¶

Here is the B-value for the different variables, which is a left-sided tolerance interval with a coverage level equal to 90% with a confidence level of 95%:

analysis.compute_b_value()

{'x_0': array([[0.10253656]]), 'x_1': array([[-1.43545972]]), 'x_2': array([[0.07572662]]), 'x_3': array([[0.09277731]])}


Total running time of the script: ( 0 minutes 0.406 seconds)

Gallery generated by Sphinx-Gallery