# Parametric estimation of statistics¶

In this example, we want to estimate statistics from synthetic data. These data are 500 realizations of x_0, x_1, x_2 and x_3 distributed in the following way:

• x_0: standard uniform distribution,

• x_1: standard normal distribution,

• x_2: standard Weibull distribution,

• x_3: standard exponential distribution.

These samples are generated from the NumPy library.

from __future__ import annotations

from gemseo.api import configure_logger
from gemseo.api import create_dataset
from gemseo.uncertainty.api import create_statistics
from numpy import vstack
from numpy.random import exponential
from numpy.random import normal
from numpy.random import rand
from numpy.random import seed
from numpy.random import weibull

configure_logger()

<RootLogger root (INFO)>


## Create synthetic data¶

seed(0)

n_samples = 500

uniform_rand = rand(n_samples)
normal_rand = normal(size=n_samples)
weibull_rand = weibull(1.5, size=n_samples)
exponential_rand = exponential(size=n_samples)

data = vstack((uniform_rand, normal_rand, weibull_rand, exponential_rand)).T

variables = ["x_0", "x_1", "x_2", "x_3"]

print(data)

[[ 0.5488135  -0.98551074  1.37408242  1.11379656]
[ 0.71518937 -1.47183501  2.13236167  0.63548465]
[ 0.60276338  1.64813493  0.52518717  3.2112956 ]
...
[ 0.40171354 -0.21252304  0.30225024  4.00986833]
[ 0.24841347 -0.76211451  0.364483    0.55896365]
[ 0.50586638 -0.88778014  0.82654114  2.12919171]]


## Create a ParametricStatistics object¶

We create a ParametricStatistics object from this data encapsulated in a Dataset:

dataset = create_dataset("Dataset", data, variables)


and a list of names of candidate probability distributions: exponential, normal and uniform distributions (see ParametricStatistics.get_available_distributions()). We do not use the default fitting criterion (‘BIC’) but ‘Kolmogorov’ (see ParametricStatistics.get_available_criteria() and ParametricStatistics.get_significance_tests()).

tested_distributions = ["Exponential", "Normal", "Uniform"]
analysis = create_statistics(
dataset, tested_distributions=tested_distributions, fitting_criterion="Kolmogorov"
)
print(analysis)

    INFO - 17:27:08: Create ParametricStatistics_Dataset, a ParametricStatistics library.
INFO - 17:27:08: | Set goodness-of-fit criterion: Kolmogorov.
INFO - 17:27:08: | Set significance level of hypothesis test: 0.05.
INFO - 17:27:08: Fit different distributions (Exponential, Normal, Uniform) per variable and compute the goodness-of-fit criterion.
INFO - 17:27:08: | Fit different distributions for x_0.
INFO - 17:27:08: | Fit different distributions for x_1.
INFO - 17:27:08: | Fit different distributions for x_2.
INFO - 17:27:08: | Fit different distributions for x_3.
INFO - 17:27:08: Select the best distribution for each variable.
INFO - 17:27:08: | The best distribution for x_0 is Uniform([0.00271509,1.00083]).
INFO - 17:27:08: | The best distribution for x_1 is Normal([-0.100117,0.985312]).
WARNING - 17:27:08: All criteria values are lower than the significance level 0.05.
INFO - 17:27:08: | The best distribution for x_2 is Normal([0.9783,0.665983]).
INFO - 17:27:08: | The best distribution for x_3 is Exponential([1.02231,7.35553e-05]).
ParametricStatistics_Dataset
n_samples: 500
n_variables: 4
variables: x_0, x_1, x_2, x_3


## Get statistics¶

From this ParametricStatistics instance, we can easily get statistics for the different variables based on the selected distributions.

### Get minimum¶

Here is the minimum value for the different variables:

print(analysis.compute_minimum())

{'x_0': array([0.00271509]), 'x_1': array([-inf]), 'x_2': array([-inf]), 'x_3': array([7.3555332e-05])}


### Get maximum¶

Here is the minimum value for the different variables:

print(analysis.compute_maximum())

{'x_0': array([1.00082739]), 'x_1': array([inf]), 'x_2': array([inf]), 'x_3': array([inf])}


### Get range¶

Here is the range, i.e. the difference between the minimum and maximum values, for the different variables:

print(analysis.compute_range())

{'x_0': array([0.99811229]), 'x_1': array([inf]), 'x_2': array([inf]), 'x_3': array([inf])}


### Get mean¶

Here is the mean value for the different variables:

print(analysis.compute_mean())

{'x_0': array([0.50177124]), 'x_1': array([-0.1001173]), 'x_2': array([0.97829969]), 'x_3': array([0.97825244])}


### Get standard deviation¶

Here is the standard deviation for the different variables:

print(analysis.compute_standard_deviation())

{'x_0': array([0.2881302]), 'x_1': array([0.98531188]), 'x_2': array([0.66598346]), 'x_3': array([0.97817888])}


### Get variance¶

Here is the variance for the different variables:

print(analysis.compute_variance())

{'x_0': array([0.08301901]), 'x_1': array([0.9708395]), 'x_2': array([0.44353397]), 'x_3': array([0.95683393])}


### Get quantile¶

Here is the quantile with level 80% for the different variables:

print(analysis.compute_quantile(0.8))

{'x_0': array([0.80120493]), 'x_1': array([0.72914209]), 'x_2': array([1.53880551]), 'x_3': array([1.57439174])}


### Get quartile¶

Here is the second quartile for the different variables:

print(analysis.compute_quartile(2))

{'x_0': array([0.50177124]), 'x_1': array([-0.1001173]), 'x_2': array([0.97829969]), 'x_3': array([0.67809549])}


### Get percentile¶

Here is the 50th percentile for the different variables:

print(analysis.compute_percentile(50))

{'x_0': array([0.50177124]), 'x_1': array([-0.1001173]), 'x_2': array([0.97829969]), 'x_3': array([0.67809549])}


### Get median¶

Here is the median for the different variables:

print(analysis.compute_median())

{'x_0': array([0.50177124]), 'x_1': array([-0.1001173]), 'x_2': array([0.97829969]), 'x_3': array([0.67809549])}


### Get tolerance interval¶

Here is the two-sided tolerance interval with a coverage level equal to 50% with a confidence level of 95% for the different variables:

print(analysis.compute_tolerance_interval(0.5))

{'x_0': (array([0.2522558]), array([0.75684261])), 'x_1': (array([-0.80205726]), array([0.60182265])), 'x_2': (array([0.50385052]), array([1.45274885])), 'x_3': (array([0.23960073]), array([1.3194909]))}


### Get B-value¶

Here is the B-value for the different variables, which is a left-sided tolerance interval with a coverage level equal to 90% with a confidence level of 95%:

print(analysis.compute_b_value())

{'x_0': array([0.10253656]), 'x_1': array([-1.43545972]), 'x_2': array([0.07572662]), 'x_3': array([0.09277731])}


Total running time of the script: ( 0 minutes 0.503 seconds)

Gallery generated by Sphinx-Gallery