Note

Go to the end to download the full example code

# Fitting a distribution from data based on OpenTURNS¶

```
from __future__ import annotations
from gemseo import configure_logger
from gemseo.uncertainty.distributions.openturns.fitting import OTDistributionFitter
from numpy.random import randn
from numpy.random import seed
configure_logger()
```

```
<RootLogger root (INFO)>
```

In this example,
we will see how to fit a distribution from data.
For a purely pedagogical reason,
we consider a synthetic dataset made of 100 realizations of *‘X’*,
a random variable distributed according to the standard normal distribution.
These samples are generated from the NumPy library.

```
seed(1)
data = randn(100)
variable_name = "X"
```

## Create a distribution fitter¶

Then,
we create an `OTDistributionFitter`

from these data and this variable name:

```
fitter = OTDistributionFitter(variable_name, data)
```

## Fit a distribution¶

From this distribution fitter, we can easily fit any distribution available in the OpenTURNS library:

```
fitter.available_distributions
```

```
['Arcsine', 'Beta', 'Burr', 'Chi', 'ChiSquare', 'Dirichlet', 'Exponential', 'FisherSnedecor', 'Frechet', 'Gamma', 'GeneralizedPareto', 'Gumbel', 'Histogram', 'InverseNormal', 'Laplace', 'LogNormal', 'LogUniform', 'Logistic', 'MeixnerDistribution', 'Normal', 'Pareto', 'Rayleigh', 'Rice', 'Student', 'Trapezoidal', 'Triangular', 'TruncatedNormal', 'Uniform', 'VonMises', 'WeibullMax', 'WeibullMin']
```

For example, we can fit a normal distribution:

```
norm_dist = fitter.fit("Normal")
norm_dist
```

```
Normal([0.0605829,0.889615])
```

or an exponential one:

```
exp_dist = fitter.fit("Exponential")
exp_dist
```

```
Exponential([0.419342,-2.3241])
```

The returned object is an `OTDistribution`

that we can represent graphically
in terms of probability and cumulative density functions:

```
norm_dist.plot()
```

```
<Figure size 640x320 with 2 Axes>
```

## Measure the goodness-of-fit¶

We can also measure the goodness-of-fit of a distribution by means of a fitting criterion. Some fitting criteria are based on significance tests made of a test statistics, a p-value and a significance level. We can access the names of all the available fitting criteria:

```
fitter.available_criteria
```

```
['BIC', 'ChiSquared', 'Kolmogorov']
```

or only the significance tests

```
fitter.available_significance_tests
```

```
[<SignificanceTest.ChiSquared: 'ChiSquared'>, <SignificanceTest.Kolmogorov: 'Kolmogorov'>]
```

For example, we can measure the goodness-of-fit of the previous distributions by considering the Bayesian information criterion (BIC):

```
quality_measure = fitter.compute_measure(norm_dist, "BIC")
"Normal", quality_measure
quality_measure = fitter.compute_measure(exp_dist, "BIC")
"Exponential", quality_measure
```

```
('Exponential', 3.7381346286469945)
```

Here, the fitted normal distribution is better than the fitted exponential one in terms of BIC. We can also the Kolmogorov fitting criterion which is based on the Kolmogorov significance test:

```
acceptable, details = fitter.compute_measure(norm_dist, "Kolmogorov")
"Normal", acceptable, details
acceptable, details = fitter.compute_measure(exp_dist, "Kolmogorov")
"Exponential", acceptable, details
```

```
('Exponential', False, {'p-value': 5.628454180958696e-11, 'statistics': 0.34248997332293696, 'level': 0.05})
```

In this case,
the `OTDistributionFitter.compute_measure()`

method
returns a tuple with two values:

a boolean indicating if the measured distribution is acceptable to model the data,

a dictionary containing the test statistics, the p-value and the significance level.

Note

We can also change the significance level for significance tests
whose default value is 0.05.
For that, use the `level`

argument.

## Select an optimal distribution¶

Lastly,
we can also select an optimal `OTDistribution`

based on a collection of distributions names,
a fitting criterion,
a significance level
and a selection criterion:

‘best’: select the distribution minimizing (or maximizing, depending on the criterion) the criterion,

‘first’: select the first distribution for which the criterion is greater (or lower, depending on the criterion) than the level.

By default,
the `OTDistributionFitter.select()`

method uses a significance level equal to 0.5
and ‘best’ selection criterion.

```
selected_distribution = fitter.select(["Exponential", "Normal"], "Kolmogorov")
selected_distribution
```

```
Normal([0.0605829,0.889615])
```

**Total running time of the script:** (0 minutes 0.176 seconds)