
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/uncertainty/distributions/plot_ot_distfactory.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_uncertainty_distributions_plot_ot_distfactory.py:


Fitting a distribution from data based on OpenTURNS
===================================================

.. GENERATED FROM PYTHON SOURCE LINES 27-35

.. code-block:: default

    from matplotlib import pyplot as plt
    from numpy.random import randn, seed

    from gemseo.api import configure_logger
    from gemseo.uncertainty.distributions.openturns.fitting import OTDistributionFitter

    configure_logger()


.. GENERATED FROM PYTHON SOURCE LINES 36-42

In this example,
we will see how to fit a distribution from data.
For purely pedagogical reasons,
we consider a synthetic dataset of 100 realizations of *'X'*,
a random variable following the standard normal distribution.
These samples are generated with the NumPy library.

.. GENERATED FROM PYTHON SOURCE LINES 42-46

.. code-block:: default

    seed(1)
    data = randn(100)
    variable_name = "X"


.. GENERATED FROM PYTHON SOURCE LINES 47-51

Create a distribution fitter
----------------------------

Then,
we create an :class:`.OTDistributionFitter` from these data and this variable name:

.. GENERATED FROM PYTHON SOURCE LINES 51-53

.. code-block:: default

    fitter = OTDistributionFitter(variable_name, data)


.. GENERATED FROM PYTHON SOURCE LINES 54-58

Fit a distribution
------------------

From this distribution fitter,
we can easily fit any distribution available in the OpenTURNS library:

.. GENERATED FROM PYTHON SOURCE LINES 58-60

.. code-block:: default

    print(fitter.available_distributions)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    ['Arcsine', 'Beta', 'Burr', 'Chi', 'ChiSquare', 'Dirichlet', 'Exponential', 'FisherSnedecor', 'Frechet', 'Gamma', 'GeneralizedPareto', 'Gumbel', 'Histogram', 'InverseNormal', 'Laplace', 'LogNormal', 'LogUniform', 'Logistic', 'MeixnerDistribution', 'Normal', 'Pareto', 'Rayleigh', 'Rice', 'Student', 'Trapezoidal', 'Triangular', 'TruncatedNormal', 'Uniform', 'VonMises', 'WeibullMax', 'WeibullMin']


.. GENERATED FROM PYTHON SOURCE LINES 61-63

For example,
we can fit a normal distribution:

.. GENERATED FROM PYTHON SOURCE LINES 63-66

.. code-block:: default

    norm_dist = fitter.fit("Normal")
    print(norm_dist)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Normal(class=Point name=Unnamed dimension=2 values=[0.0605829,0.889615])


.. GENERATED FROM PYTHON SOURCE LINES 67-68

or an exponential one:

.. GENERATED FROM PYTHON SOURCE LINES 68-71

.. code-block:: default

    exp_dist = fitter.fit("Exponential")
    print(exp_dist)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Exponential(class=Point name=Unnamed dimension=2 values=[0.419342,-2.3241])


.. GENERATED FROM PYTHON SOURCE LINES 72-75

The returned object is an :class:`.OTDistribution`
that we can represent graphically
in terms of its probability density function and cumulative distribution function:

.. GENERATED FROM PYTHON SOURCE LINES 75-79

.. code-block:: default

    norm_dist.plot(show=False)
    # Workaround for HTML rendering, instead of ``show=True``
    plt.show()


.. image:: /examples/uncertainty/distributions/images/sphx_glr_plot_ot_distfactory_001.png
    :alt: Probability distribution of X, PDF, Cumulative density function
    :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 80-87

Measure the goodness-of-fit
---------------------------

We can also measure the goodness-of-fit of a distribution
by means of a fitting criterion.
Some fitting criteria are based on significance tests,
each characterized by a test statistic, a p-value and a significance level.
We can access the names of the available fitting criteria:

.. GENERATED FROM PYTHON SOURCE LINES 87-90

.. code-block:: default

    print(fitter.available_criteria)
    print(fitter.available_significance_tests)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    ['BIC', 'ChiSquared', 'Kolmogorov']
    ['ChiSquared', 'Kolmogorov']


.. GENERATED FROM PYTHON SOURCE LINES 91-95

For example,
we can measure the goodness-of-fit of the previous distributions
by considering the Bayesian information criterion (BIC):

.. GENERATED FROM PYTHON SOURCE LINES 95-101

.. code-block:: default

    quality_measure = fitter.compute_measure(norm_dist, "BIC")
    print("Normal: ", quality_measure)

    quality_measure = fitter.compute_measure(exp_dist, "BIC")
    print("Exponential: ", quality_measure)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Normal:  2.5939451287694295
    Exponential:  3.7381346286469945


.. GENERATED FROM PYTHON SOURCE LINES 102-107

Here,
the fitted normal distribution is better than the fitted exponential one
in terms of BIC, since its BIC value is lower.
We can also use the Kolmogorov fitting criterion,
which is based on the Kolmogorov significance test:

.. GENERATED FROM PYTHON SOURCE LINES 107-112

.. code-block:: default

    acceptable, details = fitter.compute_measure(norm_dist, "Kolmogorov")
    print("Normal: ", acceptable, details)
    acceptable, details = fitter.compute_measure(exp_dist, "Kolmogorov")
    print("Exponential: ", acceptable, details)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Normal:  True {'p-value': 0.9879299613543082, 'statistics': 0.04330972976650932, 'level': 0.05}
    Exponential:  False {'p-value': 5.628454180958696e-11, 'statistics': 0.34248997332293696, 'level': 0.05}


.. GENERATED FROM PYTHON SOURCE LINES 113-125

In this case,
the :meth:`.OTDistributionFitter.compute_measure` method returns a tuple with two values:

1. a boolean indicating whether the measured distribution is acceptable to model the data,
2. a dictionary containing the test statistic, the p-value and the significance level.

.. note::

   The significance level used by the significance tests defaults to 0.05.
   To change it, use the :code:`level` argument.

.. GENERATED FROM PYTHON SOURCE LINES 127-145

Select an optimal distribution
------------------------------

Lastly,
we can also select an optimal :class:`.OTDistribution`
based on a collection of distribution names,
a fitting criterion,
a significance level
and a selection criterion:

- 'best': select the distribution minimizing
  (or maximizing, depending on the criterion) the criterion,
- 'first': select the first distribution
  for which the criterion is greater
  (or lower, depending on the criterion) than the level.

By default,
the :meth:`.OTDistributionFitter.select` method
uses a significance level equal to 0.05
and the 'best' selection criterion.

.. GENERATED FROM PYTHON SOURCE LINES 145-147

.. code-block:: default

    selected_distribution = fitter.select(["Exponential", "Normal"], "Kolmogorov")
    print(selected_distribution)


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Normal(class=Point name=Unnamed dimension=2 values=[0.0605829,0.889615])

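
To make the 'best' and 'first' selection rules more concrete,
here is a minimal sketch for the significance-test case only,
working on plain p-values rather than on the fitter itself.
The ``select_distribution`` helper is hypothetical and is not part of GEMSEO;
the strict comparison with the level and the ``None`` returned
when no candidate passes are assumptions of this sketch.
The p-values are the Kolmogorov ones printed earlier in this example.

.. code-block:: python

    from typing import Dict, Optional, Sequence


    def select_distribution(
        names: Sequence[str],
        p_values: Dict[str, float],
        level: float = 0.05,
        selection_criterion: str = "best",
    ) -> Optional[str]:
        """Hypothetical helper: choose a distribution name from its test p-value."""
        if selection_criterion == "best":
            # For a significance test, the criterion is maximized:
            # the candidate with the highest p-value wins.
            return max(names, key=lambda name: p_values[name])
        if selection_criterion == "first":
            # Scan the candidates in the given order and stop at the first one
            # whose p-value is greater than the significance level.
            for name in names:
                if p_values[name] > level:
                    return name
            # Assumption of this sketch: report that no candidate passed.
            return None
        raise ValueError(f"Unknown selection criterion: {selection_criterion}")


    # Kolmogorov p-values taken from the output shown earlier in this example.
    p_values = {"Exponential": 5.628454180958696e-11, "Normal": 0.9879299613543082}
    candidates = ["Exponential", "Normal"]

    print(select_distribution(candidates, p_values))
    print(select_distribution(candidates, p_values, selection_criterion="first"))

Both calls print ``Normal``:
it has the highest p-value,
and it is also the first candidate in the list whose p-value exceeds 0.05,
since the exponential distribution is rejected.
For a criterion such as BIC, the 'best' rule would minimize the value instead.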