
ot_fdist module

Fitting a distribution from data based on OpenTURNS

Overview

The OTDistributionFitter class takes samples of an uncertain variable, fits a user-defined probability distribution to this dataset and returns an OTDistribution. It can also return a goodness-of-fit measure associated with this distribution, e.g. the Bayesian Information Criterion (BIC), the Kolmogorov test or the chi-squared test, or select an optimal distribution among a collection according to a criterion and a threshold.

Construction

The OTDistributionFitter of a given uncertain variable is built from only two arguments:

  • a variable name,

  • a one-dimensional numpy array.

Capabilities

Fit a distribution

The OTDistributionFitter.fit() method takes as argument a distribution name recognized by OpenTURNS (e.g. ‘Normal’, ‘Uniform’, ‘Exponential’, …) and returns an OTDistribution whose underlying OpenTURNS distribution is of the specified type, with parameters fitted to the dataset passed to the constructor.
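As an illustration of what fitting ‘Normal’ amounts to, the following sketch estimates the mean and standard deviation of a Normal distribution from the sample moments, using only the Python standard library. The helper fit_normal_by_moments is hypothetical: it is not the OpenTURNS NormalFactory, whose estimator may differ.

```python
from statistics import fmean, stdev


def fit_normal_by_moments(data):
    """Return (mu, sigma) of a Normal distribution matched to the sample moments.

    Hypothetical illustration only: this is not how OpenTURNS' NormalFactory
    is implemented, just what "fitting a Normal distribution" means.
    """
    if len(data) < 2:
        raise ValueError("at least two samples are required")
    mu = fmean(data)  # sample mean
    sigma = stdev(data)  # unbiased sample standard deviation
    return mu, sigma


samples = [1.2, 0.8, 1.1, 0.9, 1.0, 1.3, 0.7]
mu, sigma = fit_normal_by_moments(samples)
print(f"mu={mu:.3f}, sigma={sigma:.3f}")  # mu=1.000, sigma=0.216
```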

Measure the goodness-of-fit

The OTDistributionFitter.measure() method has two mandatory arguments:

  • a distribution, which is either an OTDistribution or a distribution name from which the fit() method builds an OTDistribution,

  • a fitting criterion name.

Note

Use the OTDistributionFitter.get_available_criteria() method to get the complete list of available criteria and the OTDistributionFitter.get_significance_tests() method to get the list of available criteria which are significance tests.

The OTDistributionFitter.measure() method also accepts an optional level associated with the criterion (default: 0.05).

The OTDistributionFitter.measure() method returns a goodness-of-fit measure, which is a scalar when the criterion is not a significance test and a tuple when it is. In the latter case, the first component of the tuple is a boolean indicating whether the measured distribution is acceptable to model the data and the second one is a dictionary containing the test statistic, the p-value and the significance level.
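The following sketch mimics the shape of a significance-test result for a Normal fit, using the Kolmogorov-Smirnov statistic and only the Python standard library. It is a hypothetical helper (measure_sketch), not the gemseo implementation: the dictionary keys ‘statistics’, ‘pvalue’ and ‘level’ are assumptions, and the p-value comes from the asymptotic Kolmogorov series, which ignores that the parameters were estimated from the same data. OpenTURNS may compute it differently.

```python
import math
from statistics import fmean, stdev


def normal_cdf(x, mu, sigma):
    """CDF of the Normal distribution N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(data, cdf):
    """Largest distance between the empirical CDF of data and cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def kolmogorov_pvalue(d, n, terms=100):
    """Asymptotic p-value P(K > sqrt(n) * d) from the Kolmogorov series."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        # The series converges slowly here and the p-value is ~1 anyway.
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2) for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def measure_sketch(data, level=0.05):
    """Return (acceptable, details) for a Normal fit, like a significance test."""
    mu, sigma = fmean(data), stdev(data)
    d = ks_statistic(data, lambda x: normal_cdf(x, mu, sigma))
    p = kolmogorov_pvalue(d, len(data))
    # The fit is acceptable when the null hypothesis is not rejected at `level`.
    return p >= level, {"statistics": d, "pvalue": p, "level": level}


acceptable, details = measure_sketch([1.2, 0.8, 1.1, 0.9, 1.0, 1.3, 0.7])
print(acceptable, details)
```

A non-significance criterion such as BIC would instead be returned as a plain scalar, without the boolean and the dictionary.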

Select an optimal distribution

The OTDistributionFitter.select() method aims to select an optimal distribution among a collection. It has two mandatory arguments:

  • a list of distributions, either distribution names or OTDistribution objects,

  • a fitting criterion name.

The OTDistributionFitter.select() method can also use a level associated with the criterion and a selection criterion:

  • ‘best’: select the distribution minimizing (or maximizing, depending on the criterion) the criterion,

  • ‘first’: select the first distribution for which the criterion is greater (or lower, depending on the criterion) than the level.
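The two selection modes can be sketched as follows. The helper select_index and the scores are hypothetical: the sketch assumes each candidate has already been reduced to a single number (e.g. a BIC value, to be minimized, or a significance-test p-value, assumed here to be maximized) and, as an assumption of the sketch only, falls back to ‘best’ when no candidate passes the threshold in ‘first’ mode.

```python
def select_index(values, minimize, level, selection_criterion="best"):
    """Return the index of the selected candidate (hypothetical sketch)."""
    if not values:
        raise ValueError("no candidate distribution")
    if selection_criterion == "first":
        for index, value in enumerate(values):
            # A candidate passes when it beats the threshold in the right direction.
            passes = (value < level) if minimize else (value > level)
            if passes:
                return index
        # Assumption of this sketch: fall back to 'best' if nothing passes.
    elif selection_criterion != "best":
        raise ValueError(f"unknown selection criterion: {selection_criterion!r}")
    pick = min if minimize else max
    return pick(range(len(values)), key=values.__getitem__)


names = ["Normal", "Exponential", "Uniform"]
# Hypothetical BIC values (lower is better).
print(names[select_index([2.4, 3.1, 2.9], minimize=True, level=3.0)])  # Normal
# Hypothetical Kolmogorov p-values (higher is better), threshold 0.05.
pvalues = [0.03, 0.40, 0.90]
print(names[select_index(pvalues, minimize=False, level=0.05, selection_criterion="first")])  # Exponential
print(names[select_index(pvalues, minimize=False, level=0.05)])  # Uniform
```

‘first’ stops at the earliest acceptable candidate, so the order of the list matters, whereas ‘best’ scans all candidates.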

class gemseo.uncertainty.distributions.ot_fdist.OTDistributionFitter(variable, data)[source]

Bases: object

OpenTURNS distribution fitter.

Constructor.

Parameters
  • variable (str) – variable name.

  • data (array) – one-dimensional array of samples.

AVAILABLE_FACTORIES = {'Arcsine': <class 'openturns.dist_bundle1.ArcsineFactory'>, 'Beta': <class 'openturns.dist_bundle1.BetaFactory'>, 'Burr': <class 'openturns.dist_bundle1.BurrFactory'>, 'Chi': <class 'openturns.dist_bundle1.ChiFactory'>, 'ChiSquare': <class 'openturns.dist_bundle1.ChiSquareFactory'>, 'Dirichlet': <class 'openturns.dist_bundle1.DirichletFactory'>, 'Exponential': <class 'openturns.dist_bundle1.ExponentialFactory'>, 'FisherSnedecor': <class 'openturns.dist_bundle1.FisherSnedecorFactory'>, 'Frechet': <class 'openturns.dist_bundle1.FrechetFactory'>, 'Gamma': <class 'openturns.dist_bundle1.GammaFactory'>, 'GeneralizedPareto': <class 'openturns.dist_bundle1.GeneralizedParetoFactory'>, 'Gumbel': <class 'openturns.dist_bundle1.GumbelFactory'>, 'Histogram': <class 'openturns.dist_bundle2.HistogramFactory'>, 'InverseNormal': <class 'openturns.dist_bundle2.InverseNormalFactory'>, 'Laplace': <class 'openturns.dist_bundle2.LaplaceFactory'>, 'LogNormal': <class 'openturns.dist_bundle2.LogNormalFactory'>, 'LogUniform': <class 'openturns.dist_bundle2.LogUniformFactory'>, 'Logistic': <class 'openturns.dist_bundle2.LogisticFactory'>, 'MeixnerDistribution': <class 'openturns.dist_bundle2.MeixnerDistributionFactory'>, 'Normal': <class 'openturns.dist_bundle2.NormalFactory'>, 'Pareto': <class 'openturns.dist_bundle1.ParetoFactory'>, 'Rayleigh': <class 'openturns.dist_bundle3.RayleighFactory'>, 'Rice': <class 'openturns.dist_bundle3.RiceFactory'>, 'Student': <class 'openturns.dist_bundle3.StudentFactory'>, 'Trapezoidal': <class 'openturns.dist_bundle3.TrapezoidalFactory'>, 'Triangular': <class 'openturns.dist_bundle3.TriangularFactory'>, 'TruncatedNormal': <class 'openturns.dist_bundle3.TruncatedNormalFactory'>, 'Uniform': <class 'openturns.dist_bundle3.UniformFactory'>, 'WeibullMax': <class 'openturns.dist_bundle1.WeibullMaxFactory'>, 'WeibullMin': <class 'openturns.dist_bundle3.WeibullMinFactory'>}
AVAILABLE_FITTING_TESTS = {'BIC': <function FittingTest.BIC>, 'ChiSquared': <function FittingTest.ChiSquared>, 'Kolmogorov': <function FittingTest.Kolmogorov>}
CRITERIA_TO_MAXIMIZE = []
CRITERIA_TO_MINIMIZE = ['BIC']
SIGNIFICANCE_TESTS = ['Kolmogorov', 'ChiSquared']
dist_name = 'WeibullMin'
factory = class=DistributionFactory implementation=class=WeibullMinFactory
factory_class_name = 'WeibullMinFactory'
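CRITERIA_TO_MINIMIZE lists BIC because a lower Bayesian Information Criterion indicates a better trade-off between goodness of fit and number of parameters. For reference, the textbook definition is given below, where L is the maximized likelihood of the fitted distribution, k its number of estimated parameters and n the sample size. OpenTURNS' FittingTest.BIC may normalize this quantity by n, which does not change the ranking of candidates fitted to the same sample.

```latex
\mathrm{BIC} = -2 \ln \hat{L} + k \ln n
```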
fit(distribution)[source]

Fit a distribution.

Parameters

distribution (str) – distribution name.

get_available_criteria()[source]

Get available goodness-of-fit criteria.

get_available_distributions()[source]

Get available distributions.

get_significance_tests()[source]

Get significance tests.

measure(distribution, criterion, level=0.05)[source]

Measure the goodness-of-fit of a distribution to data.

Parameters
  • distribution (OTDistribution or str) – distribution.

  • criterion (str) – goodness-of-fit criterion.

  • level (float) – risk of committing a Type I error, that is, an incorrect rejection of a true null hypothesis, for criteria based on hypothesis tests. Default: 0.05.

select(distributions, fitting_criterion, level=0.05, selection_criterion='best')[source]

Select the best distribution.

Parameters
  • distributions (list) – list of distributions, either distribution names or OTDistribution objects.

  • fitting_criterion (str) – goodness-of-fit criterion.

  • level (float) – significance level. For hypothesis tests, this is the risk of committing a Type I error, that is, an incorrect rejection of a true null hypothesis. For other criteria, this is a threshold. Default: 0.05.

  • selection_criterion (str) – selection criterion, either ‘best’ or ‘first’. Default: ‘best’.

classmethod select_from_results(results, fitting_criterion, level=0.05, selection_criterion='best')[source]

Select the best distribution from results.

Parameters
  • results (list) – results

  • fitting_criterion (str) – goodness-of-fit criterion.

  • level (float) – significance level. For hypothesis tests, this is the risk of committing a Type I error, that is, an incorrect rejection of a true null hypothesis. For other criteria, this is a threshold. Default: 0.05.

  • selection_criterion (str) – selection criterion, either ‘best’ or ‘first’. Default: ‘best’.