gemseo.uncertainty.distributions.openturns.fitting module

Class to fit a distribution from data based on OpenTURNS.

Overview

The OTDistributionFitter class takes samples of an uncertain variable, fits a user-specified probability distribution to this dataset and returns an OTDistribution. It can also return a goodness-of-fit measure for this distribution, e.g. the Bayesian information criterion (BIC), the Kolmogorov test or the chi-squared test, or select an optimal distribution from a collection according to a criterion and a threshold.

Construction

The OTDistributionFitter of a given uncertain variable is built from only two arguments:

  • a variable name,

  • a one-dimensional numpy array.

Capabilities

Fit a distribution

The OTDistributionFitter.fit() method takes a distribution name recognized by OpenTURNS as argument (e.g. 'Normal', 'Uniform', 'Exponential', ...) and returns an OTDistribution whose underlying OpenTURNS distribution is the specified one, fitted to the dataset passed to the constructor.
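To illustrate what fitting a named distribution to a dataset involves, the sketch below estimates the parameters of a normal distribution from a sample with the Python standard library only. It is not GEMSEO or OpenTURNS code: the fit_normal helper and the sample values are hypothetical, and the estimators that OpenTURNS actually uses for a given distribution may differ.

```python
import statistics


def fit_normal(data):
    """Estimate the parameters (mu, sigma) of a normal distribution.

    Hypothetical helper: uses the sample mean and the sample standard
    deviation (n - 1 denominator) as estimators.
    """
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    return mu, sigma


# Hypothetical one-dimensional sample of an uncertain variable.
sample = [1.9, 2.1, 2.0, 2.3, 1.7]
mu, sigma = fit_normal(sample)
```

With OTDistributionFitter, this estimation step is delegated to OpenTURNS and the result is wrapped in an OTDistribution.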

Measure the goodness-of-fit

The OTDistributionFitter.compute_measure() method has two mandatory arguments:

  • a distribution, which is either an OTDistribution or a distribution name from which the fit() method builds an OTDistribution,

  • a fitting criterion name.

Note

Use the OTDistributionFitter.available_criteria property to get the complete list of available criteria and the OTDistributionFitter.available_significance_tests property to get the list of available criteria which are significance tests.

The OTDistributionFitter.compute_measure() method can also use a level associated with the criterion.

The OTDistributionFitter.compute_measure() method returns a goodness-of-fit measure that is a scalar when the criterion is not a significance test and a tuple when it is one. In the latter case, the first component of the tuple is a boolean indicating whether the measured distribution is acceptable to model the data, and the second one is a dictionary containing the test statistic, the p-value and the significance level.
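Because the returned measure has two possible shapes, calling code usually has to branch on it. The following library-independent sketch shows one way to do so. The summarize_measure helper is hypothetical, the numerical values are made up, and the dictionary keys ("statistics", "p-value", "level") are assumptions about how the details are named.

```python
def summarize_measure(measure):
    """Return (is_test, accepted, details) for a goodness-of-fit measure.

    A tuple is treated as the result of a significance test,
    anything else as a scalar criterion value.
    """
    if isinstance(measure, tuple):
        accepted, details = measure
        return True, bool(accepted), dict(details)
    # Scalar criterion: there is no accept/reject decision.
    return False, None, {"value": float(measure)}


# Hypothetical measures; the dictionary keys are assumptions.
test_measure = (True, {"statistics": 0.12, "p-value": 0.43, "level": 0.05})
scalar_measure = 2.87

test_summary = summarize_measure(test_measure)
scalar_summary = summarize_measure(scalar_measure)
```

Normalizing both shapes into one triple keeps downstream code from repeating the isinstance check.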

Select an optimal distribution

The OTDistributionFitter.select() method aims to select an optimal distribution among a collection. It takes two mandatory arguments:

  • a list of distributions, given either as distribution names or as OTDistribution objects,

  • a fitting criterion name.

The OTDistributionFitter.select() method can also use a level associated with the criterion and a selection criterion:

  • 'best': select the distribution minimizing (or maximizing, depending on the criterion) the criterion,

  • 'first': select the first distribution for which the criterion is greater (or lower, depending on the criterion) than the level.
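The two selection strategies above can be sketched in plain Python. This is an illustration under stated assumptions, not the GEMSEO implementation: the Selection enum and the select_index function are hypothetical stand-ins, each candidate is reduced to one scalar score, and the fallback to the best candidate when none reaches the level is an assumption.

```python
from enum import Enum


class Selection(str, Enum):
    """Hypothetical stand-in for the 'best' and 'first' strategies."""

    BEST = "best"
    FIRST = "first"


def select_index(scores, level=0.05, selection=Selection.BEST, higher_is_better=True):
    """Return the index of the selected candidate.

    scores: one scalar per candidate (e.g. a p-value or a BIC value).
    higher_is_better: True for p-values, False for a criterion to minimize.
    """
    if not scores:
        raise ValueError("At least one score is required.")
    if selection == Selection.FIRST:
        for index, score in enumerate(scores):
            passes = score >= level if higher_is_better else score <= level
            if passes:
                return index
        # Assumed fallback: no candidate reaches the level, use the best one.
    pick = max if higher_is_better else min
    return pick(range(len(scores)), key=scores.__getitem__)


# Hypothetical p-values of three candidate distributions.
p_values = [0.01, 0.20, 0.60]
best = select_index(p_values, selection=Selection.BEST)
first = select_index(p_values, level=0.05, selection=Selection.FIRST)
# Hypothetical values of a criterion to minimize.
bic_best = select_index([3.2, 2.9, 3.5], higher_is_better=False)
```

Here 'best' picks the third candidate (highest p-value), while 'first' stops at the second one, the first whose p-value reaches the level; the order of the candidates therefore matters only for 'first'.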

class OTDistributionFitter(variable, data)

Bases: object

Fit a probabilistic distribution from a data array.

Parameters:
  • variable (str) -- The name of the variable.

  • data (RealArray) -- A data array.

class DistributionName(value)

Bases: StrEnum

The available probability distributions.

Arcsine = 'Arcsine'
Beta = 'Beta'
Burr = 'Burr'
Chi = 'Chi'
ChiSquare = 'ChiSquare'
Dirichlet = 'Dirichlet'
Exponential = 'Exponential'
FisherSnedecor = 'FisherSnedecor'
Frechet = 'Frechet'
Gamma = 'Gamma'
GeneralizedPareto = 'GeneralizedPareto'
Gumbel = 'Gumbel'
Histogram = 'Histogram'
InverseNormal = 'InverseNormal'
Laplace = 'Laplace'
LogNormal = 'LogNormal'
LogUniform = 'LogUniform'
Logistic = 'Logistic'
MeixnerDistribution = 'MeixnerDistribution'
Normal = 'Normal'
Pareto = 'Pareto'
Rayleigh = 'Rayleigh'
Rice = 'Rice'
Student = 'Student'
Trapezoidal = 'Trapezoidal'
Triangular = 'Triangular'
TruncatedNormal = 'TruncatedNormal'
Uniform = 'Uniform'
VonMises = 'VonMises'
WeibullMax = 'WeibullMax'
WeibullMin = 'WeibullMin'
class FittingCriterion(value)

Bases: StrEnum

The available fitting criteria.

BIC = 'BIC'
ChiSquared = 'ChiSquared'
Kolmogorov = 'Kolmogorov'
class SelectionCriterion(value)

Bases: LowercaseStrEnum

The different selection criteria.

BEST = 'best'
FIRST = 'first'
class SignificanceTest(value)

Bases: StrEnum

The available significance tests.

ChiSquared = 'ChiSquared'
Kolmogorov = 'Kolmogorov'
compute_measure(distribution, criterion, level=0.05)

Measure the goodness-of-fit of a distribution to data.

Parameters:
  • distribution (OTDistribution | DistributionName) -- A distribution or the name of a distribution to fit.

  • criterion (FittingCriterion) -- The name of the goodness-of-fit criterion.

  • level (float) --

    A test level, i.e. the risk of committing a Type 1 error, that is an incorrect rejection of a true null hypothesis, for criteria based on hypothesis tests.

    By default it is set to 0.05.

Returns:

The goodness-of-fit measure.

Return type:

MeasureType

fit(distribution)

Fit a distribution.

Parameters:

distribution (DistributionName) -- The name of a distribution.

Returns:

The distribution corresponding to the provided name, fitted to the data.

Return type:

OTDistribution

select(distributions, fitting_criterion, level=0.05, selection_criterion=SelectionCriterion.BEST)

Select the best distribution from a list of candidates.

Parameters:
  • distributions (MutableSequence[DistributionName | OTDistribution]) -- The distributions.

  • fitting_criterion (FittingCriterion) -- The goodness-of-fit criterion.

  • level (float) --

    A test level, i.e. the risk of committing a Type 1 error, that is an incorrect rejection of a true null hypothesis, for criteria based on hypothesis tests.

    By default it is set to 0.05.

  • selection_criterion (SelectionCriterion) --

    The selection criterion.

    By default it is set to "best".

Returns:

The best distribution.

Return type:

OTDistribution

classmethod select_from_measures(measures, fitting_criterion, level=0.05, selection_criterion=SelectionCriterion.BEST)

Select the best distribution from measures.

Parameters:
  • measures (MutableSequence[MeasureType]) -- The measures.

  • fitting_criterion (FittingCriterion) -- The goodness-of-fit criterion.

  • level (float) --

    A test level, i.e. the risk of committing a Type 1 error, that is an incorrect rejection of a true null hypothesis, for criteria based on hypothesis tests.

    By default it is set to 0.05.

  • selection_criterion (SelectionCriterion) --

    The selection criterion.

    By default it is set to "best".

Returns:

The index of the best distribution.

Return type:

int

property available_criteria: list[str]

The available goodness-of-fit criteria.

property available_distributions: list[str]

The available distributions.

property available_significance_tests: list[str]

The available significance tests.

data: RealArray

The data array.

variable: str

The name of the variable.