
# fitting module

Class to fit a distribution from data based on OpenTURNS.

## Overview

The OTDistributionFitter class takes several samples of an uncertain variable, fits a user-specified probability distribution to this dataset and returns an OTDistribution. It can also return a goodness-of-fit measure for this distribution, e.g. the Bayesian information criterion (BIC), the Kolmogorov test or the chi-squared test, or select an optimal distribution from a collection of candidates according to a criterion and a threshold.

## Construction

The OTDistributionFitter of a given uncertain variable is built from only two arguments:

• a variable name,

• a one-dimensional numpy array.

## Capabilities

### Fit a distribution

The OTDistributionFitter.fit() method takes a distribution name recognized by OpenTURNS as argument (e.g. ‘Normal’, ‘Uniform’, ‘Exponential’, …) and returns an OTDistribution whose underlying OpenTURNS distribution is the specified one, fitted to the dataset passed to the constructor.
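As a library-independent illustration of what fitting a named distribution to data means, the sketch below estimates the parameters of a normal distribution from samples with Python's standard `statistics.NormalDist.from_samples`. It is a stand-in only, not the OpenTURNS-based estimator that OTDistributionFitter.fit() relies on, and the sample values are made up.

```python
from statistics import NormalDist

# Made-up samples of an uncertain variable (illustrative values only).
data = [1.0, 2.0, 3.0, 4.0, 5.0]

# Estimate the mean and standard deviation of a normal distribution
# from the samples; this plays the role of "fitting" in this sketch.
fitted = NormalDist.from_samples(data)

print(f"mean = {fitted.mean:.3f}, standard deviation = {fitted.stdev:.3f}")
```

With OTDistributionFitter, the equivalent step would be a call to fit() with the name of the desired distribution, which returns an OTDistribution rather than a standard-library object.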

### Measure the goodness-of-fit

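The OTDistributionFitter.compute_measure() method has two mandatory arguments:

• a distribution, given either as a distribution name or as an OTDistribution,

• the name of a goodness-of-fit criterion (e.g. ‘BIC’, ‘ChiSquared’ or ‘Kolmogorov’).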

Note

Use the OTDistributionFitter.available_criteria property to get the complete list of available criteria and the OTDistributionFitter.available_significance_tests property to get the list of available criteria that are significance tests.

The OTDistributionFitter.compute_measure() method also accepts an optional level associated with the criterion (0.05 by default), which applies to criteria that are significance tests.

The OTDistributionFitter.compute_measure() method returns a goodness-of-fit measure that is either a scalar, when the criterion is not a significance test, or a tuple, when it is. In the latter case, the first component of the tuple is a boolean indicating whether the measured distribution is acceptable to model the data, and the second is a dictionary containing the test statistic, the p-value and the significance level.
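The two possible shapes of a measure described above can be handled with a small helper. This is a sketch, not part of GEMSEO: the function name `summarize_measure`, the numeric values and the dictionary key `"p-value"` used in the example calls are assumptions made for illustration.

```python
from typing import Mapping, Tuple, Union

# A measure is either a scalar or a (boolean, details) tuple.
Measure = Union[float, Tuple[bool, Mapping[str, float]]]


def summarize_measure(measure: Measure) -> str:
    """Describe a goodness-of-fit measure in one line (hypothetical helper)."""
    if isinstance(measure, tuple):
        # Significance test: (is the distribution acceptable?, test details).
        acceptable, details = measure
        verdict = "acceptable" if acceptable else "rejected"
        return f"significance test: {verdict}, details={dict(details)}"
    # Any other criterion: a single scalar value.
    return f"scalar criterion: {float(measure):.3f}"


# Hypothetical values, for illustration only.
print(summarize_measure(1234.5))
print(summarize_measure((True, {"p-value": 0.42})))
```

Checking the type of the result before using it avoids comparing a tuple with a number when the same code handles both kinds of criteria.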

### Select an optimal distribution

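The OTDistributionFitter.select() method selects an optimal distribution from a collection of candidates. It has two mandatory arguments:

• a sequence of candidate distributions, given either as distribution names or as OTDistribution objects,

• the name of a goodness-of-fit criterion.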

The OTDistributionFitter.select() method also accepts a level associated with the criterion and a selection criterion:

• ‘best’: select the distribution minimizing (or maximizing, depending on the criterion) the criterion,

• ‘first’: select the first distribution for which the criterion is greater (or lower, depending on the criterion) than the level.
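The difference between the two selection criteria can be sketched on a plain list of p-values, one per candidate distribution. This is an illustration under assumptions, not GEMSEO's implementation: the function `select_index` is hypothetical, larger p-values are taken to be better, and raising an error when no candidate passes the level is a choice made for this sketch.

```python
from typing import Sequence


def select_index(
    p_values: Sequence[float],
    level: float = 0.05,
    selection_criterion: str = "best",
) -> int:
    """Return the index of the selected candidate (hypothetical helper)."""
    if not p_values:
        raise ValueError("At least one candidate is required.")
    if selection_criterion == "best":
        # Keep the candidate with the largest p-value.
        return max(range(len(p_values)), key=lambda i: p_values[i])
    if selection_criterion == "first":
        # Keep the first candidate whose p-value exceeds the level.
        for index, p_value in enumerate(p_values):
            if p_value > level:
                return index
        raise ValueError("No candidate exceeds the level.")
    raise ValueError(f"Unknown selection criterion: {selection_criterion!r}")


# Made-up p-values for three candidate distributions.
p_values = [0.01, 0.20, 0.60]
print(select_index(p_values, selection_criterion="best"))
print(select_index(p_values, selection_criterion="first"))
```

With these values, ‘best’ picks the last candidate, which has the largest p-value, while ‘first’ stops at the second one, the earliest to exceed the level of 0.05. For a criterion where lower is better, the comparison would be reversed.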

class gemseo.uncertainty.distributions.openturns.fitting.OTDistributionFitter(variable, data)[source]

Bases: object

Fit a probabilistic distribution from a data array.

Parameters:
• variable (str) – The name of the variable.

• data (ndarray) – A data array.

compute_measure(distribution, criterion, level=0.05)[source]

Measure the goodness-of-fit of a distribution to data.

Parameters:
• distribution (OTDistribution | str) – A distribution or the name of a distribution.

• criterion (str) – The name of the goodness-of-fit criterion.

• level (float) –

A test level, i.e. the risk of committing a Type 1 error, that is an incorrect rejection of a true null hypothesis, for criteria based on hypothesis tests.

By default it is set to 0.05.

Returns:

The goodness-of-fit measure.

Return type:

MeasureType

fit(distribution)[source]

Fit a distribution.

Parameters:

distribution (str) – The name of a distribution.

Returns:

The distribution corresponding to the provided name.

Return type:

OTDistribution

select(distributions, fitting_criterion, level=0.05, selection_criterion='best')[source]

Select the best distribution from a list of candidates.

Parameters:
• distributions (Sequence[str] | Sequence[OTDistribution]) – The distributions.

• fitting_criterion (str) – The name of the goodness-of-fit criterion.

• level (float) –

A test level, i.e. the risk of committing a Type 1 error, that is an incorrect rejection of a true null hypothesis, for criteria based on hypothesis tests.

By default it is set to 0.05.

• selection_criterion (str) –

The name of the selection criterion. Either ‘first’ or ‘best’.

By default it is set to “best”.

Returns:

The best distribution.

Return type:

OTDistribution

classmethod select_from_measures(measures, fitting_criterion, level=0.05, selection_criterion='best')[source]

Select the best distribution from measures.

Parameters:
• measures (list[Union[Tuple[bool, Mapping[str, float]], float]]) – The measures.

• fitting_criterion (str) – The name of the goodness-of-fit criterion.

• level (float) –

A test level, i.e. the risk of committing a Type 1 error, that is an incorrect rejection of a true null hypothesis, for criteria based on hypothesis tests.

By default it is set to 0.05.

• selection_criterion (str) –

The name of the selection criterion. Either ‘first’ or ‘best’.

By default it is set to “best”.

Returns:

The index of the best distribution.

Return type:

int

AVAILABLE_DISTRIBUTIONS = ['Arcsine', 'Beta', 'Burr', 'Chi', 'ChiSquare', 'Dirichlet', 'Exponential', 'FisherSnedecor', 'Frechet', 'Gamma', 'GeneralizedPareto', 'Gumbel', 'Histogram', 'InverseNormal', 'Laplace', 'LogNormal', 'LogUniform', 'Logistic', 'MeixnerDistribution', 'Normal', 'Pareto', 'Rayleigh', 'Rice', 'Student', 'Trapezoidal', 'Triangular', 'TruncatedNormal', 'Uniform', 'VonMises', 'WeibullMax', 'WeibullMin']
AVAILABLE_FITTING_TESTS = ['BIC', 'ChiSquared', 'Kolmogorov']
SIGNIFICANCE_TESTS = ['Kolmogorov', 'ChiSquared']
property available_criteria: list[str]

The available goodness-of-fit criteria.

property available_distributions: list[str]

The available distributions.

property available_significance_tests: list[str]

The significance tests.

data: ndarray

The data array.

dist_name = 'WeibullMin'
factory = class=DistributionFactory implementation=class=WeibullMinFactory
factory_class_name = 'WeibullMinFactory'
variable: str

The name of the variable.

## Examples using OTDistributionFitter

Fitting a distribution from data based on OpenTURNS
