gemseo.uncertainty.distributions.openturns.fitting module#
Class to fit a distribution from data based on OpenTURNS.
Overview#
The OTDistributionFitter class considers several samples of an uncertain variable, fits a user-defined probability distribution from this dataset and returns an OTDistribution.
It can also return a goodness-of-fit measure
associated with this distribution,
e.g. Bayesian Information Criterion, Kolmogorov test or Chi Squared test,
or select an optimal distribution among a collection according to
a criterion with a threshold.
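To make the Bayesian Information Criterion mentioned above concrete, the following minimal sketch computes its textbook form, BIC = k ln(n) - 2 ln(L), for a normal distribution fitted by maximum likelihood. It uses only the standard library; the function name `normal_bic` is illustrative, and the value reported by the library may follow a different normalization (for instance a division by the sample size).

```python
import math


def normal_bic(sample):
    """Textbook BIC of a normal distribution fitted to `sample` by maximum likelihood."""
    n = len(sample)
    mean = sum(sample) / n
    # Maximum-likelihood variance estimate (divides by n, not n - 1).
    variance = sum((x - mean) ** 2 for x in sample) / n
    if variance <= 0.0:
        raise ValueError("The sample must contain at least two distinct values.")
    # Log-likelihood of the normal model evaluated at the fitted parameters.
    log_likelihood = -0.5 * n * (math.log(2.0 * math.pi * variance) + 1.0)
    n_parameters = 2  # mean and standard deviation
    return n_parameters * math.log(n) - 2.0 * log_likelihood
```

With this convention a lower value indicates a better trade-off between fit quality and number of parameters.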
Construction#
The OTDistributionFitter of a given uncertain variable is built from only two arguments:
a variable name,
a one-dimensional NumPy array.
Capabilities#
Fit a distribution#
The OTDistributionFitter.fit() method takes a distribution name recognized by OpenTURNS (e.g. 'Normal', 'Uniform', 'Exponential', ...) as argument and returns an OTDistribution whose underlying OpenTURNS distribution is the specified one fitted from the dataset passed to the constructor.
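The construction and fitting steps can be sketched as follows, assuming GEMSEO and OpenTURNS are installed. The import path is taken from the module name of this page; the helper name `fit_normal` and the default variable name `"x"` are illustrative only.

```python
def fit_normal(data, variable="x"):
    """Fit a Normal distribution to a one-dimensional sample with GEMSEO."""
    # Imported lazily so that this sketch can be loaded without GEMSEO installed.
    from gemseo.uncertainty.distributions.openturns.fitting import (
        OTDistributionFitter,
    )

    # The fitter is built from a variable name and a one-dimensional array.
    fitter = OTDistributionFitter(variable, data)
    # DistributionName is the enumeration of supported distribution names.
    return fitter.fit(OTDistributionFitter.DistributionName.Normal)
```

A typical call would pass a one-dimensional NumPy array of observations, e.g. `fit_normal(numpy.random.default_rng(0).normal(size=100))`.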
Measure the goodness-of-fit#
The OTDistributionFitter.compute_measure() method has two mandatory arguments:
a distribution, which is either an OTDistribution or a distribution name from which the fit() method builds an OTDistribution,
a fitting criterion name.
Note
Use the OTDistributionFitter.get_available_criteria() method to get the complete list of available criteria and the OTDistributionFitter.get_significance_tests() method to get the list of available criteria which are significance tests.
The OTDistributionFitter.compute_measure() method can also use a level associated with the criterion.
The OTDistributionFitter.compute_measure() method returns a goodness-of-fit measure, which is either a scalar when the criterion is not a significance test or a tuple when the criterion is a significance test. In the latter case, the first component of the tuple is a boolean indicating whether the measured distribution is acceptable to model the data and the second one is a dictionary containing the test statistic, the p-value and the significance level.
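Because the returned measure is either a scalar or a tuple, calling code usually has to branch on its type. The sketch below shows one possible way to do so with the standard library only; the helper name `summarize_measure` is hypothetical and the dictionary is deliberately not indexed, since its key names are not specified here.

```python
def summarize_measure(measure):
    """Return a short, human-readable summary of a goodness-of-fit measure."""
    if isinstance(measure, tuple):
        # Significance test: (acceptability flag, dictionary of test details).
        is_acceptable, details = measure
        verdict = "accepted" if is_acceptable else "rejected"
        return f"{verdict} ({len(details)} detail(s) reported)"
    # Any other criterion: a plain scalar score.
    return f"score = {float(measure):.3f}"
```

For a scalar criterion such as BIC the helper formats the score; for a significance test it reports the decision and the number of entries in the accompanying dictionary.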
Select an optimal distribution#
The OTDistributionFitter.select() method selects an optimal distribution among a collection. It uses two mandatory arguments:
a list of distributions, given either as distribution names or as OTDistribution objects,
a fitting criterion name.
The OTDistributionFitter.select() method can also use a level associated with the criterion and a selection criterion:
'best': select the distribution minimizing (or maximizing, depending on the criterion) the criterion,
'first': select the first distribution for which the criterion is greater (or lower, depending on the criterion) than the level.
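The difference between the two selection criteria can be illustrated on a plain list of scores, one per candidate distribution. This is a simplified sketch and not the library's implementation: the function name `select_index`, the `maximize` flag and the fallback to the best candidate when no score passes the level are assumptions made for the example.

```python
def select_index(scores, selection="best", level=0.05, maximize=True):
    """Return the index of the selected candidate, given one score per candidate."""
    if not scores:
        raise ValueError("At least one score is required.")
    if selection not in ("best", "first"):
        raise ValueError(f"Unknown selection criterion: {selection!r}")
    if selection == "first":
        for index, score in enumerate(scores):
            # A score passes when it lies on the favourable side of the level.
            passes = score > level if maximize else score < level
            if passes:
                return index
        # Assumption of this sketch: fall back to the best candidate.
    pick = max if maximize else min
    return pick(range(len(scores)), key=scores.__getitem__)
```

With p-values `[0.01, 0.20, 0.60]` and a level of 0.05, 'best' picks the third candidate while 'first' stops at the second; with a criterion to minimize, such as BIC, `maximize=False` reverses the comparison.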
- class OTDistributionFitter(variable, data)[source]#
Bases:
object
Fit a probabilistic distribution from a data array.
- Parameters:
variable (str) -- The name of the variable.
data (RealArray) -- A data array.
- class DistributionName(value)#
Bases:
StrEnum
The available probability distributions.
- Arcsine = 'Arcsine'#
- Beta = 'Beta'#
- Burr = 'Burr'#
- Chi = 'Chi'#
- ChiSquare = 'ChiSquare'#
- Dirichlet = 'Dirichlet'#
- Exponential = 'Exponential'#
- FisherSnedecor = 'FisherSnedecor'#
- Frechet = 'Frechet'#
- Gamma = 'Gamma'#
- GeneralizedPareto = 'GeneralizedPareto'#
- Gumbel = 'Gumbel'#
- Histogram = 'Histogram'#
- InverseNormal = 'InverseNormal'#
- Laplace = 'Laplace'#
- LogNormal = 'LogNormal'#
- LogUniform = 'LogUniform'#
- Logistic = 'Logistic'#
- MeixnerDistribution = 'MeixnerDistribution'#
- Normal = 'Normal'#
- Pareto = 'Pareto'#
- Rayleigh = 'Rayleigh'#
- Rice = 'Rice'#
- Student = 'Student'#
- Trapezoidal = 'Trapezoidal'#
- Triangular = 'Triangular'#
- TruncatedNormal = 'TruncatedNormal'#
- Uniform = 'Uniform'#
- VonMises = 'VonMises'#
- WeibullMax = 'WeibullMax'#
- WeibullMin = 'WeibullMin'#
- class FittingCriterion(value)#
Bases:
StrEnum
The available fitting criteria.
- BIC = 'BIC'#
- ChiSquared = 'ChiSquared'#
- Kolmogorov = 'Kolmogorov'#
- class SelectionCriterion(value)#
Bases:
LowercaseStrEnum
The different selection criteria.
- BEST = 'best'#
- FIRST = 'first'#
- class SignificanceTest(value)#
Bases:
StrEnum
The available significance tests.
- ChiSquared = 'ChiSquared'#
- Kolmogorov = 'Kolmogorov'#
- compute_measure(distribution, criterion, level=0.05)[source]#
Measure the goodness-of-fit of a distribution to data.
- Parameters:
distribution (OTDistribution | DistributionName) -- A distribution or a distribution name.
criterion (FittingCriterion) -- The name of the goodness-of-fit criterion.
level (float) --
A test level, i.e. the risk of committing a Type 1 error, that is an incorrect rejection of a true null hypothesis, for criteria based on hypothesis tests.
By default it is set to 0.05.
- Returns:
The goodness-of-fit measure.
- Return type:
MeasureType
- fit(distribution)[source]#
Fit a distribution.
- Parameters:
distribution (DistributionName) -- The name of a distribution.
- Returns:
The distribution corresponding to the provided name.
- Return type:
OTDistribution
- select(distributions, fitting_criterion, level=0.05, selection_criterion=SelectionCriterion.BEST)[source]#
Select the best distribution from a list of candidates.
- Parameters:
distributions (MutableSequence[DistributionName | OTDistribution]) -- The distributions.
fitting_criterion (FittingCriterion) -- The goodness-of-fit criterion.
level (float) --
A test level, i.e. the risk of committing a Type 1 error, that is an incorrect rejection of a true null hypothesis, for criteria based on hypothesis tests.
By default it is set to 0.05.
selection_criterion (SelectionCriterion) --
The selection criterion.
By default it is set to "best".
- Returns:
The best distribution.
- Return type:
OTDistribution
- classmethod select_from_measures(measures, fitting_criterion, level=0.05, selection_criterion=SelectionCriterion.BEST)[source]#
Select the best distribution from measures.
- Parameters:
measures (MutableSequence[MeasureType]) -- The measures.
fitting_criterion (FittingCriterion) -- The goodness-of-fit criterion.
level (float) --
A test level, i.e. the risk of committing a Type 1 error, that is an incorrect rejection of a true null hypothesis, for criteria based on hypothesis tests.
By default it is set to 0.05.
selection_criterion (SelectionCriterion) --
The selection criterion.
By default it is set to "best".
- Returns:
The index of the best distribution.
- Return type:
int
- data: RealArray#
The data array.