fitting module¶
Class to fit a distribution from data based on OpenTURNS.
Overview¶
The OTDistributionFitter class takes several samples
of an uncertain variable, fits a user-defined probability distribution
to this dataset and returns an OTDistribution.
It can also return a goodness-of-fit measure
associated with this distribution,
e.g. Bayesian Information Criterion, Kolmogorov test or Chi Squared test,
or select an optimal distribution among a collection according to
a criterion with a threshold.
Construction¶
The OTDistributionFitter
of a given uncertain variable is built
from only two arguments:
a variable name,
a one-dimensional numpy array.
Capabilities¶
Fit a distribution¶
The OTDistributionFitter.fit() method takes a distribution name
recognized by OpenTURNS as argument (e.g. ‘Normal’, ‘Uniform’, ‘Exponential’, …)
and returns an OTDistribution
whose underlying OpenTURNS distribution is the specified one fitted
from the dataset passed to the constructor.
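As a minimal usage sketch of the construction and fitting steps described above, the helper below builds a fitter from a variable name and a data array, then fits a named distribution. It relies only on the import path and signatures listed in this page; the helper name `fit_named_distribution` and its default values are illustrative assumptions, and calling it requires GEMSEO and OpenTURNS to be installed.

```python
def fit_named_distribution(data, variable="x", distribution="Normal"):
    """Fit ``distribution`` to ``data`` and return the resulting OTDistribution.

    ``data`` is expected to be a one-dimensional numpy array of samples.
    This helper is a hypothetical wrapper, not part of GEMSEO.
    """
    # Imported lazily so that the helper can be defined without GEMSEO installed.
    from gemseo.uncertainty.distributions.openturns.fitting import (
        OTDistributionFitter,
    )

    # The constructor takes the variable name and the data array.
    fitter = OTDistributionFitter(variable, data)
    # The distribution name must be one recognized by OpenTURNS.
    return fitter.fit(distribution)
```

For instance, `fit_named_distribution(numpy.random.default_rng(0).normal(size=100))` would be expected to return an OTDistribution wrapping a fitted OpenTURNS Normal distribution.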
Measure the goodness-of-fit¶
The OTDistributionFitter.compute_measure()
method has two mandatory arguments:
a distribution, which is either an OTDistribution or a distribution name from which the fit() method builds an OTDistribution,
a fitting criterion name.
Note
Use the OTDistributionFitter.get_available_criteria()
method to get
the complete list of available criteria
and the OTDistributionFitter.get_significance_tests()
method
to get the list of available criteria which are significance tests.
The OTDistributionFitter.compute_measure()
method can also use a level
associated with the criterion.
The OTDistributionFitter.compute_measure()
method returns a goodness-of-fit
measure which is either a scalar
when the criterion is not a significance test
or a tuple when the criterion is a significance test. In that case,
the first component of the tuple is a boolean indicating whether the measured
distribution is acceptable to model the data and the second one is
a dictionary containing the test statistic, the p-value and
the significance level.
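To make the two return shapes concrete, here is a standard-library sketch that does not use GEMSEO or OpenTURNS. It computes a textbook BIC (a scalar) and an approximate Kolmogorov test (a boolean plus a dictionary) for a Normal distribution fitted by maximum likelihood. The function names, the dictionary keys and the asymptotic p-value formula are assumptions chosen for illustration; OpenTURNS may normalize the BIC differently and computes its p-values with its own algorithms.

```python
import math
from statistics import NormalDist, fmean, pstdev


def fit_normal(data):
    """Return the maximum-likelihood mean and standard deviation of the data."""
    mu = fmean(data)
    return mu, pstdev(data, mu)


def bic_normal(data):
    """Scalar measure: textbook BIC, k*ln(n) - 2*ln(L), with k = 2 parameters."""
    n = len(data)
    _, sigma = fit_normal(data)
    # Log-likelihood of a Normal evaluated at its maximum-likelihood estimates.
    log_likelihood = -0.5 * n * (math.log(2.0 * math.pi * sigma**2) + 1.0)
    return 2 * math.log(n) - 2.0 * log_likelihood


def kolmogorov_normal(data, level=0.05):
    """Tuple measure: (is the fit acceptable?, details of the test)."""
    n = len(data)
    fitted = NormalDist(*fit_normal(data))
    # Largest gap between the empirical CDF and the fitted CDF.
    statistic = 0.0
    for i, x in enumerate(sorted(data)):
        cdf = fitted.cdf(x)
        statistic = max(statistic, (i + 1) / n - cdf, cdf - i / n)
    # Asymptotic Kolmogorov series; approximate, since the parameters
    # were estimated from the same data.
    t = math.sqrt(n) * statistic
    series = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * (k * t) ** 2) for k in range(1, 101)
    )
    p_value = min(max(2.0 * series, 0.0), 1.0)
    # Key names are assumed for this sketch.
    details = {"statistics": statistic, "p-value": p_value, "level": level}
    return p_value > level, details


# Deterministic, roughly Normal sample built from standard Normal quantiles.
sample = [NormalDist().inv_cdf((i + 0.5) / 50) for i in range(50)]
bic = bic_normal(sample)
accepted, details = kolmogorov_normal(sample)
```

Here `bic` is a single float, while `accepted` and `details` mirror the boolean and dictionary components described above for significance tests.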
Select an optimal distribution¶
The OTDistributionFitter.select()
method selects an
optimal distribution among a collection. It uses two mandatory arguments:
a list of distributions, given either as distribution names or as
OTDistribution objects,
a fitting criterion name.
The OTDistributionFitter.select()
method can also use a level
associated with the criterion and a selection criterion:
‘best’: select the distribution minimizing (or maximizing, depending on the criterion) the criterion,
‘first’: select the first distribution for which the criterion is greater (or lower, depending on the criterion) than the level.
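The two selection criteria can be sketched in plain Python as follows. This is an illustration of the rules stated above, not GEMSEO's implementation: the function name `select_index`, the `is_significance_test` flag, the "p-value" dictionary key, the convention that higher p-values and lower scalar values are better, and the fallback to the best candidate when none passes the level are all assumptions.

```python
import operator


def select_index(
    measures, is_significance_test, level=0.05, selection_criterion="best"
):
    """Return the index of the selected candidate distribution.

    ``measures`` holds one measure per candidate: scalars for a criterion
    such as BIC, or (bool, dict) tuples for a significance test.
    """
    if selection_criterion not in ("best", "first"):
        raise ValueError("selection_criterion must be 'best' or 'first'.")

    if is_significance_test:
        # Assumed: rank candidates by p-value, higher is better.
        scores = [details["p-value"] for _, details in measures]
        passes, pick_best = operator.gt, max
    else:
        # Assumed: scalar criterion where lower is better.
        scores = list(measures)
        passes, pick_best = operator.lt, min

    if selection_criterion == "first":
        for index, score in enumerate(scores):
            if passes(score, level):
                return index
        # Assumed fallback: no candidate passes, so use the best one.

    return pick_best(range(len(scores)), key=scores.__getitem__)


# Scalar measures (lower is better in this sketch).
scalar_measures = [3.2, 2.0, 1.5]
best_scalar = select_index(scalar_measures, False)
first_scalar = select_index(
    scalar_measures, False, level=2.5, selection_criterion="first"
)

# Significance-test measures with hypothetical p-values.
test_measures = [
    (False, {"p-value": 0.01}),
    (True, {"p-value": 0.2}),
    (True, {"p-value": 0.6}),
]
best_test = select_index(test_measures, True)
first_test = select_index(test_measures, True, selection_criterion="first")
```

With ‘first’, the order of the candidates matters: the search stops at the first candidate that passes the level, even if a later one scores better.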
- class gemseo.uncertainty.distributions.openturns.fitting.OTDistributionFitter(variable, data)[source]¶
Bases:
object
Fit a probabilistic distribution from a data array.
- Parameters
variable (str) – The name of the variable.
data (numpy.ndarray) – A data array.
- Return type
None
- compute_measure(distribution, criterion, level=0.05)[source]¶
Measure the goodness-of-fit of a distribution to data.
- Parameters
distribution (OTDistribution | str) – A distribution or a distribution name.
criterion (str) – The name of the goodness-of-fit criterion.
level (float) –
A test level, i.e. the risk of committing a Type 1 error, that is an incorrect rejection of a true null hypothesis, for criteria based on hypothesis tests.
By default it is set to 0.05.
- Returns
The goodness-of-fit measure.
- Return type
MeasureType
- fit(distribution)[source]¶
Fit a distribution.
- Parameters
distribution (str) – The name of a distribution.
- Returns
The distribution corresponding to the provided name.
- Return type
gemseo.uncertainty.distributions.openturns.distribution.OTDistribution
- select(distributions, fitting_criterion, level=0.05, selection_criterion='best')[source]¶
Select the best distribution from a list of candidates.
- Parameters
distributions (Sequence[str] | Sequence[OTDistribution]) – The distributions.
fitting_criterion (str) – The name of the goodness-of-fit criterion.
level (float) –
A test level, i.e. the risk of committing a Type 1 error, that is an incorrect rejection of a true null hypothesis, for criteria based on hypothesis tests.
By default it is set to 0.05.
selection_criterion (str) –
The name of the selection criterion. Either ‘first’ or ‘best’.
By default it is set to best.
- Returns
The best distribution.
- Return type
gemseo.uncertainty.distributions.openturns.distribution.OTDistribution
- classmethod select_from_measures(measures, fitting_criterion, level=0.05, selection_criterion='best')[source]¶
Select the best distribution from measures.
- Parameters
measures (list[Union[Tuple[bool, Mapping[str, float]], float]]) – The measures.
fitting_criterion (str) – The name of the goodness-of-fit criterion.
level (float) –
A test level, i.e. the risk of committing a Type 1 error, that is an incorrect rejection of a true null hypothesis, for criteria based on hypothesis tests.
By default it is set to 0.05.
selection_criterion (str) –
The name of the selection criterion. Either ‘first’ or ‘best’.
By default it is set to best.
- Returns
The index of the best distribution.
- Return type
int
- AVAILABLE_DISTRIBUTIONS = ['Arcsine', 'Beta', 'Burr', 'Chi', 'ChiSquare', 'Dirichlet', 'Exponential', 'FisherSnedecor', 'Frechet', 'Gamma', 'GeneralizedPareto', 'Gumbel', 'Histogram', 'InverseNormal', 'Laplace', 'LogNormal', 'LogUniform', 'Logistic', 'MeixnerDistribution', 'Normal', 'Pareto', 'Rayleigh', 'Rice', 'Student', 'Trapezoidal', 'Triangular', 'TruncatedNormal', 'Uniform', 'VonMises', 'WeibullMax', 'WeibullMin']¶
- AVAILABLE_FITTING_TESTS = ['BIC', 'ChiSquared', 'Kolmogorov']¶
- SIGNIFICANCE_TESTS = ['Kolmogorov', 'ChiSquared']¶
- data: numpy.ndarray¶
The data array.
- dist_name = 'WeibullMin'¶
- factory = class=DistributionFactory implementation=class=WeibullMinFactory¶
- factory_class_name = 'WeibullMinFactory'¶