gemseo.uncertainty.sensitivity.sobol_analysis module

Class for the estimation of Sobol' indices.

Let us consider the model \(Y=f(X_1,\ldots,X_d)\) where:

  • \(X_1,\ldots,X_d\) are independent random variables,

  • \(E\left[f(X_1,\ldots,X_d)^2\right]<\infty\).

Then, the following decomposition is unique:

\[Y=f_0 + \sum_{i=1}^df_i(X_i) + \sum_{i,j=1\atop i<j}^d f_{i,j}(X_i,X_j) + \sum_{i,j,k=1\atop i<j<k}^d f_{i,j,k}(X_i,X_j,X_k) + \ldots + f_{1,\ldots,d}(X_1,\ldots,X_d)\]

where:

  • \(f_0=E[Y]\),

  • \(f_i(X_i)=E[Y|X_i]-f_0\),

  • \(f_{i,j}(X_i,X_j)=E[Y|X_i,X_j]-f_i(X_i)-f_j(X_j)-f_0\)

  • and so on.

Then, the shift to variance leads to:

\[V[Y]=\sum_{i=1}^dV\left[f_i(X_i)\right] + \sum_{i,j=1\atop i<j}^d V\left[f_{i,j}(X_i,X_j)\right] + \ldots + V\left[f_{1,\ldots,d}(X_1,\ldots,X_d)\right]\]

and dividing both sides by \(V[Y]\) yields the Sobol' indices, which sum to 1:

\[1=\sum_{i=1}^dS_i + \sum_{i,j=1\atop i<j}^d S_{i,j} + \sum_{i,j,k=1\atop i<j<k}^d S_{i,j,k} + \ldots + S_{1,\ldots,d}\]

A Sobol' index represents the share of output variance explained by an input variable or a group of input variables. For the input variable \(X_i\),

  • \(S_i\) is the first-order Sobol' index measuring the individual effect of \(X_i\),

  • \(S_{i,j}\) is the second-order Sobol' index measuring the joint effect between \(X_i\) and \(X_j\),

  • \(S_{i,j,k}\) is the third-order Sobol' index measuring the joint effect between \(X_i\), \(X_j\) and \(X_k\),

  • and so on.
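
As a worked illustration (an example chosen here for its simplicity, not taken from the library), take \(d=2\), independent standard normal inputs \(X_1\) and \(X_2\), and \(Y=X_1+2X_2+X_1X_2\). Applying the definitions of \(f_0\), \(f_i\) and \(f_{i,j}\) gives:

\[f_0=0,\quad f_1(X_1)=X_1,\quad f_2(X_2)=2X_2,\quad f_{1,2}(X_1,X_2)=X_1X_2\]

\[V[Y]=V[X_1]+V[2X_2]+V[X_1X_2]=1+4+1=6\]

\[S_1=\frac{1}{6},\quad S_2=\frac{4}{6},\quad S_{1,2}=\frac{1}{6},\quad S_1+S_2+S_{1,2}=1\]

In this example, two thirds of the output variance come from \(X_2\) alone and one sixth comes from the interaction between \(X_1\) and \(X_2\).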

In practice, we only consider the first-order Sobol' index:

\[S_i=\frac{V[E[Y|X_i]]}{V[Y]}\]

and the total-order Sobol' index:

\[S_i^T=\sum_{u\subset\{1,\ldots,d\}\atop u \ni i}S_u\]

The latter represents the sum of the individual effect of \(X_i\) and the joint effects between \(X_i\) and any other input variable or group of input variables.
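
Both indices can be estimated by Monte Carlo with a pick-freeze scheme. The following is a minimal standard-library sketch, not the OpenTURNS-based implementation used by SobolAnalysis. It combines the first-order estimator of Saltelli et al. (2010) with the total-order estimator of Jansen on the toy model \(Y=X_1+2X_2+X_1X_2\) with independent standard normal inputs, for which \(S_1=1/6\), \(S_2=2/3\), \(S_1^T=1/3\) and \(S_2^T=5/6\). The names model and estimate_sobol_indices are hypothetical helpers introduced for this illustration.

```python
import random
from typing import Callable, List, Tuple


def model(x1: float, x2: float) -> float:
    """Toy model (an assumption of this sketch): Y = X1 + 2*X2 + X1*X2."""
    return x1 + 2.0 * x2 + x1 * x2


def estimate_sobol_indices(
    f: Callable[..., float], d: int, n: int, seed: int = 0
) -> Tuple[List[float], List[float]]:
    """Estimate first- and total-order Sobol' indices for N(0, 1) inputs."""
    rng = random.Random(seed)
    # Two independent input samples A and B, each with n rows and d columns.
    a = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    b = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y_a = [f(*row) for row in a]
    y_b = [f(*row) for row in b]
    mean = sum(y_a) / n
    variance = sum((y - mean) ** 2 for y in y_a) / (n - 1)

    first, total = [], []
    for i in range(d):
        # A_B^i: the rows of A with their i-th entry replaced by the one of B.
        y_ab = [f(*(ra[:i] + [rb[i]] + ra[i + 1:])) for ra, rb in zip(a, b)]
        # First-order estimator of Saltelli et al. (2010).
        v_i = sum(yb * (yab - ya) for ya, yb, yab in zip(y_a, y_b, y_ab)) / n
        # Total-order estimator of Jansen.
        vt_i = sum((ya - yab) ** 2 for ya, yab in zip(y_a, y_ab)) / (2 * n)
        first.append(v_i / variance)
        total.append(vt_i / variance)
    return first, total


first, total = estimate_sobol_indices(model, d=2, n=50_000, seed=0)
print("first-order:", [round(s, 2) for s in first])
print("total-order:", [round(s, 2) for s in total])
```

Each input costs one extra batch of n model evaluations, so the total cost of this sketch is n(2+d) evaluations.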

This methodology relies on the SobolAnalysis class. Precisely, SobolAnalysis.indices contains both SobolAnalysis.indices.first and SobolAnalysis.indices.total, while SobolAnalysis.main_indices corresponds to the first-order Sobol' indices. Lastly, the SobolAnalysis.plot() method displays the estimates of both the first-order and total-order Sobol' indices along with their confidence intervals, whose default level is 95%.

The user can select the algorithm to estimate the Sobol' indices. The computation relies on OpenTURNS capabilities.

Control variates can be given to compute indices. In this case, the algorithm selection is disregarded and the estimation is based on the Monte Carlo estimator proposed by Saltelli in [SAA+10].
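
The general idea behind a control variate can be sketched on the simpler problem of estimating a mean. This is a generic standard-library illustration, not the Sobol' estimator used by SobolAnalysis: a cheap function g with a known mean is evaluated on the same samples as the expensive function f, and the observed error on g is used to correct the estimate for f. The function control_variate_mean and the choice of f and g are assumptions made for this example.

```python
import math
import random
from typing import Callable


def control_variate_mean(
    f: Callable[[float], float],
    g: Callable[[float], float],
    g_mean: float,
    n: int,
    seed: int = 0,
) -> float:
    """Estimate E[f(X)] for X ~ U(0, 1) using g(X) as a control variate."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    f_values = [f(x) for x in xs]
    g_values = [g(x) for x in xs]
    f_bar = sum(f_values) / n
    g_bar = sum(g_values) / n
    covariance = sum(
        (fv - f_bar) * (gv - g_bar) for fv, gv in zip(f_values, g_values)
    ) / (n - 1)
    g_variance = sum((gv - g_bar) ** 2 for gv in g_values) / (n - 1)
    # Coefficient minimizing the variance of the corrected estimator.
    beta = covariance / g_variance
    # Correct the crude estimate with the known error of the cheap model.
    return f_bar - beta * (g_bar - g_mean)


# "Expensive" model exp(x) and its cheap surrogate 1 + x, whose mean is 1.5.
estimate = control_variate_mean(math.exp, lambda x: 1.0 + x, g_mean=1.5, n=2_000)
print(estimate, "vs exact", math.e - 1.0)
```

The more strongly the cheap model correlates with the expensive one, the larger the variance reduction.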

class SobolAnalysis(samples='')

Bases: BaseSensitivityAnalysis

Sensitivity analysis based on the Sobol' indices.

Examples

>>> from numpy import pi
>>> from gemseo import create_discipline, create_parameter_space
>>> from gemseo.uncertainty.sensitivity.sobol_analysis import SobolAnalysis
>>>
>>> expressions = {"y": "sin(x1)+7*sin(x2)**2+0.1*x3**4*sin(x1)"}
>>> discipline = create_discipline(
...     "AnalyticDiscipline", expressions=expressions
... )
>>>
>>> parameter_space = create_parameter_space()
>>> parameter_space.add_random_variable(
...     "x1", "OTUniformDistribution", minimum=-pi, maximum=pi
... )
>>> parameter_space.add_random_variable(
...     "x2", "OTUniformDistribution", minimum=-pi, maximum=pi
... )
>>> parameter_space.add_random_variable(
...     "x3", "OTUniformDistribution", minimum=-pi, maximum=pi
... )
>>>
>>> analysis = SobolAnalysis()
>>> analysis.compute_samples([discipline], parameter_space, n_samples=10000)
>>> indices = analysis.compute_indices()

Note

The second-order Sobol' indices cannot be estimated with control variates.

Parameters:

samples (IODataset | str | Path) --

The samples for the estimation of the sensitivity indices, either as an IODataset or as the path to a pickle file generated by the IODataset.to_pickle method. If empty, call compute_samples() to generate them.

By default it is set to "".

class Algorithm(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: PascalCaseStrEnum

The algorithms to estimate the Sobol' indices.

JANSEN = 'Jansen'

MARTINEZ = 'Martinez'

MAUNTZ_KUCHERENKO = 'MauntzKucherenko'

SALTELLI = 'Saltelli'

class ControlVariate(discipline, indices=<factory>, n_samples=0, variance=<factory>)

Bases: object

A control variate based on a cheap discipline.

If either indices or variance is missing, both are estimated from n_samples evaluations of discipline.

Parameters:
  • discipline (Discipline)

  • indices (Mapping[SobolAnalysis.Method, FirstOrderIndicesType]) --

    By default it is set to <factory>.

  • n_samples (int) --

    By default it is set to 0.

  • variance (Mapping[str, RealArray]) --

    By default it is set to <factory>.

discipline: Discipline

A cheap discipline, e.g. a surrogate discipline.

It must have the input variables used by SobolAnalysis as inputs and the output variables used by SobolAnalysis as outputs.

indices: Mapping[SobolAnalysis.Method, FirstOrderIndicesType]

The mapping between method names and first-order Sobol' indices.

If empty, SobolAnalysis will compute it.

n_samples: int = 0

The number of samples to estimate the variance and the indices.

If 0, use 100 times more samples than the number passed at instantiation.

variance: Mapping[str, RealArray]

The mapping between output names and output variances.

If empty, SobolAnalysis will compute it.

class Method(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: StrEnum

The names of the sensitivity methods.

FIRST = 'first'

The first-order Sobol' index.

TOTAL = 'total'

The total-order Sobol' index.

class SensitivityIndices(first: 'FirstOrderIndicesType' = <factory>, second: 'SecondOrderIndicesType' = <factory>, total: 'FirstOrderIndicesType' = <factory>)

Bases: object

first: dict[str, list[dict[str, ndarray[Any, dtype[floating[Any]]]]]]

The first-order Sobol' indices.

second: dict[str, list[dict[str, dict[str, ndarray[Any, dtype[floating[Any]]]]]]]

The second-order Sobol' indices.

total: dict[str, list[dict[str, ndarray[Any, dtype[floating[Any]]]]]]

The total-order Sobol' indices.

compute_indices(output_names=(), algo=Algorithm.SALTELLI, confidence_level=0.95, control_variates=(), use_asymptotic_distributions=True, n_replicates=100, seed=0)

Compute the sensitivity indices.

Parameters:
  • output_names (str | Iterable[str]) --

    The name(s) of the output(s) for which to compute the sensitivity indices. If empty, use the names of the outputs set at instantiation.

    By default it is set to ().

  • algo (Algorithm) --

    The name of the algorithm to estimate the Sobol' indices.

    By default it is set to "Saltelli".

  • confidence_level (float) --

    The level of the confidence intervals.

    By default it is set to 0.95.

  • control_variates (ControlVariate | Iterable[ControlVariate]) --

    The control variates based on cheap disciplines.

    By default it is set to ().

  • use_asymptotic_distributions (bool) --

    Whether to estimate the confidence intervals of the first- and total-order Sobol' indices with the asymptotic distributions; otherwise, use bootstrap. If control variates are used, the confidence intervals can only be estimated via bootstrap.

    By default it is set to True.

  • n_replicates (int) --

    The number of bootstrap samples used for the computation of the confidence intervals.

    By default it is set to 100.

  • seed (int | None) --

    The seed to initialize the random generator used for the bootstrapping method when the indices are estimated using control variates. If None, then fresh, unpredictable entropy will be pulled from the OS.

    By default it is set to 0.
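
The role of n_replicates, confidence_level and seed can be illustrated with a generic percentile bootstrap. This is a standard-library sketch of the resampling idea, not the procedure implemented by compute_indices; bootstrap_interval is a hypothetical helper and the nearest-rank percentile rule is an assumption of this example.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple


def bootstrap_interval(
    values: Sequence[float],
    statistic: Callable[[Sequence[float]], float],
    n_replicates: int = 100,
    confidence_level: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return a percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)
    n = len(values)
    # Recompute the statistic on n_replicates resamples drawn with replacement.
    replicates = sorted(
        statistic(rng.choices(values, k=n)) for _ in range(n_replicates)
    )
    # Keep the central part of the replicates (nearest-rank percentiles).
    alpha = 1.0 - confidence_level
    lower_rank = int(alpha / 2.0 * (n_replicates - 1))
    upper_rank = n_replicates - 1 - lower_rank
    return replicates[lower_rank], replicates[upper_rank]


data_rng = random.Random(1)
data = [data_rng.gauss(0.0, 1.0) for _ in range(500)]
lower, upper = bootstrap_interval(data, statistics.fmean)
print(lower, upper)
```

A fixed seed makes the interval reproducible, and more replicates make its bounds less noisy.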

Returns:

The sensitivity indices.

Given a sensitivity method, an input variable and an output variable, the sensitivity index, which is a 1D NumPy array, can be accessed through indices.method_name[output_name][output_component][input_name].

Return type:

SensitivityIndices
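
The nested access pattern indices.method_name[output_name][output_component][input_name] can be sketched without running an analysis. The class FakeSensitivityIndices below is a hypothetical stand-in that only reproduces the nesting of SensitivityIndices, plain lists replace the 1D NumPy arrays, and the numbers are placeholders rather than computed results.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeSensitivityIndices:
    """Hypothetical stand-in reproducing only the nesting of SensitivityIndices."""

    first: Dict[str, List[Dict[str, List[float]]]] = field(default_factory=dict)
    total: Dict[str, List[Dict[str, List[float]]]] = field(default_factory=dict)


def interaction_shares(
    indices: FakeSensitivityIndices, output_name: str, component: int = 0
) -> Dict[str, float]:
    """Return the total-order minus first-order index of each input."""
    first = indices.first[output_name][component]
    total = indices.total[output_name][component]
    # Assumption: each input is scalar, so its 1D array holds a single entry.
    return {name: total[name][0] - first[name][0] for name in first}


# Placeholder values chosen for illustration, not computed results.
indices = FakeSensitivityIndices(
    first={"y": [{"x1": [0.3], "x2": [0.4]}]},
    total={"y": [{"x1": [0.5], "x2": [0.4]}]},
)
print(interaction_shares(indices, "y"))
```

A large gap between the total-order and first-order indices of an input indicates that it mostly acts through interactions.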

compute_samples(disciplines, parameter_space, n_samples, output_names=(), algo='', algo_settings=mappingproxy({}), backup_settings=None, formulation_name='MDF', compute_second_order=True, **formulation_settings)

Compute the samples for the estimation of the sensitivity indices.

Parameters:
  • disciplines (Collection[Discipline]) -- The discipline or disciplines to use for the analysis.

  • parameter_space (ParameterSpace) -- A parameter space.

  • n_samples (int) -- A number of samples. If 0, the number of samples is computed by the algorithm.

  • output_names (Iterable[str]) --

    The disciplines' outputs to be considered for the analysis. If empty, use all the outputs.

    By default it is set to ().

  • algo (str) --

    The name of the DOE algorithm. If empty, use the BaseSensitivityAnalysis.DEFAULT_DRIVER.

    By default it is set to "".

  • algo_settings (Mapping[str, DriverSettingType]) --

    The settings of the DOE algorithm.

    By default it is set to {}.

  • backup_settings (BackupSettings | None) -- The settings of the backup file to store the evaluations if any.

  • formulation_name (str) --

    The name of the BaseMDOFormulation to sample the disciplines.

    By default it is set to "MDF".

  • compute_second_order (bool) --

    Whether to compute the second-order indices.

    By default it is set to True.

  • **formulation_settings (Any) -- The settings of the BaseMDOFormulation.

Returns:

The samples for the estimation of the sensitivity indices.

Return type:

IODataset

Notes

The estimators of Sobol' indices all rely on the same DOE algorithm. This algorithm starts from two independent input datasets, each composed of \(N\) independent samples, where \(N\) is the usual sampling size for Sobol' analysis. When compute_second_order=False or when the input dimension \(d\) is equal to 2, \(N=\frac{n_\text{samples}}{2+d}\). Otherwise, \(N=\frac{n_\text{samples}}{2+2d}\). The larger \(N\), the more accurate the estimators of Sobol' indices. Therefore, for a small budget n_samples, the user can set compute_second_order to False to obtain a better estimation of the first- and total-order indices, at the cost of not estimating the second-order ones.
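
The relation between the budget n_samples and \(N\) can be sketched as follows. The helper base_sample_size is hypothetical, and rounding down to an integer is an assumption of this sketch rather than documented behavior.

```python
def base_sample_size(
    n_samples: int, d: int, compute_second_order: bool = True
) -> int:
    """Return N for a budget of n_samples evaluations and d inputs."""
    if not compute_second_order or d == 2:
        # The design uses N * (2 + d) evaluations.
        return n_samples // (2 + d)
    # Second-order indices require N * (2 + 2 * d) evaluations.
    return n_samples // (2 + 2 * d)


for second_order in (True, False):
    n = base_sample_size(10_000, d=3, compute_second_order=second_order)
    print(f"compute_second_order={second_order}: N={n}")
```

With a budget of 10000 evaluations and \(d=3\), \(N\) is 1250 when the second-order indices are requested and 2000 otherwise.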

get_intervals(first_order=True)

Get the confidence intervals for the Sobol' indices.

Warning

You must first call compute_indices().

Parameters:

first_order (bool) --

If True, return the intervals of the first-order indices. Otherwise, return those of the total-order indices.

By default it is set to True.

Returns:

The confidence intervals for the Sobol' indices.

With the following structure:

{
    "output_name": [
        {
            "input_name": data_array,
        }
    ]
}

Return type:

dict[str, list[dict[str, ndarray[Any, dtype[floating[Any]]]]]]

plot(output, input_names=(), title='', save=True, show=False, file_path='', directory_path='', file_name='', file_format='', sort=True, sort_by_total=True)

Plot the first- and total-order Sobol' indices.

For the \(i\)-th input variable, plot its first-order Sobol' index \(S_i\) and its total-order Sobol' index \(S_i^T\) as dots and their confidence intervals as vertical lines.

The subtitle displays the standard deviation (StD) and the variance (Var) of the output of interest.

Parameters:
  • output (VariableType) -- The output for which to display sensitivity indices, either a name or a tuple of the form (name, component). If only a name is given, its first component is considered.

  • input_names (Iterable[str]) --

    The input variables for which to display the sensitivity indices. If empty, display all the input variables.

    By default it is set to ().

  • title (str) --

    The title of the plot. If empty, use a default one.

    By default it is set to "".

  • save (bool) --

    If True, save the figure.

    By default it is set to True.

  • show (bool) --

    If True, show the figure.

    By default it is set to False.

  • file_path (str | Path) --

    A file path. Either a complete file path, a directory name or a file name. If empty, use a default file name and a default directory. The file extension is inferred from the file_path extension, if any.

    By default it is set to "".

  • directory_path (str | Path) --

    The path to the directory where to save the plots.

    By default it is set to "".

  • file_name (str) --

    The name of the file.

    By default it is set to "".

  • file_format (str) --

    A file format, e.g. 'png', 'pdf', 'svg', ... Used when file_path does not have any extension. If empty, use a default file extension.

    By default it is set to "".

  • sort (bool) --

    Whether to sort the input variables by decreasing order of their Sobol' indices.

    By default it is set to True.

  • sort_by_total (bool) --

    Whether to sort according to the total-order Sobol' indices when sort is True. Otherwise, use the first-order Sobol' indices.

    By default it is set to True.

Returns:

The plot figure.

Return type:

Figure

unscale_indices(indices, use_variance=True)

Unscale the Sobol' indices.

Returns:

The unscaled Sobol' indices.

Return type:

dict[str, list[dict[str, ndarray[Any, dtype[floating[Any]]]]]] | dict[str, list[dict[str, dict[str, ndarray[Any, dtype[floating[Any]]]]]]]

DEFAULT_DRIVER: ClassVar[str] = 'OT_SOBOL_INDICES'

property output_standard_deviations: dict[str, RealArray]

The standard deviations of the output variables.

property output_variances: dict[str, RealArray]

The variances of the output variables.