Note
Go to the end to download the full example code.
Create a retry discipline#
Sometimes,
the execution of a discipline can fail and work after several repetitions.
The RetryDiscipline facilitates the management of these failures and repetitions.
This class illustrates this feature.
from __future__ import annotations
import sys
import time
from typing import TYPE_CHECKING
from numpy import array
from gemseo import LOGGER
from gemseo import create_discipline
from gemseo.core.discipline import Discipline
from gemseo.disciplines.wrappers.retry_discipline import RetryDiscipline
if TYPE_CHECKING:
from gemseo.typing import StrKeyMapping
Toy discipline#
For that example,
we create an AnalyticDiscipline to evaluate the expression \(y=1/x\):
analytic_disc = create_discipline("AnalyticDiscipline", expressions={"y": "1/x"})
This discipline will raise a ZeroDivisionError when \(x=0\).
Execution without failure#
Let's wrap this toy discipline in a RetryDiscipline
parametrized by a maximum number of 3 execution attempts:
retry_disc = RetryDiscipline(analytic_disc, n_trials=3)
We can execute this RetryDiscipline at \(x=2\):
retry_disc.execute({"x": array([2.0])})
retry_disc.io.data
{'x': array([2.]), 'y': array([0.5])}
and verify that the computation is correctly performed, \(y=0.5\), with only one execution attempt:
retry_disc.n_executions
1
Execution with failure#
If an exception like a ZeroDivisionError occurs,
we do not want to retry the execution and just do something else.
To do this,
we need to define the fatal exceptions for which the execution is not retried.
It means that if that error is raised,
then the discipline RetryDiscipline will stop execution
rather than retrying an attempt.
retry_disc = RetryDiscipline(
analytic_disc, n_trials=3, fatal_exceptions=[ZeroDivisionError]
)
try:
retry_disc.execute()
except ZeroDivisionError:
LOGGER.info("Manage this fatal exception.")
INFO - 16:23:09: Failed to execute discipline AnalyticDiscipline, aborting retry because of the exception type <class 'ZeroDivisionError'>.
INFO - 16:23:09: Manage this fatal exception.
We can verify the number of attempts is only \(1\):
retry_disc.n_executions
1
To highlight the use of n_trials parameter, let's try another toy discipline,
which will crash the first 2 executions and finally succeed at the third attempt.
class FictiveDiscipline(Discipline):
"""Discipline to be executed several times.
- The first 2 times, raise a RuntimeError,
- and finally succeed.
"""
def __init__(self) -> None:
super().__init__()
self.attempt = 0
def _run(self, input_data: StrKeyMapping) -> StrKeyMapping:
self.attempt += 1
LOGGER.info("attempt: %s", self.attempt)
if self.attempt < 3:
raise RuntimeError
return {}
We can then illustrate the use of n_trials parameter. Here we intentionally set
this value to 4, knowing the discipline will complete before at the third trial:
test_n_trials = FictiveDiscipline()
retry_disc = RetryDiscipline(test_n_trials, n_trials=4)
retry_disc.execute()
INFO - 16:23:09: attempt: 1
INFO - 16:23:09: attempt: 2
INFO - 16:23:09: attempt: 3
{}
and verify the calculation has been tried 3 times to succeed:
retry_disc.n_executions
3
Limit the execution time#
If you want to limit the duration of the wrapped discipline,
use the timeout option.
Here is an example of a discipline
whose execution does nothing except sleep for 5 seconds:
class DisciplineLongTimeRunning(Discipline):
"""A discipline that could run for a while, to test the timeout feature."""
def _run(self, input_data: StrKeyMapping) -> None:
time.sleep(5.0)
Now we wrap it in RetryDiscipline,
set the timeout argument to 2 seconds
and execute this new discipline:
retry_disc = RetryDiscipline(DisciplineLongTimeRunning(), n_trials=1, timeout=2.0)
sys.tracebacklimit = 0
try:
LOGGER.info("Running discipline...")
retry_disc.execute({})
LOGGER.info("Discipline completed without reaching the time limit.")
except TimeoutError:
LOGGER.info("Discipline stopped, due to a TimeoutError.")
INFO - 16:23:09: Running discipline...
ERROR - 16:23:14: Failed to execute discipline DisciplineLongTimeRunning after 1 attempt.
INFO - 16:23:14: Discipline stopped, due to a TimeoutError.
In the log, we can see the initial and final times of the discipline execution. We can also read that the timeout is reached.
In some cases,
this option could be very useful.
For example if you wrap an SSH discipline
(see gemseo-ssh plugin)
in RetryDiscipline.
In that context,
it can be important to limit the duration when an ssh connexion is too slow.
Note
The user can build his RetryDiscipline with a combination of all the
available parameters.
Some attributes of the discipline are public and can be modified after
instantiation (fatal_exceptions, n_trials, ...)
Note
In the previous example, we added sys.tracebacklimit = 0 to
limit message output by exception, just in order the
output is only focused on what we aim to demonstrate with that example.
Please don't put this statement in normal use, otherwise you could miss some
important messages in the output.
Total running time of the script: (0 minutes 5.013 seconds)