Configure retries for a pipeline task

You can specify whether a pipeline task is rerun when it fails by configuring retries for that task. You can set the number of retry attempts and the delay between successive retries.

Use the following code sample to configure the failure policy of a pipeline task named train_op by using the set_retry method in the Kubeflow Pipelines SDK:

from kfp import dsl

# generate_op and train_op are assumed to be components defined elsewhere.
@dsl.pipeline(name='custom-container-pipeline')
def pipeline():
  generate = generate_op()
  # Configure the retry policy on the task returned by train_op.
  train = (
    train_op(
      training_data=generate.outputs['training_data'],
      test_data=generate.outputs['test_data'],
      config_file=generate.outputs['config_file'])
    .set_retry(
      num_retries=NUMBER_OF_RETRIES,
      backoff_duration='BACKOFF_DURATION',
      backoff_factor=BACKOFF_FACTOR,
      backoff_max_duration='BACKOFF_MAX_DURATION'
    )
  )

Replace the following:

  • NUMBER_OF_RETRIES: The number of times to retry the task upon failure.

  • BACKOFF_DURATION: Optional. The duration to wait after the task fails before retrying. If you don't set this parameter, the duration defaults to 0s.

  • BACKOFF_FACTOR: Optional. The factor by which the backoff duration is multiplied for each subsequent retry. If you don't set this parameter, the backoff factor defaults to 2.0.

  • BACKOFF_MAX_DURATION: Optional. The maximum backoff duration between subsequent retries. If you don't set this parameter, the maximum duration defaults to 3600s.
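
To see how these parameters interact, the following minimal sketch computes the wait before each retry. It is an illustration only: the retry_delays helper is hypothetical and not part of the Kubeflow Pipelines SDK. It assumes that the first wait equals the backoff duration, that each later wait is the previous one multiplied by the backoff factor, and that every wait is capped at the maximum backoff duration. The scheduling that the pipeline backend actually applies might differ.

```python
def retry_delays(num_retries, backoff_duration_s=0.0,
                 backoff_factor=2.0, backoff_max_duration_s=3600.0):
    """Return the assumed wait, in seconds, before each retry attempt.

    Hypothetical helper for illustration; not a Kubeflow Pipelines API.
    Defaults mirror the documented defaults (0s, 2.0, 3600s).
    """
    delays = []
    delay = float(backoff_duration_s)
    for _ in range(num_retries):
        # Assumption: each wait is capped at the maximum backoff duration.
        delays.append(min(delay, float(backoff_max_duration_s)))
        # Assumption: the wait grows by the backoff factor after each retry.
        delay *= backoff_factor
    return delays


# Four retries, starting at 60 seconds, doubling, capped at 300 seconds.
print(retry_delays(4, backoff_duration_s=60, backoff_factor=2.0,
                   backoff_max_duration_s=300))
# prints [60.0, 120.0, 240.0, 300.0]
```

Under these assumptions, leaving the backoff duration at its default of 0s means every retry starts immediately, because multiplying 0 by any factor still gives 0.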