Use Eventarc to manage Dataflow jobs

This document describes how to create event-driven workflows triggered by state changes in your Dataflow jobs.

For example, your workflow might:

  • Send an alert to an on-call engineer if a critical job fails.
  • Notify users when a batch job completes, or start another Dataflow job.
  • Clean up resources used by a job, such as Cloud Storage buckets.

Overview

Eventarc is a Google Cloud service that can listen to events from other services and route them to various destinations.

When you run a Dataflow job, the job transitions through various states, such as JOB_STATE_QUEUED, JOB_STATE_RUNNING, and JOB_STATE_DONE. Dataflow integration with Eventarc lets you trigger an action when a job changes state.

Because Eventarc is a managed service, you don't need to provision or manage the underlying infrastructure.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  3. Make sure that billing is enabled for your Google Cloud project.

  4. Enable the Eventarc APIs.

    Enable the APIs

To use the Eventarc API, your project must have enough quota. Also, the service account associated with the Eventarc trigger must have the appropriate permissions.

Choose an event destination

Choose an event destination to receive the event. The destination determines the next step in your workflow.

For example:

  • To send an SMS alert, you might use Cloud Run functions to create a standalone HTTP trigger (a minimal handler sketch follows this list).
  • For a more complex workflow, you might use Workflows.
  • If your Dataflow pipeline is part of a larger solution that runs on Google Kubernetes Engine, the trigger can route the event to a GKE service running in your cluster.
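
For the Cloud Run functions option, the destination is an HTTP service that receives the Dataflow event as a CloudEvent. The following is a minimal sketch of such a handler, assuming the Functions Framework for Python; the function name and the print statement are placeholders for your own logic.

import functions_framework

# Sketch of a Cloud Run functions destination for Dataflow job events.
# The payload shape is described later in this document under "Process events".
@functions_framework.cloud_event
def on_dataflow_job_event(cloud_event):
    job = cloud_event.data
    print(f"Job {job.get('id')} is now in state {job.get('currentState')}")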

For more information about this style of architecture, see Event-driven architectures in the Eventarc documentation.

Create a trigger

To create an Eventarc trigger for Dataflow job state changes, refer to one of the following documents:

Optionally, you can filter events by Dataflow job ID. For example, you can select job IDs that match a regular expression. For more information, see Understand path patterns.
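
If you prefer to create the trigger programmatically, the following is a rough sketch that uses the Python client library for Eventarc (google-cloud-eventarc). The event type string, the jobId path-pattern filter, and the project, region, service, and service account values are assumptions to verify against the Eventarc documentation for your chosen destination.

from google.cloud import eventarc_v1

# All names below are placeholders; the event type and the path-pattern
# filter attribute are assumptions to verify in the Eventarc documentation.
PROJECT = "my-project"
REGION = "us-central1"

client = eventarc_v1.EventarcClient()

trigger = eventarc_v1.Trigger(
    name=client.trigger_path(PROJECT, REGION, "dataflow-job-state-trigger"),
    event_filters=[
        # Direct Dataflow job status change events.
        eventarc_v1.EventFilter(
            attribute="type",
            value="google.cloud.dataflow.job.v1beta3.statusChanged",
        ),
        # Optional: match only job IDs that fit a path pattern.
        eventarc_v1.EventFilter(
            attribute="jobId",
            value="*job1*",
            operator="match-path-pattern",
        ),
    ],
    destination=eventarc_v1.Destination(
        cloud_run=eventarc_v1.CloudRun(
            service="dataflow-event-handler", region=REGION
        ),
    ),
    service_account=f"eventarc-trigger@{PROJECT}.iam.gserviceaccount.com",
)

operation = client.create_trigger(
    request=eventarc_v1.CreateTriggerRequest(
        parent=client.common_location_path(PROJECT, REGION),
        trigger=trigger,
        trigger_id="dataflow-job-state-trigger",
    )
)
print(operation.result())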

Process events

The event data describes the Dataflow job at the time the event was triggered. The payload is similar to the Job resource type, with the steps, pipeline_description, and transform_name_mapping fields omitted. Also, depending on the job state, some fields might not be present.

The following shows an example payload:

{
  "id":"2023-04-13_16_28_37-12345678",
  "projectId":"my-project",
  "name":"job1",
  "currentState":"JOB_STATE_QUEUED",
  "currentStateTime":"2023-04-13T23:28:37.437622Z",
  "createTime":"2023-04-13T23:28:37.437622Z",
  "location":"us-central1",
  "startTime":"2023-04-13T23:28:37.437622Z"
}
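
Continuing the handler sketch from the destination section, a receiver might branch on currentState and guard against fields that are absent for some states. The helper names below are illustrative, not part of any Google Cloud API.

def handle_job_event(job: dict) -> None:
    # `job` is a payload like the example above; some fields can be missing
    # depending on the job state, so read them defensively.
    state = job.get("currentState")
    name = job.get("name", "<unknown job>")
    location = job.get("location", "<unknown region>")

    if state == "JOB_STATE_FAILED":
        # Placeholder: page the on-call engineer.
        notify_oncall(f"Dataflow job {name} failed in {location}")
    elif state == "JOB_STATE_DONE":
        # Placeholder: clean up buckets or start a downstream job.
        clean_up_resources(job.get("id"))


def notify_oncall(message: str) -> None:
    print(message)  # Replace with your SMS or paging integration.


def clean_up_resources(job_id) -> None:
    print(f"Cleaning up resources for job {job_id}")  # Replace with real cleanup.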

For more information about job states, see the following topics:

What's next