Set up highly resilient environments


This page describes how to set up highly resilient Cloud Composer environments.

About resiliency for zonal failures in Cloud Composer

Highly resilient Cloud Composer environments use built-in redundancy and failover mechanisms that make the environment less susceptible to zonal failures and single-point-of-failure outages.

For example, a zonal outage interrupts Airflow tasks that run in a specific zone. Afterwards, a highly resilient environment recovers, restarts its affected components in a different zone, and switches its database to a secondary zone. Airflow can then reschedule and restart the failed tasks, and the history of DAG runs and other settings is preserved.

A highly resilient environment runs across at least two zones of a selected region. Cloud Composer automatically distributes the components of your environment between zones.

You can use highly resilient Cloud Composer environments for critical business processes.

About the highly available database of your environment

In highly resilient Cloud Composer environments, the Cloud SQL instance that stores the database of your environment runs in high availability mode. A Cloud SQL instance configured for high availability is also called a regional instance and is located in a primary and a secondary zone within the configured region. A regional instance is made up of a primary instance and a standby instance.

In case of an outage, the Cloud SQL instance of your environment automatically fails over to the standby Cloud SQL instance. You do not need to perform any additional actions in your Cloud Composer environment. Once the primary zone is operational again, the environment switches back to having two zones (primary and secondary). Primary and secondary zones can be swapped in some cases. The Cloud SQL instance in high availability mode uses the same IP address after a failover.

About highly available Airflow components

Highly resilient Cloud Composer environments run Airflow components that are distributed between zones.

Your environment always runs exactly two Airflow schedulers, two web servers, and at least two (but no more than ten) triggerers if triggerers are enabled. These pairs of components run in separate zones. The minimum number of workers is set to two, and your environment's cluster distributes worker instances between zones. In case of a zonal outage, affected worker instances are rescheduled in a different zone.

For more information about the architecture of highly resilient environments, see Highly resilient environment architecture.

Before you begin

  • Highly resilient environments are available only in Private IP environments.

  • Highly resilient environments are offered at an incremental charge when compared to regular environments.

  • Highly resilient environments are available in Cloud Composer version 2.2.0 and later versions.

  • If you want to update a standard environment to a highly resilient one, make sure that it meets the following configuration requirements. If your environment doesn't meet these requirements, you can update its scale and performance parameters.

    • The minimum number of Airflow workers is 2 or more.
    • The number of Airflow schedulers is exactly 2.
    • If you use deferrable operators in your DAGs, the number of triggerers is at least 2.
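
The configuration requirements above can be checked programmatically before an update. The following Python sketch is one possible way to do it, not an official tool. It assumes that you pass in the config.workloadsConfig object from the JSON output of gcloud composer environments describe, and that the counts are stored in worker.minCount, scheduler.count, and triggerer.count. The function name unmet_requirements is hypothetical.

```python
def unmet_requirements(workloads_config, uses_deferrable_operators=False):
    """Return a list of problems; an empty list means the requirements are met.

    workloads_config is assumed to mirror config.workloadsConfig from the
    environment description. Missing fields are treated as 0.
    """
    worker_min = workloads_config.get("worker", {}).get("minCount", 0)
    scheduler_count = workloads_config.get("scheduler", {}).get("count", 0)
    triggerer_count = workloads_config.get("triggerer", {}).get("count", 0)

    problems = []
    # The minimum number of Airflow workers must be 2 or more.
    if worker_min < 2:
        problems.append(f"minimum workers is {worker_min}, expected 2 or more")
    # The number of Airflow schedulers must be exactly 2.
    if scheduler_count != 2:
        problems.append(f"scheduler count is {scheduler_count}, expected exactly 2")
    # Triggerers matter only if DAGs use deferrable operators.
    if uses_deferrable_operators and triggerer_count < 2:
        problems.append(f"triggerer count is {triggerer_count}, expected at least 2")
    return problems
```

If the returned list is not empty, adjust the scale and performance parameters of the environment before you switch it to high resilience mode.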

Create a highly resilient environment

To create a highly resilient environment, enable the high resilience mode when you create an environment.

Update a standard environment to high resilience mode

Console

  1. In the Google Cloud console, go to the Environments page.

    Go to Environments

  2. In the list of environments, click the name of your environment. The Environment details page opens.

  3. Select the Environment configuration tab.

  4. In the Resilience mode section, click Edit.

  5. Select High resilience and click Save.

gcloud

  gcloud composer environments update ENVIRONMENT_NAME \
    --location LOCATION \
    --enable-high-resilience

Replace the following:

  • ENVIRONMENT_NAME: the name of your environment.
  • LOCATION: the region where the environment is located.

API

  1. Construct an environments.patch API request.

  2. In this request:

    1. In the updateMask parameter, specify the config.resilienceMode mask.

    2. In the request body, specify HIGH_RESILIENCE to switch to the high resilience mode.

Example:

// PATCH https://composer.googleapis.com/v1/projects/example-project/
// locations/us-central1/environments/example-environment?updateMask=
// config.resilienceMode

{
  "config": {
    "resilienceMode": "HIGH_RESILIENCE"
  }
}
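
As an illustration, the request above can be assembled in code. The following Python sketch only builds the URL and the JSON body for an environments.patch call; it does not send the request or handle authentication. The function name build_resilience_patch is hypothetical, and the camelCase field name resilienceMode is assumed to match the update mask.

```python
import json
from urllib.parse import urlencode

COMPOSER_API = "https://composer.googleapis.com/v1"


def build_resilience_patch(project, location, environment, mode):
    """Return (url, body) for a PATCH request that changes the resilience mode.

    mode is the resilience mode value to set, for example "HIGH_RESILIENCE".
    """
    name = f"projects/{project}/locations/{location}/environments/{environment}"
    # Only the resilience mode field is updated.
    query = urlencode({"updateMask": "config.resilienceMode"})
    url = f"{COMPOSER_API}/{name}?{query}"
    body = json.dumps({"config": {"resilienceMode": mode}})
    return url, body


url, body = build_resilience_patch(
    "example-project", "us-central1", "example-environment", "HIGH_RESILIENCE"
)
print(url)
print(body)
```

You can send the resulting URL and body with any HTTP client that attaches valid Google Cloud credentials to the request.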

Terraform

The resilience_mode field in the config block specifies the resilience mode. To use the high resilience mode, set this value to HIGH_RESILIENCE.

resource "google_composer_environment" "example" {
  provider = google-beta
  name = "ENVIRONMENT_NAME"
  region = "LOCATION"

  config {

    resilience_mode = "HIGH_RESILIENCE"

  }
}

Replace the following:

  • ENVIRONMENT_NAME: the name of your environment.
  • LOCATION: the region where the environment is located.

Example:

resource "google_composer_environment" "example" {
  provider = google-beta
  name = "example-environment"
  region = "us-central1"

  config {

    resilience_mode = "HIGH_RESILIENCE"

  }
}

Change a highly resilient environment to standard resilience mode

You can change your environment to standard resilience mode at any time. This operation:

  • Reduces the number of web servers in your environment to 1.
  • Switches off the high availability mode of your environment's Airflow database.
  • Doesn't change the settings for minimum number of Airflow workers, schedulers, or triggerers.

Console

  1. In the Google Cloud console, go to the Environments page.

    Go to Environments

  2. In the list of environments, click the name of your environment. The Environment details page opens.

  3. Select the Environment configuration tab.

  4. In the Resilience mode section, click Edit.

  5. Select Standard resilience (default) and click Save.

gcloud

  gcloud composer environments update ENVIRONMENT_NAME \
    --location LOCATION \
    --disable-high-resilience

Replace the following:

  • ENVIRONMENT_NAME: the name of your environment.
  • LOCATION: the region where the environment is located.

API

  1. Construct an environments.patch API request.

  2. In this request:

    1. In the updateMask parameter, specify the config.resilienceMode mask.

    2. In the request body, specify RESILIENCE_MODE_UNSPECIFIED to switch to the standard resilience mode.

Example:

// PATCH https://composer.googleapis.com/v1/projects/example-project/
// locations/us-central1/environments/example-environment?updateMask=
// config.resilienceMode

{
  "config": {
    "resilienceMode": "RESILIENCE_MODE_UNSPECIFIED"
  }
}

Terraform

The resilience_mode field in the config block specifies the resilience mode. To use the standard resilience mode, set this value to STANDARD_RESILIENCE.

resource "google_composer_environment" "example" {
  provider = google-beta
  name = "ENVIRONMENT_NAME"
  region = "LOCATION"

  config {

    resilience_mode = "STANDARD_RESILIENCE"

  }
}

Replace the following:

  • ENVIRONMENT_NAME: the name of your environment.
  • LOCATION: the region where the environment is located.

Example:

resource "google_composer_environment" "example" {
  provider = google-beta
  name = "example-environment"
  region = "us-central1"

  config {

    resilience_mode = "STANDARD_RESILIENCE"

  }
}

Check if your environment runs in the high resilience mode

Console

  1. In the Google Cloud console, go to the Environments page.

    Go to Environments

  2. In the list of environments, click the name of your environment. The Environment details page opens.

  3. Select the Environment configuration tab.

  4. In the Resilience mode section, view the resilience mode of your environment.

gcloud

To check whether high resilience mode is enabled in your environment, run the following Google Cloud CLI command. The command prints the value of the config.resilienceMode field. A value of HIGH_RESILIENCE means that high resilience mode is enabled in your environment.

gcloud composer environments describe ENVIRONMENT_NAME \
  --location LOCATION \
  --format="value(config.resilienceMode)"

Replace the following:

  • ENVIRONMENT_NAME: the name of your environment.
  • LOCATION: the region where the environment is located.
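
To use this check in a script, you can wrap the command and interpret its output. The following Python sketch is one way to do so under stated assumptions: it treats HIGH_RESILIENCE as the value that indicates high resilience mode, and it also accepts True in case the CLI renders the field differently. Both function names are hypothetical, and read_resilience_mode requires an installed and authenticated Google Cloud CLI.

```python
import subprocess

# Values treated as "high resilience enabled". HIGH_RESILIENCE is the mode
# value used elsewhere on this page; "TRUE" is accepted as an assumption
# about how the CLI might print the field.
ENABLED_VALUES = {"HIGH_RESILIENCE", "TRUE"}


def is_high_resilience(describe_output):
    """Interpret the text printed by the describe command."""
    return describe_output.strip().upper() in ENABLED_VALUES


def read_resilience_mode(environment_name, location):
    """Run the describe command from this page and return its raw output."""
    result = subprocess.run(
        [
            "gcloud", "composer", "environments", "describe", environment_name,
            "--location", location,
            "--format=value(config.resilienceMode)",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

For example, is_high_resilience(read_resilience_mode("example-environment", "us-central1")) evaluates to a boolean that a deployment script can act on.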

What's next