Running a business-critical application on Cloud Composer involves responsibilities that are shared between Google and the customer. This document lists the responsibilities of each side; the lists are not exhaustive.
Google responsibilities
Hardening and patching the Cloud Composer environment's components and underlying infrastructure, including the Google Kubernetes Engine (GKE) cluster, the Cloud SQL database that hosts the Airflow database, Pub/Sub, Artifact Registry, and other environment elements. In particular, this includes auto-upgrading the environment's GKE cluster and Cloud SQL instance.
Protecting access to Cloud Composer environments by incorporating access control provided by IAM, encrypting data at rest by default, offering additional customer-managed storage encryption, and encrypting data in transit.
Providing Google Cloud integrations for Identity and Access Management, Cloud Audit Logs and Cloud Key Management Service.
Restricting and logging Google administrative access to customers' clusters for contractual support purposes with Access Transparency and Access Approval.
Publishing information about backward-incompatible changes between Cloud Composer and Airflow versions in the Cloud Composer release notes.
Keeping Cloud Composer documentation up to date:
Describing all functionality provided by Cloud Composer.
Providing troubleshooting instructions that help to keep environments in a healthy state.
Publishing information about known issues with workarounds (if they exist).
Resolving critical security incidents related to Cloud Composer environments and Airflow images provided by Cloud Composer (excluding customer-installed Python packages) by delivering new environment versions addressing the incidents.
Troubleshooting Cloud Composer environment health issues, depending on the customer's support plan.
Maintaining and expanding the functionality of the Cloud Composer Terraform provider.
Cooperating with the Apache Airflow community to maintain and develop Google Airflow operators.
Troubleshooting and, if possible, fixing issues in Airflow core functionality.
Customer responsibilities
Upgrading to new Cloud Composer and Airflow versions to stay within the product's support scope, and to resolve security issues once the Cloud Composer service publishes a version that addresses them.
Maintaining DAG code so that it stays compatible with the Airflow version in use.
Keeping the environment's GKE cluster configuration intact, particularly including its auto-upgrade feature.
Maintaining proper IAM permissions for the environment's service account, in particular the permissions required by the Cloud Composer Agent and the environment's service account. Maintaining the required permissions for the CMEK key used to encrypt the Cloud Composer environment, and rotating the key according to your needs.
Maintaining proper IAM permissions for the environment's bucket and for the Artifact Registry repository where Cloud Composer component images are stored.
Maintaining proper end user permissions in IAM and Airflow UI Access Control configuration.
Keeping the Airflow database size below 16 GB by using the maintenance DAG.
Resolving all DAG parsing issues before raising support cases to Cloud Customer Care.
Adjusting Cloud Composer environment parameters (such as CPU and memory for Airflow components) and Airflow configuration options to meet the performance and load expectations for the environment, using the Cloud Composer optimization guide and the environment scaling guide.
Not removing permissions required by the Cloud Composer Agent and the environment's service account. Removing these permissions can lead to failed management operations or to DAG and task failures.
Keeping all services and APIs required by Cloud Composer enabled at all times, with quotas configured at the levels that Cloud Composer requires.
Keeping in place the Artifact Registry repositories that host container images used by Cloud Composer environments.
Following recommendations and best practices for implementing DAGs.
Diagnosing DAG and task failures using instructions for scheduler troubleshooting, DAG troubleshooting and triggerer troubleshooting.
Avoiding installing or running additional components in the environment's GKE cluster that interfere with Cloud Composer components and prevent them from functioning correctly.
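As an illustration of the database size responsibility listed above, the following is a minimal Python sketch of the kind of check a team might run alongside the maintenance DAG. Only the 16 GB limit comes from this document. The function names, the 80% warning ratio, and the 30-day retention period are hypothetical values chosen for the example, and obtaining the actual database size (for example, from monitoring) is left out.

```python
from datetime import datetime, timedelta, timezone

# Limit stated in this document for the Airflow database size.
MAX_DB_SIZE_GB = 16


def needs_cleanup(db_size_gb: float, warn_ratio: float = 0.8) -> bool:
    """Return True when the database size reaches warn_ratio of the limit.

    warn_ratio is a hypothetical safety margin, not a Cloud Composer setting.
    """
    return db_size_gb >= MAX_DB_SIZE_GB * warn_ratio


def retention_cutoff(now: datetime, retention_days: int = 30) -> datetime:
    """Return the timestamp before which old records could be removed.

    retention_days is an assumed value; choose it to fit your own needs.
    """
    return now - timedelta(days=retention_days)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    size_gb = 13.0  # Example value; read the real size from your monitoring.
    if needs_cleanup(size_gb):
        print(f"Consider removing records older than {retention_cutoff(now).isoformat()}")
```

A check like this only signals when cleanup is due. The cleanup itself is the job of the maintenance DAG.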