This page explains how to manually clean up the Airflow database in your environment.
If you want to clean up the database automatically instead, see Configure database retention policy.
Automatic database cleanup
Cloud Composer provides several alternatives to manual cleanup:
You can configure the database retention policy for your environment, so that records older than a certain period are automatically removed from the Airflow database daily.
Before the database retention policy became available in Cloud Composer, we recommended a different approach to automating database cleanup: a database cleanup DAG. This approach is obsolete but still supported. You can find the description and the DAG code on the Cloud Composer 2 page.
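For reference, here is a minimal sketch of enabling the first option, the database retention policy, with Google Cloud CLI. The --airflow-database-retention-days flag and the beta command track are assumptions about how the retention policy is exposed in gcloud; see Configure database retention policy for the authoritative steps.

# Sketch: keep only the last 30 days of Airflow database records.
# Assumes the --airflow-database-retention-days flag, which might
# require the beta command track in your gcloud release.
gcloud beta composer environments update ENVIRONMENT_NAME \
    --location LOCATION \
    --airflow-database-retention-days 30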
Limits for database size
Over time, the Airflow database of your environment stores more and more data. This data includes information and logs related to past DAG runs, tasks, and other Airflow operations.
If the Airflow database size is more than 20 GB, then you can't upgrade your environment to a later version.
If the Airflow database size is more than 20 GB, then you can't create snapshots of your environment.
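To see how close your environment is to these limits, you can read the database disk usage from Cloud Monitoring. The sketch below calls the Monitoring API directly; the metric name composer.googleapis.com/environment/database/disk/bytes_used is an assumption based on Cloud Composer's standard environment metrics, and PROJECT_ID is a placeholder for your project ID.

# Sketch: query recent database disk usage (in bytes) through the
# Cloud Monitoring API. Metric name is an assumption; PROJECT_ID is
# a placeholder.
START="$(date -u -d '15 minutes ago' +%Y-%m-%dT%H:%M:%SZ)"
END="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
curl -s -G \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    --data-urlencode 'filter=metric.type="composer.googleapis.com/environment/database/disk/bytes_used"' \
    --data-urlencode "interval.startTime=${START}" \
    --data-urlencode "interval.endTime=${END}" \
    "https://monitoring.googleapis.com/v3/projects/PROJECT_ID/timeSeries"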
Run the database cleanup operation through Airflow CLI
When you run the airflow db trim Airflow CLI command through Google Cloud CLI, Cloud Composer performs a database retention operation.
During this operation, Cloud Composer removes Airflow database entries older than the currently configured database retention period (default is 60 days). This operation doesn't lock Airflow database tables, and maintains data consistency even if it is interrupted.
To remove old entries from the Airflow database, run the following command:
gcloud composer environments run ENVIRONMENT_NAME \
--location LOCATION \
db trim \
-- --retention-days RETENTION_DAYS
Replace the following:
ENVIRONMENT_NAME: the name of your environment.
LOCATION: the region where the environment is located.
RETENTION_DAYS: the retention period, in days. Entries older than this number of days are removed.
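For example, a run with hypothetical values that removes entries older than 30 days from an environment named example-environment in the us-central1 region looks like this:

gcloud composer environments run example-environment \
    --location us-central1 \
    db trim \
    -- --retention-days 30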
For more information about running Airflow CLI commands in Cloud Composer, see Access Airflow command-line interface.
Maintain database performance
Airflow database performance issues can lead to overall DAG execution issues. Observe the database CPU and memory usage statistics. If CPU and memory utilization approaches the limits, then the database is overloaded and requires scaling. The amount of resources available to the Airflow database is controlled by the environment size property of your environment. To scale the database up, change the environment size to a larger tier. Increasing the environment size increases the costs of your environment.
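For instance, the following sketch scales the environment, and with it the database, to a larger tier. The --environment-size flag and the lowercase size value are assumptions about how environment sizes are exposed in gcloud; check the gcloud reference for the values accepted by your Composer version.

# Sketch: move the environment to the large size tier.
# Flag name and value are assumptions; verify in the gcloud reference.
gcloud composer environments update ENVIRONMENT_NAME \
    --location LOCATION \
    --environment-size large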
If you use the XCom mechanism to transfer files, make sure that you use it according to Airflow's guidelines. Transferring big files or a large number of files through XCom impacts the Airflow database's performance and can lead to failures when loading snapshots or upgrading your environment. Consider using alternatives such as Cloud Storage to transfer large volumes of data.
Remove entries for unused DAGs
You can remove database entries for unused DAGs by removing DAGs from the Airflow UI.
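If you prefer the command line, a sketch of the same cleanup uses Airflow's dags delete CLI command, which deletes all database records related to the specified DAG. The -y flag skips the confirmation prompt, and DAG_ID is a placeholder for the DAG whose entries you want to remove.

# Remove all Airflow database records related to a DAG.
# DAG_ID is a placeholder; -y skips the confirmation prompt.
gcloud composer environments run ENVIRONMENT_NAME \
    --location LOCATION \
    dags delete \
    -- -y DAG_ID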
What's next