This page describes how to install Python packages for your Cloud Composer environment.
About preinstalled and custom PyPI packages in Cloud Composer images
Cloud Composer images contain both preinstalled and custom PyPI packages.
Preinstalled PyPI packages are packages that are included in the Cloud Composer image of your environment. Each Cloud Composer image contains PyPI packages that are specific to your version of Cloud Composer and Airflow.
Custom PyPI packages are packages that you can install in your environment in addition to preinstalled packages.
Options to manage PyPI packages for Cloud Composer environments
Option | Use if |
---|---|
Install from PyPI | The default way to install packages in your environment. |
Install from a repository with a public IP address | The package is hosted in a package repository other than PyPI. This repository has a public IP address. |
Install from an Artifact Registry repository | The package is hosted in an Artifact Registry repository. |
Install from a repository in your project's network | Your environment does not have access to public internet. The package is hosted in a package repository in your project's network. |
Install as a local Python library | The package cannot be found in PyPI, and the library does not have any external dependencies, such as dist-packages. |
Install a plugin | The package provides plugin-specific functionality, such as modifying the Airflow web interface. |
PythonVirtualenvOperator | You do not want the package to be installed for all Airflow workers, or the dependency conflicts with preinstalled packages. The package can be found in PyPI and has no external dependencies. |
KubernetesPodOperator and GKE operators | You require external dependencies that cannot be installed from pip, such as dist-packages, or are on an internal pip server. This option requires more setup and maintenance. Consider it only if other options do not work. |
Before you begin
- You must have a role that can trigger environment update operations. In addition, the service account of the environment must have a role that has enough permissions to perform update operations. For more information, see Access control.
- If your environment is protected by a VPC Service Controls perimeter, then before installing PyPI dependencies you must grant additional user identities access to the services that the service perimeter protects, and enable support for a private PyPI repository.
- Requirements must follow the format specified in PEP 508, where each requirement is specified in lowercase and consists of the package name with optional extras and version specifiers.
PyPI dependency updates generate Docker images in Artifact Registry.
If a dependency conflict causes the update to fail, your environment continues running with its existing dependencies. If the operation succeeds, you can begin using the newly installed Python dependencies in your DAGs.
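The requirement format described above can be checked locally before you trigger an update. The following is a minimal sketch, not a full PEP 508 parser: the pattern and the function name `is_valid_requirement` are illustrative assumptions, and the check covers only lowercase names, optional extras, and simple version specifiers.

```python
import re

# Simplified subset of PEP 508 (an assumption for illustration):
# lowercase name, optional [extras], optional comma-separated version specifiers.
_OPERATOR = r"(?:==|!=|<=|>=|~=|<|>)"
_VERSION = r"[a-z0-9.*+!_-]+"
_REQUIREMENT = re.compile(
    r"^(?P<name>[a-z0-9](?:[a-z0-9._-]*[a-z0-9])?)"
    r"(?P<extras>\[[a-z0-9_,.\- ]+\])?"
    rf"(?P<spec>\s*{_OPERATOR}\s*{_VERSION}(?:\s*,\s*{_OPERATOR}\s*{_VERSION})*)?$"
)


def is_valid_requirement(line: str) -> bool:
    """Return True if the line looks like a lowercase requirement specifier."""
    return _REQUIREMENT.match(line.strip()) is not None


if __name__ == "__main__":
    for candidate in ["scipy>=0.13.3", "nltk[machine_learning]", "SciPy>=1.0"]:
        print(candidate, is_valid_requirement(candidate))
```

A check like this only catches formatting mistakes. It does not detect dependency conflicts, which surface only when the environment update runs.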
View the list of PyPI packages
You can get the list of packages for your environment in several formats.
View preinstalled packages
To view the list of preinstalled packages for your environment, see the list of packages for the Cloud Composer image of your environment.
View all packages
To view all packages (both preinstalled and custom) in your environment:
gcloud
The following gcloud CLI command returns the result of the python -m pip list command for an Airflow worker in your environment. You can use the --tree argument to get the result of the python -m pipdeptree --warn command.
gcloud beta composer environments list-packages \
ENVIRONMENT_NAME \
--location LOCATION
Replace:
- ENVIRONMENT_NAME with the name of the environment.
- LOCATION with the region where the environment is located.
View custom PyPI packages
Console
In Google Cloud console, go to the Environments page.
In the list of environments, click the name of your environment. The Environment details page opens.
Go to the PyPI Packages tab.
gcloud
gcloud composer environments describe ENVIRONMENT_NAME \
--location LOCATION \
--format="value(config.softwareConfig.pypiPackages)"
Replace:
- ENVIRONMENT_NAME with the name of the environment.
- LOCATION with the region where the environment is located.
Install custom packages in a Cloud Composer environment
This section describes different methods for installing custom packages in your environment.
Install packages from PyPI
A package can be installed from the Python Package Index (PyPI) if it has no external dependencies and does not conflict with preinstalled packages.
To add, update, or delete the Python dependencies for your environment:
Console
In Google Cloud console, go to the Environments page.
In the list of environments, click the name of your environment. The Environment details page opens.
Go to the PyPI packages tab.
Click Edit.
Click Add package.
In the PyPI packages section, specify package names, with optional version specifiers and extras.
For example:
- scikit-learn
- scipy, >=0.13.3
- nltk, [machine_learning]
Click Save.
gcloud
The gcloud CLI has several arguments for working with custom PyPI packages:
- --update-pypi-packages-from-file replaces all existing custom PyPI packages with the specified packages. Packages that you do not specify are removed.
- --update-pypi-package updates or installs one package.
- --remove-pypi-packages removes specified packages.
- --clear-pypi-packages removes all packages.
Installing requirements from a file
The requirements.txt file must have each requirement specifier on a separate line.
For example:
scipy>=0.13.3
scikit-learn
nltk[machine_learning]
Update your environment, and specify the requirements.txt file in the --update-pypi-packages-from-file argument.
gcloud composer environments update ENVIRONMENT_NAME \
--location LOCATION \
--update-pypi-packages-from-file requirements.txt
Replace:
- ENVIRONMENT_NAME with the name of the environment.
- LOCATION with the region where the environment is located.
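The same requirement lines can also be expressed as the package-name-to-specifier mapping that the API and Terraform sections of this page use. The following sketch shows one way to derive that mapping from the contents of a requirements.txt file. The function names are hypothetical, and the parsing is deliberately simple: it treats everything after the package name as the extras and version string and ignores `#` comments.

```python
import re


def split_requirement(line: str) -> tuple[str, str]:
    """Split 'name[extras]>=1.0' into ('name', '[extras]>=1.0')."""
    match = re.match(r"^([A-Za-z0-9][A-Za-z0-9._-]*)(.*)$", line.strip())
    if match is None:
        raise ValueError(f"not a requirement: {line!r}")
    return match.group(1), match.group(2).strip()


def requirements_to_mapping(text: str) -> dict[str, str]:
    """Map each package name to its extras and version string."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if line:
            name, rest = split_requirement(line)
            mapping[name] = rest
    return mapping


if __name__ == "__main__":
    example = "scipy>=0.13.3\nscikit-learn\nnltk[machine_learning]\n"
    print(requirements_to_mapping(example))
```

A package without extras or a version maps to an empty string, which matches the "specify an empty value" convention used elsewhere on this page.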
Installing one package
Update your environment, and specify the package, version, and extras in the --update-pypi-package argument.
gcloud composer environments update ENVIRONMENT_NAME \
--location LOCATION \
--update-pypi-package PACKAGE_NAMEEXTRAS_AND_VERSION
Replace:
- ENVIRONMENT_NAME with the name of the environment.
- LOCATION with the region where the environment is located.
- PACKAGE_NAME with the name of the package.
- EXTRAS_AND_VERSION with the optional version and extras specifier. To omit versions and extras, specify an empty value.
Example:
gcloud composer environments update example-environment \
--location us-central1 \
--update-pypi-package "scipy>=0.13.3"
Removing packages
Update your environment, and specify the packages that you want to delete in the --remove-pypi-packages argument:
gcloud composer environments update ENVIRONMENT_NAME \
--location LOCATION \
--remove-pypi-packages PACKAGE_NAMES
Replace:
- ENVIRONMENT_NAME with the name of the environment.
- LOCATION with the region where the environment is located.
- PACKAGE_NAMES with a comma-separated list of packages.
Example:
gcloud composer environments update example-environment \
--location us-central1 \
--remove-pypi-packages scipy,scikit-learn
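If you script these updates, it can help to assemble the command as an argument list rather than a shell string, so that specifiers such as `scipy>=0.13.3` are never interpreted by the shell. The following sketch only builds the argument list from the flags shown above and does not run anything. The function name `build_update_command` is hypothetical, and limiting each command to a single operation is an assumption that mirrors the examples on this page.

```python
import shlex


def build_update_command(environment, location, *, update=None, remove=None):
    """Build a gcloud argument list for exactly one package operation."""
    if (update is None) == (remove is None):
        raise ValueError("specify exactly one of 'update' or 'remove'")
    command = [
        "gcloud", "composer", "environments", "update", environment,
        "--location", location,
    ]
    if update is not None:
        # One package with optional extras and version, for example "scipy>=0.13.3".
        command += ["--update-pypi-package", update]
    else:
        # Several package names are joined into a comma-separated list.
        command += ["--remove-pypi-packages", ",".join(remove)]
    return command


if __name__ == "__main__":
    args = build_update_command(
        "example-environment", "us-central1", remove=["scipy", "scikit-learn"]
    )
    print(shlex.join(args))  # printable form; pass the list itself to subprocess.run
```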
API
Construct an environments.patch API request.
In this request:
In the updateMask parameter, specify the mask:
- Use the config.softwareConfig.pypiPackages mask to replace all existing packages with the specified packages. Packages that you do not specify are deleted.
- Use the config.softwareConfig.pypiPackages.PACKAGE_NAME mask to add or update a specific package. To add or update several packages, specify several masks separated by commas.
In the request body, specify packages and values for versions and extras:
{
  "config": {
    "softwareConfig": {
      "pypiPackages": {
        "PACKAGE_NAME": "EXTRAS_AND_VERSION"
      }
    }
  }
}
Replace:
- PACKAGE_NAME with the name of the package.
- EXTRAS_AND_VERSION with the optional version and extras specifier. To omit versions and extras, specify an empty value.
- To add more than one package, add extra entries for packages to pypiPackages.
Example:
// PATCH https://composer.googleapis.com/v1/projects/example-project/
// locations/us-central1/environments/example-environment?updateMask=
// config.softwareConfig.pypiPackages.EXAMPLE_PACKAGE,
// config.softwareConfig.pypiPackages.ANOTHER_PACKAGE
{
  "config": {
    "softwareConfig": {
      "pypiPackages": {
        "EXAMPLE_PACKAGE": "",
        "ANOTHER_PACKAGE": ">=1.10.3"
      }
    }
  }
}
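The per-package update mask and the request body follow a regular pattern, so both can be generated from one mapping. The following sketch builds the URL and JSON body for the request shown above using only the standard library. It does not send the request or handle authentication, and the function name `build_patch_request` is a hypothetical helper, not part of any client library.

```python
import json
from urllib.parse import urlencode


def build_patch_request(project, location, environment, packages):
    """Return (url, body) for a PATCH that adds or updates the given packages."""
    # One mask entry per package, separated by commas.
    mask = ",".join(
        f"config.softwareConfig.pypiPackages.{name}" for name in packages
    )
    query = urlencode({"updateMask": mask})  # commas are percent-encoded
    url = (
        "https://composer.googleapis.com/v1/"
        f"projects/{project}/locations/{location}/environments/{environment}"
        f"?{query}"
    )
    body = {"config": {"softwareConfig": {"pypiPackages": dict(packages)}}}
    return url, json.dumps(body, indent=2)


if __name__ == "__main__":
    url, body = build_patch_request(
        "example-project", "us-central1", "example-environment",
        {"EXAMPLE_PACKAGE": "", "ANOTHER_PACKAGE": ">=1.10.3"},
    )
    print(url)
    print(body)
```

Generating the mask from the same mapping as the body keeps the two in sync, which avoids the case where a package appears in the body but is not covered by the mask.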
Terraform
The pypi_packages block in the software_config block specifies packages.
resource "google_composer_environment" "example" {
  name   = "ENVIRONMENT_NAME"
  region = "LOCATION"
  config {
    software_config {
      pypi_packages = {
        PACKAGE_NAME = "EXTRAS_AND_VERSION"
      }
    }
  }
}
Replace:
- ENVIRONMENT_NAME with the name of the environment.
- LOCATION with the region where the environment is located.
- PACKAGE_NAME with the name of the package.
- EXTRAS_AND_VERSION with the optional version and extras specifier. To omit versions and extras, specify an empty value.
- To add more than one package, add extra entries for packages to pypi_packages.
Example:
resource "google_composer_environment" "example" {
  name   = "example-environment"
  region = "us-central1"
  config {
    software_config {
      pypi_packages = {
        scipy        = ">=1.10.3"
        scikit-learn = ""
        nltk         = "[machine_learning]"
      }
    }
  }
}
Install packages from a public repository
You can install packages hosted in other repositories that have a public IP address.
The packages must be properly configured so that the default pip tool can install them.
To install from a package repository that has a public address:
Create a pip.conf file and include the following information in the file, if applicable:
- URL of the repository (in the index-url parameter)
- Access credentials for the repository
- Non-default pip installation options
Example:
[global]
index-url=https://example.com/
(Optional) In some cases, you might want to fetch packages from multiple repositories, such as when the public repository contains some specific packages that you want to install, and you want to install all other packages from PyPI:
- Configure an Artifact Registry virtual repository.
- Add configuration for multiple repositories (including PyPI, if needed) and define the order in which pip searches the repositories.
- Specify the virtual repository's URL in the index-url parameter.
Upload the pip.conf file to the /config/pip/ folder in your environment's bucket.
Install packages using one of the available methods.
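The pip.conf file from the first step is a plain INI file, so it can be generated rather than written by hand. The following sketch renders one with the standard library's configparser. The function name `render_pip_conf` and the `extra_options` parameter are illustrative assumptions, and the `timeout` value is only an example of a non-default pip option. Uploading the file to the bucket is left out.

```python
import configparser
import io


def render_pip_conf(index_url, extra_options=None):
    """Render the text of a pip.conf with a [global] section."""
    # interpolation=None keeps '%' characters in URLs or credentials literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser["global"] = {"index-url": index_url, **(extra_options or {})}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_pip_conf("https://example.com/", {"timeout": "60"}))
```

Save the rendered text as pip.conf before uploading it to the /config/pip/ folder.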
Install packages from an Artifact Registry repository
You can store packages in an Artifact Registry repository in your project, and configure your environment to install from it.
Configure roles and permissions:
The service account of your environment must have the iam.serviceAccountUser role.
Make sure that the Cloud Build service account has permissions to read from your Artifact Registry repository.
If your environment has restricted access to other services in your project, for example, if you use VPC Service Controls:
Assign permissions to access your Artifact Registry repository to the environment's service account instead of the Cloud Build service account.
Make sure that connectivity to the Artifact Registry repository is configured in your project.
To install custom PyPI packages from an Artifact Registry repository:
Create a pip.conf file and include the following information in the file, if applicable:
- URL of the Artifact Registry repository (in the index-url parameter)
- Access credentials for the repository
- Non-default pip installation options
For an Artifact Registry repository, append /simple/ to the repository URL:
[global]
index-url = https://us-central1-python.pkg.dev/example-project/example-repository/simple/
(Optional) In some cases, you might want to fetch packages from multiple repositories, such as when your Artifact Registry repository contains some specific packages that you want to install, and you want to install all other packages from PyPI:
- Configure an Artifact Registry virtual repository.
- Add configuration for multiple repositories (including PyPI, if needed) and define the order in which pip searches the repositories.
- Specify the virtual repository's URL in the index-url parameter.
Upload this pip.conf file to the /config/pip/ folder in your environment's bucket. For example: gs://us-central1-example-bucket/config/pip/pip.conf.
Install packages using one of the available methods.
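The index URL and the upload destination in the steps above both follow fixed patterns. The following sketch assembles them from their parts, based only on the example values shown on this page. Both function names are hypothetical, and the URL pattern is inferred from the single example above rather than from a specification.

```python
def artifact_registry_index_url(location, project, repository):
    """Build the index-url value, including the required /simple/ suffix."""
    return f"https://{location}-python.pkg.dev/{project}/{repository}/simple/"


def pip_conf_destination(bucket):
    """Build the path of pip.conf inside the environment's bucket."""
    return f"gs://{bucket}/config/pip/pip.conf"


if __name__ == "__main__":
    print(artifact_registry_index_url(
        "us-central1", "example-project", "example-repository"
    ))
    print(pip_conf_destination("us-central1-example-bucket"))
```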
Install packages from a private repository
You can host a private repository in your project's network and configure your environment to install Python packages from it.
Configure roles and permissions:
The service account for your Cloud Composer environment must have the iam.serviceAccountUser role.
If you install custom PyPI packages from a repository in your project's network, and this repository does not have a public IP address:
Assign permissions to access this repository to the environment's service account.
Make sure that connectivity to this repository is configured in your project.
To install packages from a private repository hosted in your project's network:
Create a pip.conf file and include the following information in the file, if applicable:
- IP address of the repository in your project's network
- Access credentials for the repository
- Non-default pip installation options
Example:
[global]
index-url=https://192.0.2.10/
(Optional) In some cases, you might want to fetch packages from multiple repositories, such as when the private repository contains some specific packages that you want to install, and you want to install all other packages from PyPI:
- Configure an Artifact Registry virtual repository.
- Add configuration for multiple repositories (including PyPI, if needed) and define the order in which pip searches the repositories.
- Specify the virtual repository's URL in the index-url parameter.
Upload the pip.conf file to the /config/pip/ folder in your environment's bucket. For example: gs://us-central1-example-bucket/config/pip/pip.conf.
Install packages using one of the available methods.
Install a local Python library
To install an in-house or local Python library:
Place the dependencies within a subdirectory in the dags/ folder in your environment's bucket. To import a module from a subdirectory, each subdirectory in the module's path must contain an __init__.py package marker file.
In the following example, the dependency is coin_module.py:
dags/
  use_local_deps.py  # A DAG file.
  dependencies/
    __init__.py
    coin_module.py
Import the dependency from the DAG definition file.
For example:
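In a DAG file such as use_local_deps.py, the import is a single line: `from dependencies import coin_module`. The following self-contained sketch recreates the layout above in a temporary directory to show that the import resolves once the `__init__.py` marker file is present. The `flip_coin` function and the body of coin_module.py are hypothetical, because the module's contents are not shown here, and the temporary directory stands in for the dags/ folder.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_example_layout(root: Path) -> Path:
    """Create dags/dependencies/ with a package marker and one module."""
    dags = root / "dags"
    deps = dags / "dependencies"
    deps.mkdir(parents=True)
    (deps / "__init__.py").write_text("")  # package marker file
    (deps / "coin_module.py").write_text(
        "def flip_coin():\n"
        "    # Hypothetical helper standing in for the real module contents.\n"
        "    return 'heads'\n"
    )
    return dags


def demo() -> str:
    """Import the module the way a DAG file in dags/ would."""
    with tempfile.TemporaryDirectory() as tmp:
        dags_folder = str(build_example_layout(Path(tmp)))
        sys.path.insert(0, dags_folder)  # stand-in for the dags/ folder being importable
        try:
            importlib.invalidate_caches()
            # Equivalent to: from dependencies import coin_module
            coin_module = importlib.import_module("dependencies.coin_module")
            return coin_module.flip_coin()
        finally:
            sys.path.remove(dags_folder)


if __name__ == "__main__":
    print(demo())
```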
Use packages that depend on shared object libraries
Certain PyPI packages depend on system-level libraries. While Cloud Composer does not support system libraries, you can use the following options:
Use the KubernetesPodOperator. Set the Operator image to a custom build image. If you experience packages that fail during installation due to an unmet system dependency, use this option.
Upload the shared object libraries to your environment's bucket. If your PyPI packages have installed successfully but fail at runtime, use this option.
- Manually find the shared object libraries for the PyPI dependency (an .so file).
- Upload the shared object libraries to the /plugins folder in your environment's bucket.
- Set the following environment variable: LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/airflow/gcs/plugins
Install packages in private IP environments
This section explains how to install packages in private IP environments.
Depending on how you configure your project, your environment might not have access to the public internet.
Private IP environment with public internet access
If your private IP environment can access public internet, then you can install packages using options for public IP environments:
- Install from PyPI. In this case, no special configuration is required. Follow the procedure described in Install packages from PyPI.
- Install from a repository with a public IP address. Follow the procedure described in Install packages from a public repository.
- Install from a private PyPI repository hosted in your project's network.
Private IP environment without internet access
If your private IP environment does not have access to public internet, then you can install packages using one of the following ways:
- Use a private PyPI repository hosted in your project's network.
- Use a proxy server VM in your project's network to connect to a PyPI repository on the public internet. Specify the proxy address in the /config/pip/pip.conf file in your environment's bucket.
- Use an Artifact Registry repository as the only source of packages. To do so, redefine the index-url parameter, as described in Install packages from an Artifact Registry repository.
- If your security policy permits access to external IP addresses from your VPC network, you can enable the installation of packages from repositories on the public internet by configuring Cloud NAT.
- Put Python dependencies into the /dags folder in your environment's bucket to install them as local libraries. This might not be a good option if the dependency tree is large.
Install to a private IP environment under resource location restrictions
Keeping your project in line with Resource Location Restriction requirements prohibits the use of some tools. In particular, Cloud Build cannot be used for package installation, preventing direct access to repositories on the public internet.
To install Python dependencies in such an environment, follow the guidance for private IP environments without internet access.
Install a Python dependency to a private IP environment in a VPC Service Controls perimeter
Protecting your project with a VPC Service Controls perimeter results in further security restrictions. In particular, Cloud Build cannot be used for package installation, preventing direct access to repositories on the public internet.
To install Python dependencies for a private IP environment inside a perimeter, follow the guidance for private IP environments without internet access.