This page describes access control options in Cloud Data Fusion.
You can control access to resources in Cloud Data Fusion in the following ways:
- To control access for control plane operations, such as creating and updating instances through the Google Cloud console, Google Cloud CLI, and REST API, use Identity and Access Management (IAM), as described on this page.
- To grant access to Google Cloud data services, such as BigQuery or Cloud Storage, to the service account that runs your pipelines, use IAM.
- To control granular permissions for actions performed in the instance, known as data plane operations, such as starting pipelines, use role-based access control (RBAC).
For information about the architecture and resources involved in Cloud Data Fusion access control, see Networking. For information about granting roles and permissions, see Manage access to projects, folders, and organizations.
About IAM in Cloud Data Fusion
You control access to Cloud Data Fusion features by granting IAM roles and permissions to service accounts and other principals in your Google Cloud project.
To grant fine-grained access to user accounts so that they can use the Cloud Data Fusion web interface, use RBAC.
By default, Cloud Data Fusion uses the following service accounts:
Cloud Data Fusion Service Account
The Cloud Data Fusion Service Account is a Google-managed service agent that can access customer resources at pipeline design time. This service agent is automatically added to a project when you enable the Cloud Data Fusion API. It's used for all instances in your project.
The service agent has the following responsibilities:
- Communicating with other services, such as Cloud Storage, BigQuery, or Datastream, during pipeline design.
- Enabling execution by provisioning Dataproc clusters and submitting pipeline jobs.
Roles for the Cloud Data Fusion Service Account
By default, the Cloud Data Fusion service account has only the Cloud Data Fusion API Service Agent role (roles/datafusion.serviceAgent).
The principal name for this service agent is service-CUSTOMER_PROJECT_NUMBER@gcp-sa-datafusion.iam.gserviceaccount.com.
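As a small illustration, the principal name can be derived from the project number. The following Python sketch uses a hypothetical helper, `service_agent_email`, that is not part of any Google Cloud library, and the project number shown is a placeholder.

```python
def service_agent_email(project_number: str) -> str:
    """Return the Cloud Data Fusion service agent principal for a project.

    Hypothetical helper for illustration; it only formats a string.
    """
    if not project_number.isdigit():
        raise ValueError("expected a numeric project number, not a project ID")
    return f"service-{project_number}@gcp-sa-datafusion.iam.gserviceaccount.com"


# Placeholder project number (assumption for illustration only).
print(service_agent_email("123456789012"))
```

Note that the pattern uses the project number, not the project ID, which is why the sketch rejects non-numeric input.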
The following default resources are associated with the Cloud Data Fusion API Service Agent role.
Role | Resource | Permissions |
---|---|---|
Cloud Data Fusion API Service Agent | Associated services: | See the Cloud Data Fusion API Service Agent permissions. |
Compute Engine default service account or custom service account
The Compute Engine service account is the default account that Cloud Data Fusion uses to deploy and run jobs that access other Google Cloud resources. By default, it attaches to a Dataproc cluster VM to let Cloud Data Fusion access Dataproc resources during a pipeline run.
You can choose a custom service account to attach to the Dataproc cluster when creating a Cloud Data Fusion instance or by creating new Compute Profiles in the Cloud Data Fusion web interface.
For more information, see Service accounts in Cloud Data Fusion.
Roles for the Compute Engine service account
By default, to access resources (such as sources and sinks) when you run a pipeline, Cloud Data Fusion uses the Compute Engine default service account.
You can set up a user-managed custom service account for Cloud Data Fusion instances and grant a role to this account. Afterwards, you can choose this service account when creating new instances.
Cloud Data Fusion Runner role
In the project containing the Cloud Data Fusion instance, for both default and user-managed custom service accounts, grant the Cloud Data Fusion Runner role (roles/datafusion.runner).
Role | Description | Permission |
---|---|---|
Cloud Data Fusion Runner (roles/datafusion.runner) | Lets the Compute Engine service account communicate with Cloud Data Fusion services in the tenant project | datafusion.instances.runtime |
Service Account User role
On the default or user-managed service account in the project where Dataproc clusters are launched when you run pipelines, grant the Cloud Data Fusion Service Account the Service Account User role (roles/iam.serviceAccountUser).
For more information, see Grant service account permission.
Dataproc Worker role
To run the jobs on Dataproc clusters, grant the Dataproc Worker role (roles/dataproc.worker) to the default or user-managed service accounts used by your Cloud Data Fusion pipelines.
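The three grants described above can be expressed as gcloud CLI invocations. The following Python sketch only builds the command lines and prints them; it does not run anything. The project ID, project number, and service account name are placeholder assumptions, the helper names are hypothetical, and the Runner role is written with its full `roles/` prefix.

```python
import shlex


def project_binding_cmd(project_id: str, member: str, role: str) -> list[str]:
    """Build a gcloud command that grants `role` to `member` on a project."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member={member}", f"--role={role}",
    ]


def service_account_binding_cmd(sa_email: str, member: str, role: str) -> list[str]:
    """Build a gcloud command that grants `role` to `member` on a service account."""
    return [
        "gcloud", "iam", "service-accounts", "add-iam-policy-binding", sa_email,
        f"--member={member}", f"--role={role}",
    ]


# Placeholder values (assumptions for illustration only).
PROJECT_ID = "my-project"
PROJECT_NUMBER = "123456789012"
PIPELINE_SA = f"pipeline-runner@{PROJECT_ID}.iam.gserviceaccount.com"
SERVICE_AGENT = f"service-{PROJECT_NUMBER}@gcp-sa-datafusion.iam.gserviceaccount.com"

commands = [
    # Runner and Dataproc Worker roles go to the account that runs pipelines.
    project_binding_cmd(PROJECT_ID, f"serviceAccount:{PIPELINE_SA}", "roles/datafusion.runner"),
    project_binding_cmd(PROJECT_ID, f"serviceAccount:{PIPELINE_SA}", "roles/dataproc.worker"),
    # Service Account User is granted to the service agent ON the pipeline account.
    service_account_binding_cmd(
        PIPELINE_SA, f"serviceAccount:{SERVICE_AGENT}", "roles/iam.serviceAccountUser"
    ),
]

for cmd in commands:
    print(shlex.join(cmd))
```

The first two bindings are set on the project, while the third is set on the service account resource itself, which is the distinction the Service Account User section draws.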
Roles for users
To trigger any operation in Cloud Data Fusion, a principal must have sufficient permissions. Individual permissions are grouped into roles, and you grant roles to the principal.
If RBAC isn't enabled, or if you're using a Cloud Data Fusion edition that doesn't support RBAC, users with any Cloud Data Fusion IAM role have full access to the Cloud Data Fusion web interface. The Admin role only allows users to manage the instance, such as Create, Update, Upgrade, and Delete operations.
Grant the following roles to principals, depending on the permissions they need in Cloud Data Fusion.
Role | Description | Permissions |
---|---|---|
Cloud Data Fusion Admin (roles/datafusion.admin) | All viewer permissions, plus permissions to create, update, and delete Cloud Data Fusion instances. | |
Cloud Data Fusion Viewer (roles/datafusion.viewer) | | |
Access resources in another project at design time
This section describes access control on resources that are located in a different Google Cloud project than your Cloud Data Fusion instance at design time.
When you design pipelines in the Cloud Data Fusion web interface, you might use functions, such as Wrangler or Preview, which access resources in other projects.
The following sections describe how you determine the service account in your environment and then give the appropriate permissions.
Determine the service account of your environment
The service account name is Cloud Data Fusion Service Account and the principal for this service agent is service-CUSTOMER_PROJECT_NUMBER@gcp-sa-datafusion.iam.gserviceaccount.com.
Give permission to access resources in another project
To grant the roles that give permission to access various resources, follow these steps:
- In the project where the target resource exists, add the Cloud Data Fusion Service Account (service-CUSTOMER_PROJECT_NUMBER@gcp-sa-datafusion.iam.gserviceaccount.com) as a principal.
- Grant roles to the Cloud Data Fusion Service Account on the target resource in the project where the target resource exists.
After you grant the roles, you can access resources in a different project at design time in the same way that you access resources in the project where your instance is located.
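In IAM terms, the two steps above amount to adding the service agent as a member of a role binding in the target project's allow policy. The following Python sketch shows that idea on an in-memory policy dictionary with the usual `bindings`, `role`, and `members` fields. The function name `add_member`, the project number, and the example role `roles/bigquery.dataViewer` are assumptions chosen for illustration, and nothing here calls a Google Cloud API.

```python
def add_member(policy: dict, role: str, member: str) -> dict:
    """Return a copy of `policy` in which `member` holds `role`.

    Hypothetical helper: it edits a plain dictionary and is idempotent.
    """
    # Copy each binding so the caller's policy is left untouched.
    bindings = [
        {**b, "members": list(b["members"])} for b in policy.get("bindings", [])
    ]
    for binding in bindings:
        if binding["role"] == role:
            if member not in binding["members"]:
                binding["members"].append(member)
            break
    else:
        # No binding for this role yet, so create one.
        bindings.append({"role": role, "members": [member]})
    return {**policy, "bindings": bindings}


# Placeholder principal and policy (assumptions for illustration only).
agent = "serviceAccount:service-123456789012@gcp-sa-datafusion.iam.gserviceaccount.com"
target_policy = {
    "bindings": [
        {"role": "roles/bigquery.dataViewer", "members": ["user:analyst@example.com"]}
    ]
}

updated = add_member(target_policy, "roles/bigquery.dataViewer", agent)
print(updated["bindings"])
```

Which role you grant depends on the resource that Wrangler or Preview needs to read; the BigQuery role here is only an example.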
Access resources in another project at execution time
This section describes access control on resources that are located in a different Google Cloud project than your Cloud Data Fusion instance at execution time.
At execution time, you execute the pipeline on a Dataproc cluster, which may access resources in other projects. By default, the Dataproc cluster itself is launched in the same project as the Cloud Data Fusion instance, but you can use clusters in another project.
To access the resources in other Google Cloud projects, follow these steps:
- Determine the service account for your project.
- In the project where the resource is located, grant IAM roles to the Compute Engine default service account so that it can access the resource.
Determine the Compute Engine service account
For more information about the Compute Engine service account, see About IAM in Cloud Data Fusion.
Grant IAM access to resources in another project
The Compute Engine default service account requires permissions to access resources in another project. These roles and permissions can be different depending on the resource you want to access.
To access the resources, follow these steps:
- Grant roles and permissions, specifying your Compute Engine service account as a principal in the project where the target resource exists.
- Add appropriate roles to access the resource.
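As a sketch of these steps, the following Python snippet derives the Compute Engine default service account, which follows the PROJECT_NUMBER-compute@developer.gserviceaccount.com naming pattern, and builds (without running) a gcloud command that grants it a role in the target project. The helper names are hypothetical, and the project values and the example role `roles/storage.objectViewer` are placeholders; choose the roles that match the resource your pipeline needs to reach.

```python
import shlex


def compute_default_sa(project_number: str) -> str:
    """Return the Compute Engine default service account for a project number."""
    return f"{project_number}-compute@developer.gserviceaccount.com"


def grant_in_target_project(target_project_id: str, sa_email: str, role: str) -> list[str]:
    """Build a gcloud command granting `role` to `sa_email` in the target project."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", target_project_id,
        f"--member=serviceAccount:{sa_email}", f"--role={role}",
    ]


# Placeholder values (assumptions for illustration only).
# The project number belongs to the project where the Dataproc cluster runs.
sa = compute_default_sa("123456789012")
cmd = grant_in_target_project("target-project", sa, "roles/storage.objectViewer")
print(shlex.join(cmd))
```

If the instance uses a custom service account instead of the default one, pass that account's email in place of the derived address.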
Cloud Data Fusion API permissions
The following permissions are required to execute the Cloud Data Fusion API.
API call | Permission |
---|---|
instances.create | datafusion.instances.create |
instances.delete | datafusion.instances.delete |
instances.list | datafusion.instances.list |
instances.get | datafusion.instances.get |
instances.update | datafusion.instances.update |
operations.cancel | datafusion.operations.cancel |
operations.list | datafusion.operations.list |
operations.get | datafusion.operations.get |
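One way to use the table above is as a lookup when checking whether a principal can make a given set of API calls. The following Python sketch encodes the table as a dictionary and reports which permissions are missing; the function name `missing_permissions` and the sample inputs are hypothetical, and the check runs locally without contacting IAM.

```python
# API call -> required permission, copied from the table above.
REQUIRED_PERMISSION = {
    "instances.create": "datafusion.instances.create",
    "instances.delete": "datafusion.instances.delete",
    "instances.list": "datafusion.instances.list",
    "instances.get": "datafusion.instances.get",
    "instances.update": "datafusion.instances.update",
    "operations.cancel": "datafusion.operations.cancel",
    "operations.list": "datafusion.operations.list",
    "operations.get": "datafusion.operations.get",
}


def missing_permissions(api_calls, granted):
    """Return the sorted permissions needed for `api_calls` but absent from `granted`."""
    needed = {REQUIRED_PERMISSION[call] for call in api_calls}
    return sorted(needed - set(granted))


# Sample principal that can read instances but not change them (assumption).
granted = {"datafusion.instances.get", "datafusion.instances.list"}
print(missing_permissions(["instances.get", "instances.update"], granted))
# prints ['datafusion.instances.update']
```

An unknown API call name raises a `KeyError`, which keeps typos from silently passing the check.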
Permissions for common tasks
Common tasks in Cloud Data Fusion require the following permissions:
Task | Permissions |
---|---|
Accessing the Cloud Data Fusion web interface | datafusion.instances.get |
Accessing the Cloud Data Fusion Instances page in the Google Cloud console | datafusion.instances.list |
Accessing the Details page for an instance | datafusion.instances.get |
Creating a new instance | datafusion.instances.create |
Updating labels and advanced options to customize an instance | datafusion.instances.update |
Deleting an instance | datafusion.instances.delete |
What's next
- Learn more about access control between multiple projects.