This document describes how to authenticate to Dataproc programmatically. How you authenticate to Dataproc depends on the interface you use to access the API and the environment where your code is running.
For more information about Google Cloud authentication, see the authentication overview.
API access
Dataproc supports programmatic access. You can access the API in the following ways:
Client libraries
The Dataproc client libraries provide high-level language support for authenticating to Dataproc programmatically. To authenticate calls to Google Cloud APIs, client libraries support Application Default Credentials (ADC); the libraries look for credentials in a set of defined locations and use those credentials to authenticate requests to the API. With ADC, you can make credentials available to your application in a variety of environments, such as local development or production, without needing to modify your application code.
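The lookup that ADC performs can be approximated in a few lines of shell. This is a rough sketch only: adc_source is a hypothetical helper, the file path shown is the Linux and macOS location of the credential file written by gcloud auth application-default login, and real client libraries handle more cases (for example, Windows paths).

```shell
# Sketch: report which ADC source a client library would likely find first.
# adc_source is a hypothetical helper, not part of any Google tool.
adc_source() {
  # Assumed well-known location on Linux and macOS.
  adc_well_known="${HOME}/.config/gcloud/application_default_credentials.json"
  if [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    # 1. A credential file named by the environment variable.
    echo "env:${GOOGLE_APPLICATION_CREDENTIALS}"
  elif [ -f "${adc_well_known}" ]; then
    # 2. The local file created by the gcloud CLI.
    echo "file:${adc_well_known}"
  else
    # 3. Otherwise, the service account attached to the compute resource.
    echo "attached-service-account"
  fi
}
```

Running adc_source on a workstation and again on a VM is a quick way to see why the same application code can authenticate in both places without changes.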
Google Cloud CLI
When you use the gcloud CLI to access Dataproc, you log in to the gcloud CLI with a user account, which provides the credentials used by the gcloud CLI commands.
If your organization's security policies prevent user accounts from having the required permissions, you can use service account impersonation.
For more information, see Authenticate for using the gcloud CLI. For more information about using the gcloud CLI with Dataproc, see the gcloud CLI reference pages.
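As an illustration of impersonation with the gcloud CLI, the sketch below wraps a Dataproc command with the global --impersonate-service-account flag. The helper name list_clusters_as and its argument order are hypothetical, and it assumes that your user account has been granted the Service Account Token Creator role on the service account.

```shell
# Hypothetical helper: list Dataproc clusters while impersonating a service account.
# Usage: list_clusters_as REGION SERVICE_ACCOUNT_EMAIL
list_clusters_as() {
  gcloud dataproc clusters list \
    --region="$1" \
    --impersonate-service-account="$2"
}
```

To apply impersonation to every command instead of a single one, the auth/impersonate_service_account property can be set with gcloud config set.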
REST
You can authenticate to the Dataproc API by using your gcloud CLI credentials or by using Application Default Credentials. For more information about authentication for REST requests, see Authenticate for using REST. For information about the types of credentials, see gcloud CLI credentials and ADC credentials.
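The two credential types map to two different token commands. The following sketch, which assumes curl is installed, shows one way to choose between them when calling the Dataproc clusters.list REST method. The helper names access_token and list_clusters_rest are hypothetical.

```shell
# Hypothetical helper: print an access token from the chosen credential source.
# Usage: access_token gcloud|adc
access_token() {
  case "$1" in
    gcloud) gcloud auth print-access-token ;;                      # gcloud CLI credentials
    adc)    gcloud auth application-default print-access-token ;;  # local ADC credentials
    *)      echo "usage: access_token gcloud|adc" >&2; return 2 ;;
  esac
}

# Hypothetical helper: list clusters through the Dataproc REST API.
# Usage: list_clusters_rest PROJECT_ID REGION gcloud|adc
list_clusters_rest() {
  token="$(access_token "$3")" || return 1
  curl --silent --fail \
    -H "Authorization: Bearer ${token}" \
    "https://dataproc.googleapis.com/v1/projects/$1/regions/$2/clusters"
}
```

For example, list_clusters_rest PROJECT_ID REGION gcloud sends the request with the credentials of the account you used to log in to the gcloud CLI.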
Set up authentication for Dataproc
How you set up authentication depends on the environment where your code is running.
The following options for setting up authentication are the most commonly used. For more options and information about authentication, see Authentication methods.
For a local development environment
You can set up credentials for a local development environment in the following ways:
- User credentials for client libraries or third-party tools
- User credentials for REST requests from the command line
Client libraries or third-party tools
Set up Application Default Credentials (ADC) in your local environment:
- Install the Google Cloud CLI, then initialize it by running the following command:
  gcloud init
- If you're using a local shell, create local authentication credentials for your user account:
  gcloud auth application-default login
  You don't need to do this if you're using Cloud Shell.
  A sign-in screen appears. After you sign in, your credentials are stored in the local credential file used by ADC.
For more information about working with ADC in a local environment, see Local development environment.
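One way to confirm that the local ADC setup worked is to ask the gcloud CLI for a token from those credentials. The sketch below is an assumption about how you might script that check; adc_ready is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds when local ADC credentials can mint an access token.
adc_ready() {
  if gcloud auth application-default print-access-token >/dev/null 2>&1; then
    echo "ADC is configured"
  else
    echo "ADC is not configured; run: gcloud auth application-default login" >&2
    return 1
  fi
}
```

The token itself is discarded, so the check is safe to run in scripts and logs.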
REST requests from the command line
When you make a REST request from the command line, you can use your gcloud CLI credentials by including gcloud auth print-access-token as part of the command that sends the request.
The following example lists service accounts for the specified project. You can use the same pattern for any REST request.
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your Google Cloud project ID.
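A minimal sketch of such a request, assuming curl is installed and that the intended endpoint is the IAM API's projects.serviceAccounts.list method. The wrapper function list_service_accounts is hypothetical; the curl command inside it is the part that matters.

```shell
# Hypothetical helper wrapping the request described above.
# Usage: list_service_accounts PROJECT_ID
list_service_accounts() {
  # The command substitution embeds a fresh gcloud CLI access token in the header.
  curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://iam.googleapis.com/v1/projects/$1/serviceAccounts"
}
```

To reuse the pattern for another REST request, keep the Authorization header and change only the HTTP method, URL, and request body.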
For more information about authenticating using REST and gRPC, see Authenticate for using REST. For information about the difference between your local ADC credentials and your gcloud CLI credentials, see gcloud CLI authentication configuration and ADC configuration.
On Google Cloud
To authenticate a workload running on Google Cloud, you use the credentials of the service account attached to the compute resource where your code is running, such as a Compute Engine virtual machine (VM) instance. This approach is the preferred authentication method for code running on a Google Cloud compute resource.
For most services, you must attach the service account when you create the resource that will run your code; you cannot add or replace the service account later. Compute Engine is an exception—it lets you attach a service account to a VM instance at any time.
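Client libraries obtain the attached service account's credentials automatically through ADC. To see what happens underneath, the sketch below requests a token from the Compute Engine metadata server directly. It assumes curl is available and works only from inside a Google Cloud compute resource; metadata_token_json is a hypothetical helper name.

```shell
# Hypothetical helper: fetch an access token for the attached service account.
# The response is a JSON object that includes an access_token field.
metadata_token_json() {
  curl --silent --fail \
    -H "Metadata-Flavor: Google" \
    "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token"
}
```

Because no key file is involved, there is nothing to copy onto the VM or rotate by hand.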
Use the gcloud CLI to create a service account and attach it to your resource:
- Install the Google Cloud CLI, then initialize it by running the following command:
  gcloud init
- Set up authentication:
  - Create the service account:
    gcloud iam service-accounts create SERVICE_ACCOUNT_NAME
    Replace SERVICE_ACCOUNT_NAME with a name for the service account.
  - To provide access to your project and your resources, grant a role to the service account:
    gcloud projects add-iam-policy-binding PROJECT_ID --member="serviceAccount:SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com" --role=ROLE
    Replace the following:
    - SERVICE_ACCOUNT_NAME: the name of the service account
    - PROJECT_ID: the project ID where you created the service account
    - ROLE: the role to grant
  - To grant another role to the service account, run the command as you did in the previous step.
  - Grant the required role to the principal that will attach the service account to other resources:
    gcloud iam service-accounts add-iam-policy-binding SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com --member="user:USER_EMAIL" --role=roles/iam.serviceAccountUser
    Replace the following:
    - SERVICE_ACCOUNT_NAME: the name of the service account
    - PROJECT_ID: the project ID where you created the service account
    - USER_EMAIL: the email address for a Google Account
- Create the resource that will run your code, and attach the service account to that resource. For example, if you use Compute Engine:
  Create a Compute Engine instance. Configure the instance as follows:
  - Replace INSTANCE_NAME with your preferred instance name.
  - Set the --zone flag to the zone in which you want to create your instance.
  - Set the --service-account flag to the email address for the service account that you created.
  gcloud compute instances create INSTANCE_NAME --zone=ZONE --service-account=SERVICE_ACCOUNT_EMAIL
For more information about authenticating to Google APIs, see Authentication methods.
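The service account steps in this section can be strung together in a single function, sketched below. The function name, its argument order, and the explicit --project flags are assumptions added for illustration; the four gcloud commands themselves are the ones shown in the steps.

```shell
# Hypothetical wrapper around the commands from the steps in this section.
# Usage: setup_vm_with_service_account PROJECT_ID SA_NAME ROLE USER_EMAIL INSTANCE_NAME ZONE
setup_vm_with_service_account() {
  project_id="$1"; sa_name="$2"; role="$3"
  user_email="$4"; instance_name="$5"; zone="$6"
  sa_email="${sa_name}@${project_id}.iam.gserviceaccount.com"

  # Create the service account (--project is an added assumption).
  gcloud iam service-accounts create "${sa_name}" --project="${project_id}" || return 1

  # Grant the service account a role on the project.
  gcloud projects add-iam-policy-binding "${project_id}" \
    --member="serviceAccount:${sa_email}" --role="${role}" || return 1

  # Allow the user to attach the service account to resources.
  gcloud iam service-accounts add-iam-policy-binding "${sa_email}" \
    --member="user:${user_email}" --role="roles/iam.serviceAccountUser" || return 1

  # Create the VM with the service account attached.
  gcloud compute instances create "${instance_name}" \
    --project="${project_id}" --zone="${zone}" --service-account="${sa_email}"
}
```

For a Dataproc cluster rather than a standalone VM, the last command would likely become gcloud dataproc clusters create with the --region and --service-account flags; check the Dataproc documentation for the roles the cluster's service account needs.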
On-premises or on a different cloud provider
The preferred method to set up authentication from outside of Google Cloud is to use workload identity federation. For more information, see On-premises or another cloud provider in the authentication documentation.
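As a rough sketch of what the client side of workload identity federation can look like, the function below generates a credential configuration file that ADC can read. It assumes a workload identity pool and provider already exist and that the external identity token is available in a local file; write_wif_config and its argument order are hypothetical, and the exact flags depend on your identity provider, so verify them against the gcloud reference.

```shell
# Hypothetical helper: write a credential configuration file for ADC.
# Usage: write_wif_config PROJECT_NUMBER POOL_ID PROVIDER_ID SA_EMAIL TOKEN_FILE OUTPUT_FILE
write_wif_config() {
  gcloud iam workload-identity-pools create-cred-config \
    "projects/$1/locations/global/workloadIdentityPools/$2/providers/$3" \
    --service-account="$4" \
    --credential-source-file="$5" \
    --output-file="$6"
}
```

After the file is written, setting the GOOGLE_APPLICATION_CREDENTIALS environment variable to its path lets client libraries pick it up through ADC. The file contains no private key; it only describes where to find the external token and how to exchange it.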
What's next
- Understand Dataproc service accounts.
- Learn how to Create a cluster with a custom VM service account.
- Learn about Google Cloud authentication methods.
- See a list of authentication use cases.