Use cross-bucket replication

This page describes how to use cross-bucket replication, which uses Storage Transfer Service to copy new and updated objects asynchronously from a source bucket to a destination bucket. When you use cross-bucket replication, you create and manage replication jobs, which are a type of job in Storage Transfer Service.

Before you begin

Before you begin, complete the following steps.

Enable the Storage Transfer Service API

If you haven't already, enable the Storage Transfer Service API.

Get required roles

To get the permissions that you need to use cross-bucket replication, ask your administrator to grant you the Storage Transfer User (roles/storagetransfer.user) IAM role on the bucket or the project.

This predefined role contains the permissions required to use cross-bucket replication. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to use cross-bucket replication:

  • storagetransfer.jobs.create
  • storagetransfer.jobs.delete
  • storagetransfer.jobs.get
  • storagetransfer.jobs.list
  • storagetransfer.jobs.run
  • storagetransfer.jobs.update

For instructions on granting roles on buckets, see Use IAM with buckets. For instructions on granting roles on projects, see Manage access to projects.

Grant required roles

Cross-bucket replication uses Pub/Sub to receive notifications of changes to your source bucket and Storage Transfer Service to replicate objects from your source bucket to your destination bucket. To use cross-bucket replication, you must also grant the required permissions to the service agent that's used by Storage Transfer Service to replicate data and the service agent that's used by Pub/Sub to write notifications.

Grant required roles to Storage Transfer Service service agent

Storage Transfer Service uses a Google-managed service agent to replicate data. The email address of this service agent follows the naming format project-PROJECT_NUMBER@storage-transfer-service.iam.gserviceaccount.com. You can get the email address of the Storage Transfer Service service agent by using the Storage Transfer Service googleServiceAccounts.get API.

The Storage Transfer Service service agent needs the following permissions to replicate your objects and set up Pub/Sub notifications for your source bucket:

Required permissions

  • storage.buckets.get on the source and destination bucket
  • storage.buckets.update on the source bucket
  • storage.objects.list on the source bucket
  • storage.objects.get on the source bucket
  • storage.objects.rewrite on the destination bucket
  • pubsub.topics.create on the project

These permissions can be granted through the Pub/Sub Editor (roles/pubsub.editor) role and the Storage Admin (roles/storage.admin) role. For a less permissive role than the Storage Admin role, you can also use a custom role.

Grant required roles to Cloud Storage service agent

Cloud Storage uses a Google-managed service agent to manage Pub/Sub notifications. The email address of this service agent follows the naming format service-PROJECT_NUMBER@gs-project-accounts.iam.gserviceaccount.com.
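The two service agent naming formats described on this page can be sketched as small helper functions. This is only an illustration: the function names and the project number 123456789012 are hypothetical, and the formats are copied from this page rather than looked up. To confirm the Storage Transfer Service address for a real project, the googleServiceAccounts.get API remains the authoritative source.

```python
# Minimal sketch: derive the service agent email addresses from a project
# number, using the naming formats quoted on this page.
# The function names and the sample project number are hypothetical.


def storage_transfer_service_agent(project_number: int) -> str:
    """Email of the Storage Transfer Service service agent."""
    return (
        f"project-{project_number}"
        "@storage-transfer-service.iam.gserviceaccount.com"
    )


def cloud_storage_service_agent(project_number: int) -> str:
    """Email of the Cloud Storage service agent."""
    return f"service-{project_number}@gs-project-accounts.iam.gserviceaccount.com"


if __name__ == "__main__":
    print(storage_transfer_service_agent(123456789012))
    print(cloud_storage_service_agent(123456789012))
```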

The Cloud Storage service agent needs the following permissions to set up Pub/Sub and publish messages to a topic:

Required permissions

  • pubsub.topics.publish on the Pub/Sub topic
  • pubsub.subscriptions.consume on the Pub/Sub topic
  • pubsub.subscriptions.create on the project

These permissions can be granted through the Pub/Sub Publisher (roles/pubsub.publisher) role.

Create a replication job

Console

When using the Google Cloud console, you can create a replication job for existing buckets or for new buckets during the bucket creation process.

To create a replication job for a new bucket, follow the instructions for creating a new bucket.

To create a replication job for an existing bucket, complete the following steps:

  1. In the Google Cloud console, go to the Cloud Storage Buckets page.

    Go to Buckets

  2. In the list of buckets, click the name of the source bucket whose objects you want to replicate.

  3. On the Bucket details page, click the Configuration tab.

  4. Locate the Cross-bucket replication option and click Edit.

  5. In the Edit cross-bucket replication pane that opens, click Add a destination.

  6. In the Choose a destination section, select a destination bucket, then click Next.

  7. In the Choose replication settings section, do the following:

    • Optional: To filter objects to replicate by object name prefix, select the Replicate objects based on prefix checkbox in the Choose which objects to replicate section.

      • To include objects by prefix, enter a prefix in the Include objects with prefix section, then click Add a prefix.

      • To exclude objects by prefix, enter a prefix in the Exclude objects with prefix section, then click Add a prefix.

    • Optional: To set a storage class for replicated objects, select a storage class from the menu in the Set storage class for replicated objects section.

      If you skip this step, replicated objects use the destination bucket's storage class by default.

  8. Click Save.
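The include and exclude prefix settings in the steps above can be reasoned about with a short model. The sketch below assumes that an object is replicated when it matches at least one include prefix (or when no include prefix is set) and matches no exclude prefix. The function name and sample object names are hypothetical, and the service's actual evaluation may differ in edge cases.

```python
# Minimal sketch of prefix-based filtering, under the assumption stated in
# the lead-in. It is not the service's implementation.
from typing import Sequence


def is_replicated(
    object_name: str,
    include_prefixes: Sequence[str] = (),
    exclude_prefixes: Sequence[str] = (),
) -> bool:
    """Return True if the object name passes the prefix filters."""
    # With no include prefixes, every object is a candidate.
    included = not include_prefixes or any(
        object_name.startswith(prefix) for prefix in include_prefixes
    )
    # Any matching exclude prefix removes the object from the candidates.
    excluded = any(object_name.startswith(prefix) for prefix in exclude_prefixes)
    return included and not excluded


if __name__ == "__main__":
    names = ["logs/2024/app.log", "logs/tmp/scratch.log", "images/cat.png"]
    for name in names:
        print(name, is_replicated(name, ["logs/"], ["logs/tmp/"]))
```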

Command line

When using the Google Cloud CLI, you can create a replication job for existing buckets.

To create a replication job, use the gcloud alpha transfer jobs create command with the --replication flag:

gcloud alpha transfer jobs create gs://SOURCE_BUCKET_NAME gs://DESTINATION_BUCKET_NAME --replication

Replace:

  • SOURCE_BUCKET_NAME with the name of the source bucket you want to replicate. For example, my-source-bucket.

  • DESTINATION_BUCKET_NAME with the name of the destination bucket. For example, my-destination-bucket.
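If you script job creation, one option is to assemble the command shown above as an argument list instead of a single shell string. The following sketch only builds and prints the command. The helper name is hypothetical, and the bucket names are the example values from this page.

```python
# Minimal sketch: build the gcloud invocation shown above as an argument
# list. The helper name is hypothetical; nothing is executed here.
import shlex
from typing import List


def replication_create_command(source_bucket: str, destination_bucket: str) -> List[str]:
    """Return the argv for creating a replication job between two buckets."""
    return [
        "gcloud", "alpha", "transfer", "jobs", "create",
        f"gs://{source_bucket}",
        f"gs://{destination_bucket}",
        "--replication",
    ]


cmd = replication_create_command("my-source-bucket", "my-destination-bucket")
# shlex.join renders the list as a shell-safe string for logging or review.
print(shlex.join(cmd))
```

To run the command from Python, you could pass the list to subprocess.run(cmd, check=True), which avoids shell quoting issues.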

REST APIs

JSON API

When using the JSON API, you can create a replication job for existing buckets.

  1. Install and initialize the gcloud CLI so that you can generate an access token for the Authorization header.

    Alternatively, you can create an access token using the OAuth 2.0 Playground and include it in the Authorization header.

  2. Create a JSON file that contains a TransferJob object with an initialized ReplicationSpec resource:

    {
     "name": "TRANSFER_JOB_NAME",
     ...
     "replicationSpec": {
       "gcsDataSource": {
         "bucketName": "SOURCE_BUCKET_NAME"
       },
       "gcsDataSink": {
         "bucketName": "DESTINATION_BUCKET_NAME"
       },
       "objectConditions": {
       },
       "transferOptions": {
         "overwriteWhen": "OVERWRITE_OPTION"
       }
     }
     ...
    }

    Replace:

    • TRANSFER_JOB_NAME with the name you want to assign the replication job. See the transferJobs reference documentation for naming requirements.

    • SOURCE_BUCKET_NAME with the name of the source bucket that contains the objects you want to replicate. For example, example-source-bucket.

    • DESTINATION_BUCKET_NAME with the name of the destination bucket where your objects will be replicated. For example, example-destination-bucket.

    • OVERWRITE_OPTION with an option for how existing objects in the destination bucket can be overwritten as the result of a replication job, which can happen when the destination object and the source object have the same name. The value must be one of the following:

      • ALWAYS: Always overwrite objects in the destination bucket

      • DIFFERENT: Only overwrite objects in the destination bucket if the destination object data is different from the source object data

      • NEVER: Never overwrite objects in the destination bucket

  3. Use cURL to call the Storage Transfer Service REST API with a transferJobs.create request:

    curl -X POST --data-binary @JSON_FILE_NAME \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://storagetransfer.googleapis.com/v1/transferJobs"

    Where:

    • JSON_FILE_NAME is the name of the JSON file you created in Step 2.
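The JSON file from Step 2 can also be generated programmatically, which makes it easier to validate the overwrite option before sending the request. The sketch below is an illustration only. The function name is hypothetical, and the projectId field is an assumption about one of the fields elided from the template above. Check the TransferJob reference before relying on it.

```python
# Minimal sketch: build a TransferJob request body for a replication job.
# Assumptions: the helper name is hypothetical, and "projectId" is assumed
# to be among the fields elided ("...") in the template on this page.
import json
from typing import Optional

VALID_OVERWRITE_OPTIONS = {"ALWAYS", "DIFFERENT", "NEVER"}


def build_replication_job(
    project_id: str,
    source_bucket: str,
    destination_bucket: str,
    overwrite_when: str = "DIFFERENT",
    name: Optional[str] = None,
) -> dict:
    """Return a dict that serializes to the request body for transferJobs.create."""
    if overwrite_when not in VALID_OVERWRITE_OPTIONS:
        raise ValueError(f"overwrite_when must be one of {sorted(VALID_OVERWRITE_OPTIONS)}")
    job = {
        "projectId": project_id,
        "replicationSpec": {
            "gcsDataSource": {"bucketName": source_bucket},
            "gcsDataSink": {"bucketName": destination_bucket},
            "transferOptions": {"overwriteWhen": overwrite_when},
        },
    }
    if name is not None:
        job["name"] = name
    return job


if __name__ == "__main__":
    body = build_replication_job(
        "my-project", "example-source-bucket", "example-destination-bucket"
    )
    # Save this output to a file and pass it to cURL with --data-binary.
    print(json.dumps(body, indent=2))
```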

To check the status of the replication job, view Cloud Logging for Storage Transfer Service logs.

List replication jobs

Console

You cannot list replication jobs using the Google Cloud console. See View a replication job for instructions on how to view a single replication job at a time.

Command line

Use the gcloud alpha transfer jobs list command with the --job-type flag:

gcloud alpha transfer jobs list --job-type=replication

REST APIs

JSON API

  1. Install and initialize the gcloud CLI so that you can generate an access token for the Authorization header.

    Alternatively, you can create an access token using the OAuth 2.0 Playground and include it in the Authorization header.

  2. Use cURL to call the Storage Transfer Service REST API with a transferJobs.list request:

    curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://storagetransfer.googleapis.com/v1/transferJobs"
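When scripting the list request, two details are worth sketching: scoping the request to a project and picking replication jobs out of the response. The example below assumes that transferJobs.list accepts a filter query parameter containing JSON text with a projectId key, and that replication jobs in the response carry a replicationSpec field. Both points, and the helper names, are assumptions to verify against the transferJobs.list reference.

```python
# Minimal sketch: build a list URL and select replication jobs from a parsed
# response. The "filter" parameter shape and the "replicationSpec" check are
# assumptions, and the helper names are hypothetical.
import json
from urllib.parse import urlencode

BASE_URL = "https://storagetransfer.googleapis.com/v1/transferJobs"


def list_url(project_id: str) -> str:
    """Return the list URL with a URL-encoded JSON filter."""
    filter_json = json.dumps({"projectId": project_id}, separators=(",", ":"))
    return f"{BASE_URL}?{urlencode({'filter': filter_json})}"


def replication_jobs(list_response: dict) -> list:
    """Keep only the jobs that contain a replicationSpec."""
    return [
        job
        for job in list_response.get("transferJobs", [])
        if "replicationSpec" in job
    ]


if __name__ == "__main__":
    print(list_url("my-project"))
    sample = {
        "transferJobs": [
            {"name": "transferJobs/111", "replicationSpec": {}},
            {"name": "transferJobs/222", "transferSpec": {}},
        ]
    }
    print([job["name"] for job in replication_jobs(sample)])
```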

View a replication job

Console

  1. In the Google Cloud console, go to the Cloud Storage Buckets page.

    Go to Buckets

  2. In the list of buckets, click the name of the source bucket whose cross-bucket replication job you want to view.

  3. On the Bucket details page, click the Configuration tab.

  4. Locate the Cross-bucket replication option and click Edit.

    The Edit cross-bucket replication pane appears, which displays the replication job for each destination bucket.

On the Buckets page, you can view the Replication column, which displays whether a bucket has a Turbo replication job or a cross-bucket replication job running. For instructions on displaying the Replication column, see Show columns.

Command line

Use the gcloud alpha transfer jobs describe command:

gcloud alpha transfer jobs describe JOB_NAME

Replace:

  • JOB_NAME with the unique ID of the replication job. For example, 1234567890. To find the ID of your replication job, list your replication jobs.

REST APIs

JSON API

  1. Install and initialize the gcloud CLI so that you can generate an access token for the Authorization header.

    Alternatively, you can create an access token using the OAuth 2.0 Playground and include it in the Authorization header.

  2. Use cURL to call the Storage Transfer Service REST API with a transferJobs.get request:

    curl -X GET \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://storagetransfer.googleapis.com/v1/transferJobs/JOB_NAME"

    Replace:

    • JOB_NAME with the unique ID of the replication job. For example, 1234567890. To find the ID of your replication job, list your replication jobs.
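A script that looks up individual jobs may receive either a bare job ID or a full resource name copied from list output. The sketch below normalizes both forms into one request URL. It assumes that full names use the transferJobs/ prefix and that the request takes a projectId query parameter. Neither detail is stated on this page, and the helper name is hypothetical.

```python
# Minimal sketch: build the URL for a single job from a bare ID or a full
# name. The "transferJobs/" prefix handling and the "projectId" query
# parameter are assumptions; the helper name is hypothetical.
from urllib.parse import quote, urlencode

API_ROOT = "https://storagetransfer.googleapis.com/v1"


def job_url(job: str, project_id: str) -> str:
    """Return the request URL for one replication job."""
    name = job if job.startswith("transferJobs/") else f"transferJobs/{job}"
    # quote() keeps "/" intact by default, so the resource path is preserved.
    return f"{API_ROOT}/{quote(name)}?{urlencode({'projectId': project_id})}"


if __name__ == "__main__":
    print(job_url("1234567890", "my-project"))
    print(job_url("transferJobs/1234567890", "my-project"))
```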

Update a replication job

You can update the following fields of a replication job:

  • The description of the replication job

  • The configuration for running a replication job

  • The configuration of notifications published to Pub/Sub

  • The logging behavior for replication job operations

  • The status of the replication job (whether it's enabled, disabled, or deleted)

Console

When using the Google Cloud console, the only update you can make to a replication job is to pause or unpause it.

  1. In the Google Cloud console, go to the Cloud Storage Buckets page.

    Go to Buckets

  2. In the list of buckets, click the name of the source bucket whose replication you want to pause or unpause.

  3. On the Bucket details page, click the Configuration tab.

  4. Locate the Cross-bucket replication option and click Edit.

  5. In the Edit cross-bucket replication pane that appears, click Pause or Unpause next to the replication job you want to update.

Command line

Use the gcloud alpha transfer jobs update command with the flags that control the replication job properties you want to update. For a list of possible flags, view the gcloud alpha transfer jobs update documentation.

For example, to update the object overwrite behavior of the replication job, run the gcloud alpha transfer jobs update command with the --overwrite-when flag:

gcloud alpha transfer jobs update JOB_NAME --overwrite-when=OVERWRITE_OPTION

Replace:

  • JOB_NAME with the unique ID of the replication job. For example, 1234567890. To find the ID of your replication job, list your replication jobs.

  • OVERWRITE_OPTION with an option for how existing objects in the destination bucket can be overwritten as the result of a replication job, which can happen when the destination object and the source object have the same name. The value must be one of the following:

    • always: Always overwrite destination objects.

    • different: Only overwrite objects in the destination bucket if the destination object data is different from the source object data.

    • never: Never overwrite destination objects.
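The three overwrite options can be summarized as a small decision function. This is a sketch of the behavior as described on this page, not the service's implementation. The function name is hypothetical, and it assumes that an object missing from the destination bucket is always written, because no overwrite takes place in that case.

```python
# Minimal sketch of the overwrite options described above. Assumption: when
# the destination object does not exist, the object is written regardless of
# the option. The function name is hypothetical.


def should_write(option: str, destination_exists: bool, data_differs: bool) -> bool:
    """Return True if the source object would be written to the destination."""
    option = option.lower()
    if option not in {"always", "different", "never"}:
        raise ValueError(f"unknown overwrite option: {option!r}")
    if not destination_exists:
        return True  # Nothing to overwrite.
    if option == "always":
        return True
    if option == "never":
        return False
    # "different": overwrite only when the object data is not the same.
    return data_differs


if __name__ == "__main__":
    for option in ("always", "different", "never"):
        print(option, should_write(option, destination_exists=True, data_differs=False))
```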

REST APIs

JSON API

  1. Install and initialize the gcloud CLI so that you can generate an access token for the Authorization header.

    Alternatively, you can create an access token using the OAuth 2.0 Playground and include it in the Authorization header.

  2. Create a JSON file that contains the following structure that includes the fields of the TransferJob object you want to update:

    {
     "projectId": string,
     "transferJob": {
       object (TransferJob)
     },
     "updateTransferJobFieldMask": UPDATE_MASK
    }

    Where:

    • object (TransferJob) is replaced with the fields of the replication job you want to update. See the TransferJob resource representation for more information.

    • UPDATE_MASK is a comma-separated list of the field names you want to update. Values can be one or more of the following: description, transferSpec, notificationConfig, loggingConfig, status.

    For more information about the field names you can include, see the transferJobs.patch request body.

  3. Use cURL to call the Storage Transfer Service REST API with a transferJobs.patch request:

    curl -X PATCH --data-binary @JSON_FILE_NAME \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://storagetransfer.googleapis.com/v1/transferJobs/JOB_NAME"

    Replace:

    • JSON_FILE_NAME with the name of the JSON file you created in Step 2.

    • JOB_NAME with the unique ID of the replication job. For example, 1234567890. To find the ID of your replication job, list your replication jobs.
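The request body for the patch call pairs the changed fields with a matching field mask, and the two are easy to let drift apart when written by hand. The sketch below derives the mask from the fields being updated and rejects field names outside the list given above. The function name and sample values are hypothetical, and the DISABLED status value is an assumption based on the enabled, disabled, and deleted states mentioned earlier in this section.

```python
# Minimal sketch: build a transferJobs.patch body whose field mask always
# matches the fields being changed. The helper name and sample values are
# hypothetical; "DISABLED" is an assumed status value.
import json

UPDATABLE_FIELDS = {
    "description",
    "transferSpec",
    "notificationConfig",
    "loggingConfig",
    "status",
}


def build_patch_body(project_id: str, **fields) -> dict:
    """Return the request body for updating the given TransferJob fields."""
    if not fields:
        raise ValueError("specify at least one field to update")
    unknown = set(fields) - UPDATABLE_FIELDS
    if unknown:
        raise ValueError(f"fields cannot be updated: {sorted(unknown)}")
    return {
        "projectId": project_id,
        "transferJob": dict(fields),
        # Comma-separated list of exactly the fields present in transferJob.
        "updateTransferJobFieldMask": ",".join(sorted(fields)),
    }


if __name__ == "__main__":
    body = build_patch_body(
        "my-project", description="Replica for analytics", status="DISABLED"
    )
    print(json.dumps(body, indent=2))
```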

Delete a replication job

Console

  1. In the Google Cloud console, go to the Cloud Storage Buckets page.

    Go to Buckets

  2. In the list of buckets, click the name of the source bucket you want to stop replicating.

  3. On the Bucket details page, click the Configuration tab.

  4. Locate the Cross-bucket replication option and click Edit.

  5. In the Edit cross-bucket replication pane that appears, click Delete next to the replication job you want to delete.

  6. In the dialog that appears, click Confirm.

Command line

Use the gcloud alpha transfer jobs delete command:

gcloud alpha transfer jobs delete JOB_NAME

Replace:

  • JOB_NAME with the unique ID of the replication job. For example, 1234567890. To find the ID of your replication job, list your replication jobs.

REST APIs

JSON API

  1. Install and initialize the gcloud CLI so that you can generate an access token for the Authorization header.

    Alternatively, you can create an access token using the OAuth 2.0 Playground and include it in the Authorization header.

  2. Use cURL to call the Storage Transfer Service REST API with a transferJobs.delete request:

    curl -X DELETE \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://storagetransfer.googleapis.com/v1/transferJobs/JOB_NAME"

    Replace:

    • JOB_NAME with the unique ID of the replication job. For example, 1234567890. To find the ID of your replication job, list your replication jobs.