Storage Transfer Service can be used to transfer large amounts of data between Cloud Storage buckets, either within the same Google Cloud project, or between different projects.
Bucket migrations are useful in a number of scenarios. They can be used to consolidate data from separate projects, to move data into a backup location, or to change the location of your data.
When to use Storage Transfer Service
Google Cloud offers multiple options to transfer data between Cloud Storage buckets. We recommend the following guidelines:
- Transferring less than 1 TB: Use `gcloud`. For instructions, refer to Move and rename buckets (a sketch follows this list).
- Transferring more than 1 TB: Use Storage Transfer Service. Storage Transfer Service is a managed transfer option that provides out-of-the-box security, reliability, and performance. It eliminates the need to optimize and maintain scripts and to handle retries.
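For the less-than-1-TB case, a move with `gcloud` might look like the following minimal sketch. The bucket names are placeholders, and you should verify the copy before deleting anything:

```
# Copy all objects from the source bucket to the destination bucket.
gcloud storage cp --recursive "gs://SOURCE_BUCKET/*" gs://DESTINATION_BUCKET

# After verifying the copy, delete the source objects.
gcloud storage rm --recursive "gs://SOURCE_BUCKET/*"
```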
This guide discusses best practices when transferring data between Cloud Storage buckets using Storage Transfer Service.
Define a transfer strategy
What your transfer strategy looks like depends on the complexity of your situation. Make sure to include the following considerations in your plan.
Choose a bucket name
To move your data to a storage bucket with a different location, choose one of the following approaches:
- New bucket name. Update your applications to point to a storage bucket with a different name.
- Keep bucket name. Delete and re-create your storage bucket so that it keeps the current name, meaning you don't need to update your applications.
In both cases you should plan for downtime, and give your users suitable notice that downtime is coming. Review the following explanations to understand which choice is best for you.
New bucket name
With a new bucket name, you need to update all code and services that use your current bucket. How you do this depends on how your applications are built and deployed.
For certain setups this approach might have less downtime, but requires more work to ensure a smooth transition. It involves the following steps:
- Copying your data to a new storage bucket.
- Starting your downtime.
- Updating your applications to point to the new bucket.
- Verifying that everything works as expected, and that all relevant systems and accounts have access to the bucket.
- Deleting the original bucket.
- Ending your downtime.
Keep bucket name
Use this approach if you prefer not to change your code to point to a new bucket name. It involves the following steps:
- Copying your data to a temporary storage bucket.
- Starting your downtime.
- Deleting your original bucket.
- Creating a new bucket with the same name as your original bucket.
- Copying the data to your new bucket from the temporary bucket.
- Deleting the temporary bucket.
- Verifying that everything works as expected, and that all relevant systems and accounts have access to the bucket.
- Ending your downtime.
Minimize downtime
Storage Transfer Service does not lock reads or writes on the source or destination buckets during a transfer.
If you choose to manually lock reads/writes on your bucket, you can minimize downtime by transferring your data in two steps: seed, and sync.
Seed transfer: Perform a bulk transfer without locking read/write on the source.
Sync transfer: After the first run is complete, lock the read/write on the source bucket and perform another transfer. Storage Transfer Service transfers are incremental by default, so this second transfer only transfers data that changed during the seed transfer.
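A minimal sketch of this pattern with the gcloud CLI follows. The bucket and job names are placeholders, and this assumes your gcloud version provides `gcloud transfer jobs run`, which wraps the transferJobs.run API method:

```
# Seed transfer: bulk copy while the source still serves traffic.
gcloud transfer jobs create gs://source-bucket gs://destination-bucket \
  --name=seed-and-sync-job

# ...lock writes on the source, then re-run the same job. The second
# run is incremental and only transfers data that has changed.
gcloud transfer jobs run seed-and-sync-job
```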
Optimize the transfer speed
When estimating how long a transfer job takes, consider the possible bottlenecks. For example, if the source has billions of small files, then your transfer speed is going to be QPS-bound. If object sizes are large, bandwidth might be the bottleneck.
Bandwidth limits are set at the region level and are fairly allocated across all projects. If sufficient bandwidth is available, Storage Transfer Service can complete around 1000 tasks per transfer job per second. You can accelerate a transfer in this case by splitting your job into multiple small transfer jobs, for example by using include and exclude prefixes to transfer certain files.
In cases where the location, storage class, and encryption key are the same, Storage Transfer Service doesn't create a new copy of the bytes; instead, it creates a new metadata entry that points to the source blob. As a result, same-location, same-class copies of a large corpus complete very quickly and are only QPS-bound.
Deletes are also metadata-only operations. For these transfers, parallelizing the transfer by splitting it into multiple small jobs can increase the speed.
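For example, either a copy-heavy or a delete-heavy transfer can be split by top-level prefix so that several jobs run in parallel. A sketch with hypothetical prefixes:

```
# One job per group of top-level prefixes; the jobs run in parallel.
gcloud transfer jobs create gs://source-bucket gs://destination-bucket \
  --include-prefixes="a/,b/,c/"

gcloud transfer jobs create gs://source-bucket gs://destination-bucket \
  --include-prefixes="d/,e/,f/"
```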
Preserve metadata
The following object metadata is preserved when transferring data between Cloud Storage buckets with Storage Transfer Service:
- User-created custom metadata.
- Cloud Storage fixed-key metadata fields, such as Cache-Control, Content-Disposition, Content-Type, and Custom-Time.
- Object size.
- Generation number, which is preserved as a custom metadata field with the key `x-goog-reserved-source-generation` that you can edit or remove later.
The following metadata fields can optionally be preserved when transferring using the API:
- ACLs (`acl`)
- Storage class (`storageClass`)
- CMEK (`kmsKey`)
- Temporary hold (`temporaryHold`)
- Object creation time (`customTime`)
Refer to the TransferSpec API reference for more details.
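With the gcloud CLI, these fields are selected through the `--preserve-metadata` flag. The value names below are assumptions to be checked against `gcloud transfer jobs create --help` for your CLI version; treat this as a sketch:

```
# Preserve ACLs, storage class, CMEK, and temporary holds on transfer.
# Flag values are assumptions; confirm them with --help.
gcloud transfer jobs create gs://source-bucket gs://destination-bucket \
  --preserve-metadata=acl,storage-class,kms-key,temporary-hold
```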
The following metadata fields aren't preserved:
- Last updated time (`updated`)
- `etag`
- `componentCount`
If preserved, object creation time is stored as a custom field, `customTime`. The object's `updated` time is reset upon transfer, so the object's time spent in its storage class is also reset. This means an object in Coldline Storage, post-transfer, has to exist again for 90 days at the destination to avoid early deletion charges.

You can apply your `createTime`-based lifecycle policies using `customTime`. Existing `customTime` values are overwritten.
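For example, a `customTime`-based lifecycle rule can be attached to the destination bucket. This is a minimal sketch; the 365-day threshold is illustrative, and `daysSinceCustomTime` is a standard Cloud Storage lifecycle condition:

```
# Write an illustrative lifecycle rule that deletes objects 365 days
# after their customTime, then apply it to the destination bucket.
cat > lifecycle.json <<'EOF'
{
  "rule": [
    {
      "action": {"type": "Delete"},
      "condition": {"daysSinceCustomTime": 365}
    }
  ]
}
EOF

gcloud storage buckets update gs://destination-bucket --lifecycle-file=lifecycle.json
```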
For more details on what is and isn't preserved, refer to Metadata preservation.
Handle versioned objects
If you want to transfer all versions of your storage objects and not just the latest, you need to use either the `gcloud` CLI or REST API to transfer your data, combined with Storage Transfer Service's manifest feature.
To transfer all object versions:
List the bucket objects and copy them into a JSON file:
gcloud storage ls --all-versions --recursive --json [SOURCE_BUCKET] > object-listing.json
This command typically lists around 1k objects per second.
Split the JSON file into two CSV files, one file with non-current versions, and another with the live versions:
jq -r '.[] | select( .type=="cloud_object" and (.metadata | has("timeDeleted") | not)) | [.metadata.name, .metadata.generation] | @csv' object-listing.json > live-object-manifest.csv

jq -r '.[] | select( .type=="cloud_object" and (.metadata | has("timeDeleted"))) | [.metadata.name, .metadata.generation] | @csv' object-listing.json > non-current-object-manifest.csv
Enable object versioning on the destination bucket.
Transfer the non-current versions first by passing the non-current-object-manifest.csv manifest file as the value of the `transferManifest` field. Then, transfer the live versions in the same way, specifying live-object-manifest.csv as the manifest file.
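With the gcloud CLI, the manifest is passed with the `--manifest-file` flag. A sketch, assuming the manifests have been uploaded to a bucket you control (`my-manifest-bucket` is a placeholder):

```
# Transfer the non-current versions first, driven by the manifest.
gcloud transfer jobs create gs://source-bucket gs://destination-bucket \
  --manifest-file=gs://my-manifest-bucket/non-current-object-manifest.csv

# Then transfer the live versions the same way.
gcloud transfer jobs create gs://source-bucket gs://destination-bucket \
  --manifest-file=gs://my-manifest-bucket/live-object-manifest.csv
```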
Configure transfer options
Some of the options available to you when setting up your transfer are as follows:
Logging: Cloud Logging provides detailed logs of individual objects, allowing you to verify transfer status and to perform additional data integrity checks.
Filtering: You can use include and exclude prefixes to limit which objects Storage Transfer Service operates on. This option can be used to split a transfer into multiple transfer jobs so that they can run in parallel. See Optimize the transfer speed for more information.
Transfer options: You can configure your transfer to overwrite existing items in the destination bucket; to delete objects in the destination that don't exist in the transfer set; or to delete transferred objects from the source.
Transfer your data
After you've defined your transfer strategy, you can perform the transfer itself.
Create a new bucket
Before beginning the transfer, create a storage bucket. See Location considerations for help choosing an appropriate bucket location.
You might wish to copy over some of the bucket metadata when you create the new bucket. See Get bucket metadata to learn how to display the source bucket's metadata, so that you can apply the same settings to your new bucket.
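For example, you can inspect the source bucket's configuration and mirror the relevant settings when creating the destination. The flag values below are placeholders:

```
# Inspect the source bucket's configuration.
gcloud storage buckets describe gs://source-bucket --format=json

# Create the destination bucket with matching settings (placeholder values).
gcloud storage buckets create gs://destination-bucket \
  --location=US \
  --default-storage-class=STANDARD \
  --uniform-bucket-level-access
```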
Copy objects to the new bucket
You can copy objects from the source bucket to a new bucket using the Google Cloud console, the `gcloud` CLI, the REST API, or client libraries.
Which approach you choose depends on your transfer strategy.
The following instructions are for the basic use case of transferring objects from one bucket to another, and should be modified to fit your needs.
Don't include sensitive information such as personally identifiable information (PII) or security data in your transfer job name. Resource names may be propagated to the names of other Google Cloud resources and may be exposed to Google-internal systems outside of your project.
Google Cloud console
Use Storage Transfer Service from within the Google Cloud console:
- Open the Transfer page in the Google Cloud console.
- Click Create transfer job.
- Follow the step-by-step walkthrough, clicking Next step as you complete each step:
Get started: Use Google Cloud Storage as both your Source Type and Destination Type.
Choose a source: Either enter the name of the wanted bucket directly, or click Browse to find and select the bucket you want.
Choose a destination: Either enter the name of the wanted bucket directly, or click Browse to find and select the bucket you want.
Choose settings: Select the option Delete files from source after they're transferred.
Scheduling options: You can ignore this section.
After you complete the step-by-step walkthrough, click Create.
This begins the process of copying objects from your old bucket into your new one. This process may take some time; however, after you click Create, you can navigate away from the Google Cloud console.
To view the transfer's progress:
Open the Transfer page in the Google Cloud console.
To learn how to get detailed error information about failed Storage Transfer Service operations in the Google Cloud console, see Troubleshooting.
After the transfer completes, you don't need to do anything to delete the objects from your old bucket if you selected the Delete files from source after they're transferred option during setup. You may, however, want to also delete your old bucket, which you must do separately.
gcloud CLI
Install the gcloud CLI
If you haven't already, install the gcloud command-line tool.
Then, call `gcloud init` to initialize the tool and to specify your project ID and user account. See Initializing Cloud SDK for more details.

gcloud init
Add the service account to your destination bucket
You must add the Storage Transfer Service service account to your destination bucket before creating a transfer. To do so, use `gcloud storage buckets add-iam-policy-binding`:

gcloud storage buckets add-iam-policy-binding gs://bucket_name \
  --member=serviceAccount:project-12345678@storage-transfer-service.iam.gserviceaccount.com \
  --role=roles/storage.admin
For instructions using the Google Cloud console or API, refer to Use IAM permissions in the Cloud Storage documentation.
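The service account email is specific to your project. If you don't know it, you can look it up with the `googleServiceAccounts.get` REST method; a sketch using curl, with PROJECT_ID as a placeholder:

```
# Look up the Storage Transfer Service service account for your project.
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://storagetransfer.googleapis.com/v1/googleServiceAccounts/PROJECT_ID"
```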
Create the transfer job
To create a new transfer job, use the `gcloud transfer jobs create` command. Creating a new job initiates the specified transfer, unless a schedule or `--do-not-run` is specified.
gcloud transfer jobs create SOURCE DESTINATION
Where:
- SOURCE is the data source for this transfer, in the format `gs://BUCKET_NAME`.
- DESTINATION is your new bucket, in the format `gs://BUCKET_NAME`.
Additional options include:
- Job information: You can specify `--name` and `--description`.
- Schedule: Specify `--schedule-starts`, `--schedule-repeats-every`, and `--schedule-repeats-until`, or `--do-not-run`.
- Object conditions: Use conditions to determine which objects are transferred. These include `--include-prefixes` and `--exclude-prefixes`, and the time-based conditions in `--include-modified-[before | after]-[absolute | relative]`.
- Transfer options: Specify whether to overwrite destination files (`--overwrite-when=different` or `always`) and whether to delete certain files during or after the transfer (`--delete-from=destination-if-unique` or `source-after-transfer`); specify which metadata values to preserve; and optionally set a storage class on transferred objects (`--custom-storage-class`).
- Notifications: Configure Pub/Sub notifications for transfers with `--notification-pubsub-topic`, `--notification-event-types`, and `--notification-payload-format`.
To view all options, run `gcloud transfer jobs create --help`.
For example, to transfer all objects with the prefix `folder1`:
gcloud transfer jobs create gs://old-bucket gs://new-bucket \
--include-prefixes="folder1/"
REST
In this example, you'll learn how to move files from one Cloud Storage bucket to another. For example, you can move data to a bucket in another location.
Request using transferJobs create:
POST https://storagetransfer.googleapis.com/v1/transferJobs
{
  "description": "YOUR DESCRIPTION",
  "status": "ENABLED",
  "projectId": "PROJECT_ID",
  "schedule": {
    "scheduleStartDate": {
      "day": 1,
      "month": 1,
      "year": 2025
    },
    "startTimeOfDay": {
      "hours": 1,
      "minutes": 1
    },
    "scheduleEndDate": {
      "day": 1,
      "month": 1,
      "year": 2025
    }
  },
  "transferSpec": {
    "gcsDataSource": {
      "bucketName": "GCS_SOURCE_NAME"
    },
    "gcsDataSink": {
      "bucketName": "GCS_SINK_NAME"
    },
    "transferOptions": {
      "deleteObjectsFromSourceAfterTransfer": true
    }
  }
}
Response:
200 OK
{
  "transferJob": [
    {
      "creationTime": "2025-01-01T01:01:00.000000000Z",
      "description": "YOUR DESCRIPTION",
      "name": "transferJobs/JOB_ID",
      "status": "ENABLED",
      "lastModificationTime": "2025-01-01T01:01:00.000000000Z",
      "projectId": "PROJECT_ID",
      "schedule": {
        "scheduleStartDate": {
          "day": 1,
          "month": 1,
          "year": 2025
        },
        "startTimeOfDay": {
          "hours": 1,
          "minutes": 1
        },
        "scheduleEndDate": {
          "day": 1,
          "month": 1,
          "year": 2025
        }
      },
      "transferSpec": {
        "gcsDataSource": {
          "bucketName": "GCS_SOURCE_NAME"
        },
        "gcsDataSink": {
          "bucketName": "GCS_SINK_NAME"
        },
        "transferOptions": {
          "deleteObjectsFromSourceAfterTransfer": true
        }
      }
    }
  ]
}
Client libraries
In this example, you'll learn how to move files from one Cloud Storage bucket to another. For example, you can replicate data to a bucket in another location.
For more information about the Storage Transfer Service client libraries, see Getting started with Storage Transfer Service client libraries.
Java
Looking for older samples? See the Storage Transfer Service Migration Guide.
Python
Looking for older samples? See the Storage Transfer Service Migration Guide.
Verify copied objects
After your transfer is complete, we recommend performing additional data integrity checks.
Validate that the objects were copied correctly by verifying the metadata on the objects, such as checksums and size.

Verify that the correct versions of the objects were copied. Storage Transfer Service offers an out-of-the-box option to verify that destination objects are true copies of the source. If you've enabled logging, view logs to verify whether all the objects were successfully copied, including their corresponding metadata fields.
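As a quick spot check from the command line, you can compare object counts and per-object metadata. This is a minimal sketch; `sample-object` is a placeholder:

```
# Compare object counts between the two buckets.
gcloud storage ls "gs://old-bucket/**" | wc -l
gcloud storage ls "gs://new-bucket/**" | wc -l

# Spot-check metadata, including checksums and size, on a sample object.
gcloud storage objects describe gs://old-bucket/sample-object
gcloud storage objects describe gs://new-bucket/sample-object
```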
Start using the destination bucket
After the migration is complete and verified, update any existing applications or workloads so that they use the target bucket name. Check data access logs in Cloud Audit Logs to ensure that your operations are correctly modifying and reading objects.
Delete the original bucket
After everything is working well, delete the original bucket.
Storage Transfer Service offers the option of deleting objects after they have been transferred by specifying `deleteObjectsFromSourceAfterTransfer: true` in the job configuration, or selecting the option in the Google Cloud console.
Schedule object deletion
To schedule the deletion of your objects at a later date, use a combination of a scheduled transfer job and the `deleteObjectsUniqueInSink = true` option.
The transfer job should be set up to transfer an empty bucket into the bucket containing your objects. This causes Storage Transfer Service to list the objects and begin deleting them. As deletions are a metadata-only operation, the transfer job is only QPS-bound. To speed up the process, split the transfer into multiple jobs, each acting on a distinct set of prefixes.
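A sketch of this pattern with the gcloud CLI, where `--delete-from=destination-if-unique` is the CLI counterpart of `deleteObjectsUniqueInSink`, and the bucket names and date are placeholders:

```
# Create an empty placeholder bucket to act as the transfer source.
gcloud storage buckets create gs://empty-placeholder-bucket

# Schedule a job that "transfers" the empty bucket into the target
# bucket, deleting every object that exists only in the target.
gcloud transfer jobs create gs://empty-placeholder-bucket gs://bucket-to-clear \
  --delete-from=destination-if-unique \
  --schedule-starts=2025-12-31T00:00:00Z
```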
Alternatively, Google Cloud offers a managed cron job scheduler. For more information, see Schedule Google Cloud STS Transfer Job with Cloud Scheduler.