Transfer specific files or objects using a manifest

Storage Transfer Service can transfer a specific set of files or objects that you list in a manifest. A manifest is a CSV file, stored in Cloud Storage or on a file system, that lists the files or objects for Storage Transfer Service to act upon.

A manifest can be used for the following transfers:

  • From Amazon S3, S3-compatible storage, Microsoft Azure Blob Storage, or Cloud Storage to a Cloud Storage bucket.

  • From a file system to a Cloud Storage bucket.

  • From a Cloud Storage bucket to a file system.

  • Between two file systems.

  • From a publicly accessible HTTP/HTTPS source to a Cloud Storage bucket. URL lists use their own manifest format; follow the instructions in Create a URL list.

Create a manifest

Manifest files have the following requirements:

  • Manifests must be formatted as CSV.
  • They can contain any UTF-8 characters.
  • The first column must be a filename or object name. The name is relative to the root path or the bucket and folder specified in the transfer job; see File system transfers and Object storage transfers for details.
  • Manifest files do not support wildcards. Folder names without a file or object name are not supported.
  • If a file or object name contains a comma, enclose the name in double quotes. For example, "doe,john.txt".
  • For transfers that use transfer agents (i.e., file system transfers or transfers from S3-compatible storage), the maximum manifest file size is 1 GiB, which translates to approximately 1 million rows. If your manifest file is larger than 1 GiB, you can split it into multiple files and run multiple transfer jobs. For agentless transfers, there is no limit to the size of the manifest file.
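The quoting and size rules above can be handled with Python's standard csv module. The following is a minimal sketch of one possible approach, not a Storage Transfer Service tool: the function names are hypothetical, and writing \n line endings is an assumption (this page doesn't say whether CRLF line endings are accepted). It encodes each row as a UTF-8 CSV line, quoting any name that contains a comma, and packs rows into parts that stay within the 1 GiB limit for agent-based transfers without splitting a row.

```python
import csv
import io
from typing import Iterable, List, Sequence

# Manifest size limit for agent-based transfers, as stated in the requirements (1 GiB).
AGENT_MANIFEST_LIMIT_BYTES = 1024**3


def encode_manifest_row(row: Sequence[str]) -> bytes:
    """Encode one manifest row as a UTF-8 CSV line.

    csv.QUOTE_MINIMAL wraps a field in double quotes only when it contains a
    special character such as a comma, e.g. doe,john.txt -> "doe,john.txt".
    """
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n").writerow(row)
    return buf.getvalue().encode("utf-8")


def split_manifest(
    rows: Iterable[Sequence[str]], max_bytes: int = AGENT_MANIFEST_LIMIT_BYTES
) -> List[bytes]:
    """Pack rows into manifest parts of at most max_bytes each, never splitting a row."""
    parts: List[bytes] = []
    current: List[bytes] = []
    size = 0
    for row in rows:
        line = encode_manifest_row(row)
        if len(line) > max_bytes:
            raise ValueError("a single row is larger than max_bytes")
        # Start a new part when adding this line would exceed the limit.
        if current and size + len(line) > max_bytes:
            parts.append(b"".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        parts.append(b"".join(current))
    return parts
```

Each returned part can be saved as its own .csv file and used by a separate transfer job.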

We recommend testing your transfer with a small subset of files or objects to avoid unnecessary API calls due to configuration errors.

You can monitor the status of file transfers from the Transfer Jobs page. Files or objects that fail to transfer are listed in the transfer logs.

File system transfers

To create a manifest of files on a file system, create a CSV file with a single column that contains the file paths relative to the root directory you specify when you create the transfer job.

For example, to transfer the following file system files:

File path
rootdir/dir1/subdir1/file1.txt
rootdir/file2.txt
rootdir/dir2/subdir1/file3.txt

Your manifest should look like the following example:

dir1/subdir1/file1.txt
file2.txt
dir2/subdir1/file3.txt

Save the manifest with any filename and a .csv extension.
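If you already have the full paths of the files, the relative rows can be derived mechanically. This Python sketch (the helper name is hypothetical) strips the root directory from each path and writes the result as CSV, which also quotes any name that contains a comma. It works purely on path strings, so it can't tell whether an arbitrary input path is a folder; it only rejects paths outside the root and the root itself. Run against the example above, it prints the same three rows as the example manifest.

```python
import csv
import io
from pathlib import PurePosixPath
from typing import Iterable


def filesystem_manifest(root_directory: str, file_paths: Iterable[str]) -> str:
    """Build manifest CSV text whose rows are relative to root_directory."""
    root = PurePosixPath(root_directory)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for path in file_paths:
        # relative_to() raises ValueError when the file is outside the root directory.
        relative = PurePosixPath(path).relative_to(root)
        if not relative.parts:
            # The root directory itself is a folder, which a manifest can't list.
            raise ValueError(f"{path!r} is the root directory, not a file")
        writer.writerow([str(relative)])
    return out.getvalue()


print(
    filesystem_manifest(
        "rootdir",
        [
            "rootdir/dir1/subdir1/file1.txt",
            "rootdir/file2.txt",
            "rootdir/dir2/subdir1/file3.txt",
        ],
    ),
    end="",
)
```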

Object storage transfers

To create a manifest of objects, create a CSV file whose first column contains the object names relative to the bucket name and path you specify when you create the transfer job. All objects must be in the same bucket.

You can also specify an optional second column with the Cloud Storage generation number of the specific version to transfer.

For example, you may wish to transfer the following objects:

Object path | Cloud Storage generation number
SOURCE_PATH/object1.pdf | 1664826685911832
SOURCE_PATH/object2.pdf | (not specified)
SOURCE_PATH/object3.pdf | 1664826610699837

Your manifest should look like the following example:

object1.pdf,1664826685911832
object2.pdf
object3.pdf,1664826610699837

Save the manifest with any filename and a .csv extension.
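The same approach works for object manifests. This sketch uses a hypothetical helper and assumes that object paths are given in full, including SOURCE_PATH. It removes the source path prefix, adds the generation number as a second column only when one is supplied, and rejects folder-style names that end in /. Given the example objects, it returns the example manifest text.

```python
import csv
import io
from typing import Iterable, Optional, Tuple


def object_manifest(source_path: str, objects: Iterable[Tuple[str, Optional[int]]]) -> str:
    """Build manifest CSV text from (object_path, generation) pairs.

    source_path is the folder path configured in the transfer job ("" for the
    bucket root); generation may be None to omit the optional second column.
    """
    prefix = source_path.rstrip("/") + "/" if source_path else ""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for object_path, generation in objects:
        if not object_path.startswith(prefix):
            raise ValueError(f"{object_path!r} is not under {prefix!r}")
        name = object_path[len(prefix):]
        if not name or name.endswith("/"):
            raise ValueError(f"{object_path!r} names a folder, not an object")
        # Only pinned versions get a second column.
        writer.writerow([name] if generation is None else [name, str(generation)])
    return out.getvalue()
```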

HTTP/HTTPS transfers

To transfer specific files from an HTTP or HTTPS source, refer to the instructions in Create a URL list.

Publish the manifest

Once you've created the manifest, you must make it available to Storage Transfer Service. Storage Transfer Service can read the manifest from a Cloud Storage bucket or from your file system.

Upload the manifest to Cloud Storage

You can store the manifest file in any Cloud Storage bucket.

The service agent running the transfer must have the storage.objects.get permission for the bucket that contains the manifest. See Grant the required permissions for instructions on finding the service agent ID and granting permissions to that service agent on a bucket.

For instructions on uploading the manifest to a bucket, see Upload objects in the Cloud Storage documentation.

For example, to use the gcloud CLI to upload a file to Cloud Storage, use the gcloud storage cp command:

gcloud storage cp MANIFEST.CSV gs://DESTINATION_BUCKET_NAME/

Where:

  • MANIFEST.CSV is the local path to your manifest file. For example, Desktop/manifest01.csv.

  • DESTINATION_BUCKET_NAME is the name of the bucket to which you are uploading the manifest. For example, my-bucket.

If successful, the response looks like the following example:

Completed files 1/1 | 164.3kiB/164.3kiB

You can encrypt a manifest using customer-managed Cloud KMS encryption keys. In this case, ensure that any service accounts that access the manifest are granted access to the applicable encryption keys. Customer-supplied encryption keys are not supported.
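If you prefer to upload from Python rather than the gcloud CLI, the following sketch uses the google-cloud-storage client library, assuming it is installed and Application Default Credentials are configured. The optional kms_key_name argument applies a customer-managed Cloud KMS key as described above. The helper names are hypothetical, and the library is imported inside upload_manifest so that the URI helper can be used on its own.

```python
from typing import Optional


def manifest_gcs_uri(bucket_name: str, object_name: str) -> str:
    """Return the gs:// location to pass as the transfer's manifest location."""
    if not bucket_name or "/" in bucket_name:
        raise ValueError("pass a bare bucket name such as 'my-bucket'")
    object_name = object_name.lstrip("/")
    if not object_name:
        raise ValueError("object_name must name the manifest file")
    return f"gs://{bucket_name}/{object_name}"


def upload_manifest(
    local_path: str,
    bucket_name: str,
    object_name: str,
    kms_key_name: Optional[str] = None,
) -> str:
    """Upload a local manifest and return its gs:// location.

    kms_key_name, if given, is a Cloud KMS key resource name of the form
    projects/PROJECT/locations/LOCATION/keyRings/RING/cryptoKeys/KEY.
    """
    uri = manifest_gcs_uri(bucket_name, object_name)
    # Imported here so manifest_gcs_uri() works without google-cloud-storage installed.
    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(object_name.lstrip("/"), kms_key_name=kms_key_name)
    blob.upload_from_filename(local_path)
    return uri
```

For example, upload_manifest("Desktop/manifest01.csv", "my-bucket", "manifest01.csv") would return gs://my-bucket/manifest01.csv after a successful upload.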

Store the manifest on a file system

You can store the manifest file on your source or destination file system.

The location of the file must be accessible to the transfer agents. If you restrict directory access for your agents, make sure the manifest file is located within a mounted directory.
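When the manifest lives on the file system, its location is written relative to the transfer's source or destination root directory, with a source:// or destination:// prefix. The following sketch (hypothetical helper name) builds that location from a full manifest path and raises an error if the manifest sits outside the root directory, where the agents may not be able to reach it. It assumes the root directory you pass is the same one configured in the transfer job.

```python
from pathlib import PurePosixPath


def filesystem_manifest_uri(root_directory: str, manifest_path: str, side: str = "source") -> str:
    """Build a source:// or destination:// manifest location for a file system manifest."""
    if side not in ("source", "destination"):
        raise ValueError("side must be 'source' or 'destination'")
    # relative_to() raises ValueError if the manifest is outside the root directory.
    relative = PurePosixPath(manifest_path).relative_to(PurePosixPath(root_directory))
    if not relative.parts:
        raise ValueError("manifest_path must name a file, not the root directory")
    return f"{side}://{relative}"
```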

Start a transfer

Do not modify the manifest file until the transfer operation completes. We recommend that you lock the manifest file while a transfer is running.
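This page doesn't prescribe a specific locking mechanism. As one lightweight safeguard for a manifest on a local file system, offered as an assumption rather than a Storage Transfer Service feature, you can record a SHA-256 digest of the manifest before starting the transfer and compare it after the transfer operation completes; a different digest means the file changed during the transfer. The helper name is hypothetical.

```python
import hashlib


def manifest_digest(path: str) -> str:
    """Return the SHA-256 hex digest of a manifest file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as manifest:
        for chunk in iter(lambda: manifest.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage sketch:
# before = manifest_digest("/home/manifests/sample_manifest.csv")
# ... start the transfer and wait for the operation to complete ...
# if manifest_digest("/home/manifests/sample_manifest.csv") != before:
#     print("The manifest was modified during the transfer.")
```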

Cloud console

To start a transfer with a manifest from the Cloud console:

  1. Follow the instructions in Create transfers to select your source, destination, and options.

  2. In the final step, Choose settings, select the checkbox named Provide list of files to transfer via manifest file.

  3. Enter the manifest file location.

gcloud

To transfer the files or objects that are listed in the manifest, include the --manifest-file=MANIFEST_FILE flag with your gcloud transfer jobs create command.

gcloud transfer jobs create SOURCE DESTINATION \
  --manifest-file=MANIFEST_FILE

MANIFEST_FILE can be any of the following values:

  • The path to the CSV file in a Cloud Storage bucket:

    --manifest-file=gs://my_bucket/sample_manifest.csv
    

    If the bucket or file is not public, see Upload the manifest to Cloud Storage for the required permissions.

  • A path relative to the file system SOURCE, including any directory path given in SOURCE:

    --manifest-file=source://relative_path/sample_manifest.csv
    
  • A path relative to the file system DESTINATION, including any directory path given in DESTINATION:

    --manifest-file=destination://relative_path/sample_manifest.csv
    

REST + Client libraries

REST

To transfer the files or objects that are listed in the manifest, make a createTransferJob API call that specifies a transferSpec with the transferManifest field added. For example:

POST https://storagetransfer.googleapis.com/v1/transferJobs

...
  "transferSpec": {
      "posixDataSource": {
          "rootDirectory": "/home/"
      },
      "gcsDataSink": {
          "bucketName": "GCS_NEARLINE_SINK_NAME",
          "path": "GCS_SINK_PATH"
      },
      "transferManifest": {
          "location": "gs://my_bucket/sample_manifest.csv"
      }
  }

The manifest file can be stored in a Cloud Storage bucket or on the source or destination file system. Cloud Storage locations must use the gs:// prefix and the full path, starting with the bucket name. File system locations must use a source:// or destination:// prefix and are relative to the source or destination file system, including its optional root directory.
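The request body above can also be assembled in code, which avoids JSON syntax errors such as trailing commas. The following Python sketch builds the createTransferJob body as a plain dictionary and checks the manifest location against the prefix rules just described. The helper names are hypothetical; the sketch only builds and prints the body and does not send the request.

```python
import json
from typing import Optional

_FILE_SYSTEM_PREFIXES = ("source://", "destination://")


def validate_manifest_location(location: str) -> str:
    """Check a transferManifest.location value against the documented prefixes."""
    if location.startswith("gs://"):
        bucket, _, object_path = location[len("gs://"):].partition("/")
        if not bucket or not object_path:
            raise ValueError("gs:// locations need a bucket name and an object path")
    elif location.startswith(_FILE_SYSTEM_PREFIXES):
        if not location.split("://", 1)[1]:
            raise ValueError("file system locations need a relative path")
    else:
        raise ValueError("location must start with gs://, source://, or destination://")
    return location


def create_transfer_job_body(
    project_id: str,
    root_directory: str,
    sink_bucket: str,
    manifest_location: str,
    sink_path: Optional[str] = None,
) -> dict:
    """Build a createTransferJob body for a POSIX-to-Cloud Storage transfer with a manifest."""
    sink = {"bucketName": sink_bucket}
    if sink_path:
        sink["path"] = sink_path
    return {
        "projectId": project_id,
        "status": "ENABLED",
        "transferSpec": {
            "posixDataSource": {"rootDirectory": root_directory},
            "gcsDataSink": sink,
            "transferManifest": {"location": validate_manifest_location(manifest_location)},
        },
    }


body = create_transfer_job_body(
    "my-project-id", "/home/", "my-sink-bucket", "gs://my_bucket/sample_manifest.csv"
)
print(json.dumps(body, indent=2))
```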

Go


import (
	"context"
	"fmt"
	"io"

	storagetransfer "cloud.google.com/go/storagetransfer/apiv1"
	"cloud.google.com/go/storagetransfer/apiv1/storagetransferpb"
)

func transferUsingManifest(w io.Writer, projectID string, sourceAgentPoolName string, rootDirectory string, gcsSinkBucket string, manifestBucket string, manifestObjectName string) (*storagetransferpb.TransferJob, error) {
	// Your project ID
	// projectID := "my-project-id"

	// The agent pool associated with the POSIX data source. If not provided, defaults to the default agent pool
	// sourceAgentPoolName := "projects/my-project/agentPools/transfer_service_default"

	// The root directory path on the source filesystem
	// rootDirectory := "/directory/to/transfer/source"

	// The ID of the GCS bucket to transfer data to
	// gcsSinkBucket := "my-sink-bucket"

	// The ID of the GCS bucket that contains the manifest file
	// manifestBucket := "my-manifest-bucket"

	// The name of the manifest file in manifestBucket that specifies which objects to transfer
	// manifestObjectName := "path/to/manifest.csv"

	ctx := context.Background()
	client, err := storagetransfer.NewClient(ctx)
	if err != nil {
		return nil, fmt.Errorf("storagetransfer.NewClient: %w", err)
	}
	defer client.Close()

	manifestLocation := "gs://" + manifestBucket + "/" + manifestObjectName
	req := &storagetransferpb.CreateTransferJobRequest{
		TransferJob: &storagetransferpb.TransferJob{
			ProjectId: projectID,
			TransferSpec: &storagetransferpb.TransferSpec{
				SourceAgentPoolName: sourceAgentPoolName,
				DataSource: &storagetransferpb.TransferSpec_PosixDataSource{
					PosixDataSource: &storagetransferpb.PosixFilesystem{RootDirectory: rootDirectory},
				},
				DataSink: &storagetransferpb.TransferSpec_GcsDataSink{
					GcsDataSink: &storagetransferpb.GcsData{BucketName: gcsSinkBucket},
				},
				TransferManifest: &storagetransferpb.TransferManifest{Location: manifestLocation},
			},
			Status: storagetransferpb.TransferJob_ENABLED,
		},
	}

	resp, err := client.CreateTransferJob(ctx, req)
	if err != nil {
		return nil, fmt.Errorf("failed to create transfer job: %w", err)
	}
	if _, err = client.RunTransferJob(ctx, &storagetransferpb.RunTransferJobRequest{
		ProjectId: projectID,
		JobName:   resp.Name,
	}); err != nil {
		return nil, fmt.Errorf("failed to run transfer job: %w", err)
	}
	fmt.Fprintf(w, "Created and ran transfer job from %v to %v using manifest file %v with name %v", rootDirectory, gcsSinkBucket, manifestLocation, resp.Name)
	return resp, nil
}

Java


import com.google.storagetransfer.v1.proto.StorageTransferServiceClient;
import com.google.storagetransfer.v1.proto.TransferProto;
import com.google.storagetransfer.v1.proto.TransferTypes.GcsData;
import com.google.storagetransfer.v1.proto.TransferTypes.PosixFilesystem;
import com.google.storagetransfer.v1.proto.TransferTypes.TransferJob;
import com.google.storagetransfer.v1.proto.TransferTypes.TransferManifest;
import com.google.storagetransfer.v1.proto.TransferTypes.TransferSpec;
import java.io.IOException;

public class TransferUsingManifest {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.

    // Your project id
    String projectId = "my-project-id";

    // The agent pool associated with the POSIX data source. If not provided, defaults to the
    // default agent
    String sourceAgentPoolName = "projects/my-project-id/agentPools/transfer_service_default";

    // The root directory path on the source filesystem
    String rootDirectory = "/directory/to/transfer/source";

    // The ID of the GCS bucket to transfer data to
    String gcsSinkBucket = "my-sink-bucket";

    // The ID of the GCS bucket which has your manifest file
    String manifestBucket = "my-bucket";

    // The ID of the object in manifestBucket that specifies which files to transfer
    String manifestObjectName = "path/to/manifest.csv";

    transferUsingManifest(
        projectId,
        sourceAgentPoolName,
        rootDirectory,
        gcsSinkBucket,
        manifestBucket,
        manifestObjectName);
  }

  public static void transferUsingManifest(
      String projectId,
      String sourceAgentPoolName,
      String rootDirectory,
      String gcsSinkBucket,
      String manifestBucket,
      String manifestObjectName)
      throws IOException {
    String manifestLocation = "gs://" + manifestBucket + "/" + manifestObjectName;
    TransferJob transferJob =
        TransferJob.newBuilder()
            .setProjectId(projectId)
            .setTransferSpec(
                TransferSpec.newBuilder()
                    .setSourceAgentPoolName(sourceAgentPoolName)
                    .setPosixDataSource(
                        PosixFilesystem.newBuilder().setRootDirectory(rootDirectory).build())
                    .setGcsDataSink((GcsData.newBuilder().setBucketName(gcsSinkBucket)).build())
                    .setTransferManifest(
                        TransferManifest.newBuilder().setLocation(manifestLocation).build()))
            .setStatus(TransferJob.Status.ENABLED)
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources,
    // or use a "try-with-resources" statement to do this automatically.
    try (StorageTransferServiceClient storageTransfer = StorageTransferServiceClient.create()) {

      // Create the transfer job
      TransferJob response =
          storageTransfer.createTransferJob(
              TransferProto.CreateTransferJobRequest.newBuilder()
                  .setTransferJob(transferJob)
                  .build());

      System.out.println(
          "Created a transfer job from "
              + rootDirectory
              + " to "
              + gcsSinkBucket
              + " using "
              + "manifest file "
              + manifestLocation
              + " with name "
              + response.getName());
    }
  }
}

Node.js


// Imports the Google Cloud client library
const {
  StorageTransferServiceClient,
} = require('@google-cloud/storage-transfer');

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// Your project id
// const projectId = 'my-project'

// The agent pool associated with the POSIX data source. Defaults to the default agent
// const sourceAgentPoolName = 'projects/my-project/agentPools/transfer_service_default'

// The root directory path on the source filesystem
// const rootDirectory = '/directory/to/transfer/source'

// The ID of the GCS bucket to transfer data to
// const gcsSinkBucket = 'my-sink-bucket'

// Transfer manifest location, for example a `gs://` URL
// const manifestLocation = 'gs://my-bucket/sample_manifest.csv'

// Creates a client
const client = new StorageTransferServiceClient();

/**
 * Creates a request to transfer from the local file system to the sink bucket
 */
async function transferViaManifest() {
  const createRequest = {
    transferJob: {
      projectId,
      transferSpec: {
        sourceAgentPoolName,
        posixDataSource: {
          rootDirectory,
        },
        gcsDataSink: {bucketName: gcsSinkBucket},
        transferManifest: {
          location: manifestLocation,
        },
      },
      status: 'ENABLED',
    },
  };

  // Runs the request and creates the job
  const [transferJob] = await client.createTransferJob(createRequest);

  const runRequest = {
    jobName: transferJob.name,
    projectId: projectId,
  };

  await client.runTransferJob(runRequest);

  console.log(
    `Created and ran a transfer job from '${rootDirectory}' to '${gcsSinkBucket}' using manifest \`${manifestLocation}\` with name ${transferJob.name}`
  );
}

transferViaManifest().catch(err => {
  console.error(err);
  process.exitCode = 1;
});

Python

from google.cloud import storage_transfer


def create_transfer_with_manifest(
    project_id: str,
    description: str,
    source_agent_pool_name: str,
    root_directory: str,
    sink_bucket: str,
    manifest_location: str,
):
    """Create a transfer from a POSIX file system to a GCS bucket using
    a manifest file."""

    client = storage_transfer.StorageTransferServiceClient()

    # The ID of the Google Cloud Platform Project that owns the job
    # project_id = 'my-project-id'

    # A useful description for your transfer job
    # description = 'My transfer job'

    # The agent pool associated with the POSIX data source.
    # Defaults to 'projects/{project_id}/agentPools/transfer_service_default'
    # source_agent_pool_name = 'projects/my-project/agentPools/my-agent'

    # The root directory path on the source filesystem
    # root_directory = '/directory/to/transfer/source'

    # Google Cloud Storage destination bucket name
    # sink_bucket = 'my-gcs-destination-bucket'

    # Transfer manifest location, for example a `gs://` URL
    # manifest_location = 'gs://my-bucket/sample_manifest.csv'

    transfer_job_request = storage_transfer.CreateTransferJobRequest(
        {
            "transfer_job": {
                "project_id": project_id,
                "description": description,
                "status": storage_transfer.TransferJob.Status.ENABLED,
                "transfer_spec": {
                    "source_agent_pool_name": source_agent_pool_name,
                    "posix_data_source": {
                        "root_directory": root_directory,
                    },
                    "gcs_data_sink": {
                        "bucket_name": sink_bucket,
                    },
                    "transfer_manifest": {"location": manifest_location},
                },
            }
        }
    )

    result = client.create_transfer_job(transfer_job_request)
    print(f"Created transferJob: {result.name}")

The objects or files in the manifest aren't necessarily transferred in the listed order.

If the manifest includes files that already exist in the destination, those files are skipped unless you enable the option to overwrite objects that already exist in the sink (overwriteObjectsAlreadyExistingInSink in the API).

If the destination contains a different version of an object listed in the manifest, the destination object is overwritten with the source version. If the destination is a versioned bucket, a new version of the object is created.
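The skip and overwrite rules above can be summarized as a small decision function. This Python sketch is a simplified model, not the service's implementation: it assumes each file or object can be reduced to a comparable version token (for example a generation number or checksum), treats "already exists" as "an identical version exists in the destination", and uses hypothetical names. The returned dictionary follows manifest order only for readability; the service doesn't guarantee transfer order.

```python
from enum import Enum
from typing import Dict, Iterable, Mapping


class Action(Enum):
    COPY = "copy"            # not in the destination yet
    SKIP = "skip"            # identical version already in the destination
    OVERWRITE = "overwrite"  # replace the destination copy (a new version if the bucket is versioned)


def plan_transfer(
    manifest: Iterable[str],
    source_versions: Mapping[str, str],
    sink_versions: Mapping[str, str],
    overwrite_existing: bool = False,
) -> Dict[str, Action]:
    """Decide, per manifest entry, whether the simplified model copies, skips, or overwrites."""
    actions: Dict[str, Action] = {}
    for name in manifest:
        if name not in sink_versions:
            actions[name] = Action.COPY
        elif sink_versions[name] != source_versions[name]:
            actions[name] = Action.OVERWRITE
        else:
            actions[name] = Action.OVERWRITE if overwrite_existing else Action.SKIP
    return actions
```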

What's next