Working with Cloud Storage

AI Platform Prediction reads data from Cloud Storage locations where you have granted access to your AI Platform Prediction project. This page gives a quick guide to using Cloud Storage with AI Platform Prediction.

Overview

Using Cloud Storage is required or recommended for the following aspects of AI Platform Prediction services:

Online prediction

  • Storing the model files that you deploy as a model version.
Batch prediction

  • Storing your batch prediction input files.
  • Storing your batch prediction output.
  • Storing your model, if you use batch prediction without deploying the model on AI Platform Prediction first.
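
For a concrete picture of these uses, here is one possible bucket layout, sketched in shell with hypothetical bucket, job, and model names:

```shell
# Hypothetical bucket and path names; substitute your own.
BUCKET="gs://my-project-aiplatform"

INPUT_PATHS="${BUCKET}/batch-jobs/job-001/inputs/*"   # batch prediction input files
OUTPUT_PATH="${BUCKET}/batch-jobs/job-001/outputs/"   # where batch prediction writes results
MODEL_DIR="${BUCKET}/models/my-model/"                # model files, if not deployed first

echo "$OUTPUT_PATH"
```

Keeping inputs, outputs, and model files under distinct prefixes makes it easy to find and clean up a job's artifacts later.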

Region considerations

When you create a Cloud Storage bucket to use with AI Platform Prediction, you should:

  • Assign it to a specific compute region, not a multi-region location.
  • Use the same region where you run your AI Platform Prediction jobs.

For more information, see the available regions for AI Platform Prediction services.

Setting up your Cloud Storage buckets

This section shows you how to create a new bucket. You can use an existing bucket, but it must be in the same region where you plan to run AI Platform Prediction jobs. Additionally, if the bucket is not part of the project you are using to run AI Platform Prediction, you must explicitly grant access to the AI Platform Prediction service accounts.

  1. Specify a name for your new bucket. The name must be unique across all buckets in Cloud Storage.

    BUCKET_NAME="YOUR_BUCKET_NAME"

    For example, use your project name with -aiplatform appended:

    PROJECT_ID=$(gcloud config list project --format "value(core.project)")
    BUCKET_NAME=${PROJECT_ID}-aiplatform
  2. Check the bucket name that you set.

    echo $BUCKET_NAME
  3. Select a region for your bucket and set a REGION environment variable.

    Use the same region where you plan on running AI Platform Prediction jobs. See the available regions for AI Platform Prediction services.

    For example, the following code creates REGION and sets it to us-central1:

    REGION=us-central1
  4. Create the new bucket:

    gcloud storage buckets create gs://$BUCKET_NAME --location=$REGION

Model organization in buckets

Organize the folder structure in your bucket to accommodate many iterations of your model.

  • Place each saved model into its own separate directory within your bucket.
  • Consider using timestamps to name the directories in your bucket.

For example, you can place your first model in a structure similar to gs://your-bucket/your-model-DATE1/your-saved-model-file. To name the directories for each subsequent iteration of your model, use an updated timestamp (gs://your-bucket/your-model-DATE2/your-saved-model-file and so on).
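
A minimal shell sketch of this convention, using a timestamp to build each iteration's directory (the bucket and model names are hypothetical placeholders):

```shell
# Hypothetical names; substitute your own bucket and model.
BUCKET="gs://your-bucket"
MODEL_NAME="your-model"

# Build a unique directory per model iteration, e.g. your-model-20240115_103000
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
MODEL_DIR="${BUCKET}/${MODEL_NAME}-${TIMESTAMP}/"

echo "$MODEL_DIR"
# A later copy step might look like:
# gcloud storage cp ./saved_model.pb "${MODEL_DIR}"
```

Because timestamps sort lexicographically, listing the bucket shows your model iterations in chronological order.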

Accessing Cloud Storage during prediction

If you deploy a custom prediction routine (beta) or a scikit-learn pipeline with custom code (beta), your model version can read from any Cloud Storage bucket in the same project during its handling of predictions.

In your custom prediction code, use a Python module that can read from Cloud Storage, such as the Python Client for Google Cloud Storage, TensorFlow's tf.io.gfile module, or pandas 0.24.0 or later. AI Platform Prediction takes care of authentication.

You can also specify a service account when you deploy your custom prediction routine to customize which Cloud Storage resources your deployment can access.
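
As a sketch, assuming the beta --service-account flag of gcloud beta ai-platform versions create, and with hypothetical model, version, and service-account names, the deploy command can be assembled like this (built as a string rather than executed, so the sketch is safe to run as-is):

```shell
# Hypothetical names; substitute your own model, version, and service account.
MODEL_NAME="my_model"
VERSION_NAME="v1"
SVC_ACCOUNT="my-prediction-sa@my-project.iam.gserviceaccount.com"

# Assemble the deploy command; drop the indirection to actually deploy.
CMD="gcloud beta ai-platform versions create ${VERSION_NAME} --model=${MODEL_NAME} --service-account=${SVC_ACCOUNT}"
echo "$CMD"
```

The model version then reads Cloud Storage with the permissions of that service account rather than the default one.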

Using a Cloud Storage bucket from a different project

This section describes how to configure Cloud Storage buckets from outside of your project so that AI Platform Prediction can access them.

If you set up your Cloud Storage bucket in the same project where you are using AI Platform Prediction, your AI Platform Prediction service accounts already have the necessary permissions to access your Cloud Storage bucket.

These instructions are provided for the following cases:

  • You are unable to use a bucket in your own project, for example when a large dataset is shared across multiple projects.
  • You use multiple buckets with AI Platform Prediction. You must grant access to the AI Platform Prediction service accounts separately for each bucket.
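
In the multiple-bucket case, the per-bucket grants can be scripted. A sketch with hypothetical bucket names and a placeholder service account, building the commands as strings instead of executing them so it is safe to run as-is:

```shell
# Hypothetical values; substitute your own service account and bucket names.
SVC_ACCOUNT="service-111111111111@cloud-ml.google.com.iam.gserviceaccount.com"
BUCKETS="my-shared-data my-model-store my-batch-output"

# Build one grant command per bucket; remove the string indirection
# (run the command directly inside the loop) to actually apply the grants.
CMDS=""
for BUCKET in $BUCKETS; do
    CMDS="${CMDS}gcloud storage buckets add-iam-policy-binding gs://${BUCKET} --member=serviceAccount:${SVC_ACCOUNT} --role=roles/storage.legacyObjectReader
"
done
printf '%s' "$CMDS"
```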

Step 1: Get required information from your cloud project

Console

  1. Open the IAM page in the Google Cloud console.

  2. The IAM page displays a list of all principals that have access to your project, along with their associated role(s). Your AI Platform Prediction project has multiple service accounts. Locate the service account in the list that has the role Cloud ML Service Agent and copy that service account ID, which looks similar to this:

    service-111111111111@cloud-ml.google.com.iam.gserviceaccount.com

    You need to paste this service account ID into a different page in the Google Cloud console during the next steps.

Command Line

The steps in this section gather information about your Google Cloud project that you need in order to change access control for your project's AI Platform Prediction service account. Store the values in environment variables for later use.

  1. Get your project identifier by using the Google Cloud CLI with your project selected:

    PROJECT_ID=$(gcloud config list project --format "value(core.project)")
    
  2. Get the access token for your project by using gcloud:

    AUTH_TOKEN=$(gcloud auth print-access-token)
    
  3. Get the service account information by requesting project configuration from the REST service:

    SVC_ACCOUNT=$(curl -X GET -H "Content-Type: application/json" \
        -H "Authorization: Bearer $AUTH_TOKEN" \
        https://ml.googleapis.com/v1/projects/${PROJECT_ID}:getConfig \
        | python3 -c "import json; import sys; response = json.load(sys.stdin); \
        print(response['serviceAccount'])")
    

Step 2: Configure access to your Cloud Storage bucket

Console

  1. Open the Storage page in the Google Cloud console.

  2. Select the Cloud Storage bucket you use to deploy models by checking the box to the left of the bucket name.

  3. Click the Show Info Panel button in the upper right corner to display the Permissions tab.

  4. Paste the service account ID into the Add Principals field. To the right of that field, select your desired role(s), such as Storage Legacy Bucket Reader.

    If you are not sure which role to select, you may select multiple roles to see them displayed below the Add Principals field, each with a brief description of its permissions.

  5. To assign your desired role(s) to the service account, click the Add button to the right of the Add Principals field.

Command Line

Now that you have your project and service account information, you need to update the access permissions for your Cloud Storage bucket. These steps use the same variable names used in the previous section.

  1. Set the name of your bucket in an environment variable named BUCKET_NAME:

    BUCKET_NAME="your_bucket_name"
    
  2. Grant the service account read access to objects in your Cloud Storage bucket:

    gcloud storage buckets add-iam-policy-binding gs://$BUCKET_NAME --member=serviceAccount:$SVC_ACCOUNT --role=roles/storage.legacyObjectReader
    
  3. Grant write access:

    gcloud storage buckets add-iam-policy-binding gs://$BUCKET_NAME --member=serviceAccount:$SVC_ACCOUNT --role=roles/storage.legacyObjectWriter
    

To choose a role to grant to your AI Platform Prediction service account, see the Cloud Storage IAM roles. For more general information about updating IAM roles in Cloud Storage, see how to grant access to a service account for a resource.

What's next