AI Platform Prediction reads data from Cloud Storage locations where you have granted access to your AI Platform Prediction project. This page gives a quick guide to using Cloud Storage with AI Platform Prediction.
Overview
Using Cloud Storage is required or recommended for the following aspects of AI Platform Prediction services:
Online prediction
- Storing your saved model so that you can create a model version from it (see the upload sketch below).
- Storing custom code to handle prediction requests, if you are using a custom prediction routine (beta) or a scikit-learn pipeline with custom code (beta).
- Storing additional data for your custom code to access when handling predictions.
Batch prediction
- Storing your batch prediction input files.
- Storing your batch prediction output.
- Storing your model, if you use batch prediction without deploying the model on AI Platform Prediction first.
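For example, the following is a minimal sketch of uploading a saved model file with the Python Client for Google Cloud Storage. It assumes the google-cloud-storage library is installed, and the bucket and object names are placeholders for your own values.

# A minimal sketch, assuming google-cloud-storage is installed;
# "your-bucket" and the object names are placeholders.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("your-bucket")

# Upload a saved model file so it can later be deployed as a model version.
blob = bucket.blob("your-model/model.joblib")
blob.upload_from_filename("model.joblib")
print(f"Uploaded to gs://{bucket.name}/{blob.name}")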
Region considerations
When you create a Cloud Storage bucket to use with AI Platform Prediction, you should:
- Assign it to a specific compute region, not a multi-region location.
- Use the same region where you run your training jobs.
See more about the AI Platform Prediction available regions.
Setting up your Cloud Storage buckets
This section shows you how to create a new bucket. You can use an existing bucket, but it must be in the same region where you plan to run AI Platform Prediction jobs. Additionally, if the bucket is not part of the project that you use to run AI Platform Prediction, you must explicitly grant access to the AI Platform Prediction service accounts.
- Specify a name for your new bucket. The name must be unique across all buckets in Cloud Storage.
BUCKET_NAME="YOUR_BUCKET_NAME"
For example, use your project name with -aiplatform appended:
PROJECT_ID=$(gcloud config list project --format "value(core.project)")
BUCKET_NAME=${PROJECT_ID}-aiplatform
- Check the bucket name that you created.
echo $BUCKET_NAME
- Select a region for your bucket and set a REGION environment variable. Use the same region where you plan to run AI Platform Prediction jobs. See the available regions for AI Platform Prediction services.
For example, the following code creates REGION and sets it to us-central1:
REGION=us-central1
- Create the new bucket:
gcloud storage buckets create gs://$BUCKET_NAME --location=$REGION
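If you prefer to work from Python, here is a roughly equivalent sketch using the Python Client for Google Cloud Storage; it assumes the library is installed, and the bucket name and region are placeholders mirroring BUCKET_NAME and REGION above.

# A minimal sketch, assuming google-cloud-storage is installed; the bucket
# name and region are placeholders mirroring the variables set above.
from google.cloud import storage

client = storage.Client()

# Create the bucket in a specific compute region, not a multi-region.
bucket = client.create_bucket("your-project-aiplatform", location="us-central1")
print(f"Created bucket {bucket.name} in {bucket.location}")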
Model organization in buckets
Organize the folder structure in your bucket to accommodate many iterations of your model.
- Place each saved model into its own separate directory within your bucket.
- Consider using timestamps to name the directories in your bucket.
For example, you can place your first model in a structure similar to gs://your-bucket/your-model-DATE1/your-saved-model-file. To name the directories for each subsequent iteration of your model, use an updated timestamp (gs://your-bucket/your-model-DATE2/your-saved-model-file, and so on).
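As an illustration, here is a small Python sketch that builds timestamped directory paths like these; the bucket and model names are placeholders.

# A minimal sketch: build a timestamped directory path for each model
# iteration. The bucket and model names are placeholders.
from datetime import datetime, timezone

BUCKET = "your-bucket"

def model_dir(model_name):
    # Return a unique, timestamped Cloud Storage path for one iteration.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    return f"gs://{BUCKET}/{model_name}-{stamp}/"

print(model_dir("your-model"))
# e.g. gs://your-bucket/your-model-20240101_120000/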
Accessing Cloud Storage during prediction
If you deploy a custom prediction routine (beta) or a scikit-learn pipeline with custom code (beta), your model version can read from any Cloud Storage bucket in the same project during its handling of predictions.
Use a Python module that can read from Cloud Storage in your custom prediction code, such as the Python Client for Google Cloud Storage, TensorFlow's tf.io.gfile.GFile class, or pandas 0.24.0 or later. AI Platform Prediction takes care of authentication.
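For instance, the following sketch loads an auxiliary file from Cloud Storage when the model version starts. The class shape assumes the Predictor interface used by custom prediction routines (a predict method and a from_path class method), and the bucket path and file names are placeholders.

# A minimal sketch of a custom prediction routine that reads auxiliary data
# from Cloud Storage. The bucket path is a placeholder; AI Platform
# Prediction supplies credentials, so no explicit authentication is needed.
import json
from google.cloud import storage

class MyPredictor(object):
    def __init__(self, vocab):
        self._vocab = vocab

    def predict(self, instances, **kwargs):
        # Use the data loaded from Cloud Storage while handling predictions.
        return [self._vocab.get(instance, -1) for instance in instances]

    @classmethod
    def from_path(cls, model_dir):
        # Read an auxiliary file from a bucket in the same project.
        client = storage.Client()
        blob = client.bucket("your-bucket").blob("your-model/vocab.json")
        vocab = json.loads(blob.download_as_text())
        return cls(vocab)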
You can also specify a service account when you deploy your custom prediction routine in order to customize which Cloud Storage resources your deployment has access to.
Using a Cloud Storage bucket from a different project
This section describes how to configure Cloud Storage buckets from outside of your project so that AI Platform Prediction can access them.
If you set up your Cloud Storage bucket in the same project where you are using AI Platform Prediction, your AI Platform Prediction service accounts already have the necessary permissions to access your Cloud Storage bucket.
These instructions are provided for the following cases:
- You are unable to use a bucket from your project, such as when a large dataset is shared across multiple projects.
- If you use multiple buckets with AI Platform Prediction, you must grant access to the AI Platform Prediction service accounts separately for each one.
Step 1: Get required information from your cloud project
Console
Open the IAM page in the Google Cloud console.
The IAM page displays a list of all principals that have access to your project, along with their associated role(s). Your AI Platform Prediction project has multiple service accounts. Locate the service account in the list that has the role Cloud ML Service Agent and copy that service account ID, which looks similar to this:
"service-111111111111@cloud-ml.google.com.iam.gserviceaccount.com".
You need to paste this service account ID into a different page in the Google Cloud console during the next steps.
Command Line
The steps in this section collect information about your Google Cloud project that you need in order to change access control for your project's AI Platform Prediction service account. Store the values in environment variables for later use.
Get your project identifier by using the Google Cloud CLI with your project selected:
PROJECT_ID=$(gcloud config list project --format "value(core.project)")
Get an access token for your project by using gcloud:
AUTH_TOKEN=$(gcloud auth print-access-token)
Get the service account information by requesting project configuration from the REST service:
SVC_ACCOUNT=$(curl -X GET -H "Content-Type: application/json" \
    -H "Authorization: Bearer $AUTH_TOKEN" \
    https://ml.googleapis.com/v1/projects/${PROJECT_ID}:getConfig \
    | python3 -c "import json; import sys; response = json.load(sys.stdin); \
    print(response['serviceAccount'])")
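The same lookup can also be done from Python with the Google API discovery client; a minimal sketch, assuming google-api-python-client is installed, Application Default Credentials are available, and PROJECT_ID is a placeholder for your project identifier.

# A minimal sketch, assuming google-api-python-client is installed and
# Application Default Credentials are configured; PROJECT_ID is a placeholder.
from googleapiclient import discovery

PROJECT_ID = "your-project-id"

ml = discovery.build("ml", "v1")
config = ml.projects().getConfig(name="projects/{}".format(PROJECT_ID)).execute()
print(config["serviceAccount"])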
Step 2: Configure access to your Cloud Storage bucket
Console
Open the Storage page in the Google Cloud console.
Select the Cloud Storage bucket you use to deploy models by checking the box to the left of the bucket name.
Click the Show Info Panel button in the upper right corner to display the Permissions tab.
Paste the service account ID into the Add Principals field. To the right of that field, select your desired role(s), such as Storage Legacy Bucket Reader.
If you are not sure which role to select, you may select multiple roles to see them displayed below the Add Principals field, each with a brief description of its permissions.
To assign your desired role(s) to the service account, click the Add button to the right of the Add Principals field.
Command Line
Now that you have your project and service account information, you need to update the access permissions for your Cloud Storage bucket. These steps use the same variable names used in the previous section.
Set the name of your bucket in an environment variable named BUCKET_NAME:
BUCKET_NAME="your_bucket_name"
Grant the service account read access to objects in your Cloud Storage bucket:
gcloud storage buckets add-iam-policy-binding gs://$BUCKET_NAME --member=serviceAccount:$SVC_ACCOUNT --role=roles/storage.legacyObjectReader
Grant write access:
gcloud storage buckets add-iam-policy-binding gs://$BUCKET_NAME --member=serviceAccount:$SVC_ACCOUNT --role=roles/storage.legacyObjectWriter
To choose a role to grant to your AI Platform Prediction service account, see the Cloud Storage IAM roles. For more general information about updating IAM roles in Cloud Storage, see how to grant access to a service account for a resource.
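If you would rather manage the binding programmatically, here is a hedged sketch using the Python Client for Google Cloud Storage; it assumes the library is installed, and the bucket name and service account are placeholders mirroring the shell variables above.

# A minimal sketch, assuming google-cloud-storage is installed; the bucket
# name and service account are placeholders mirroring the variables above.
from google.cloud import storage

BUCKET_NAME = "your_bucket_name"
SVC_ACCOUNT = "service-111111111111@cloud-ml.google.com.iam.gserviceaccount.com"

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)

# Add a read binding for the service account and write the policy back.
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append({
    "role": "roles/storage.legacyObjectReader",
    "members": {"serviceAccount:" + SVC_ACCOUNT},
})
bucket.set_iam_policy(policy)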
What's next
- Experience AI Platform Prediction in action by working through the getting started guide.
- Learn about how AI Platform Prediction works.