The Online Prediction service of Vertex AI lets you make synchronous requests to your own prediction model endpoint.
This page shows you how to send requests to your model so that it can serve online predictions with low latency.
Before you begin
Before you can start using the Online Prediction API, you must have a project and appropriate credentials.
Follow these steps before getting an online prediction:
- Set up a project for Vertex AI.
- To get the permissions that you need to access Online Prediction, ask your Project IAM Admin to grant you the Vertex AI Prediction User (`vertex-ai-prediction-user`) role. For information about this role, see Prepare IAM permissions.
- Create and train a prediction model targeting one of the supported containers.
- Create the prediction cluster and ensure your project allows incoming external traffic.
- Show details of the `Endpoint` custom resource of your prediction model:

  ```
  kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG get endpoint PREDICTION_ENDPOINT -n PROJECT_NAMESPACE -o jsonpath='{.status.endpointFQDN}'
  ```

  Replace the following:

  - `PREDICTION_CLUSTER_KUBECONFIG`: the path to the kubeconfig file of the prediction cluster.
  - `PREDICTION_ENDPOINT`: the name of the endpoint.
  - `PROJECT_NAMESPACE`: the name of the prediction project namespace.

  The output must show the `status` field, displaying the endpoint fully qualified domain name in the `endpointFQDN` field. Save this endpoint URL path to use it for your requests.
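  For reference, the command prints only the value of the `endpointFQDN` field. The following hostname is a hypothetical example; your value depends on your deployment:

  ```
  prediction-endpoint.mydomain.example.com
  ```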
Set your environment variables
If you want to send requests to your model endpoint from a Python script, and you set up a service account in your project to make authorized API calls programmatically, you can define environment variables in the script so that it can access values such as the service account keys when it runs.
Follow these steps to set the required environment variables in a Python script:

1. Create a JupyterLab notebook to interact with the Online Prediction API.
2. Create a Python script in the JupyterLab notebook.
3. Add the following code to the Python script:
   ```python
   import os

   os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "APPLICATION_DEFAULT_CREDENTIALS_FILENAME"
   ```

   Replace `APPLICATION_DEFAULT_CREDENTIALS_FILENAME` with the name of the JSON file that contains the service account keys you created in the project, such as `my-service-key.json`.

4. Save the Python script with a name, such as `prediction.py`.
5. Run the Python script to set the environment variables:

   ```
   python SCRIPT_NAME
   ```

   Replace `SCRIPT_NAME` with the name you gave to your Python script, such as `prediction.py`.
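As a quick check that the variable is picked up, you can load the credentials with the `google-auth` library. This is a minimal sketch, not part of the procedure above; it assumes the `google-auth` package is installed and uses the hypothetical key file name from the previous steps:

```python
import os

import google.auth

# Same assignment as in the script above, using the hypothetical file name.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "my-service-key.json"

# google.auth.default() reads GOOGLE_APPLICATION_CREDENTIALS and loads the
# service account credentials from the referenced JSON key file.
credentials, _ = google.auth.default()
print("Loaded service account:", credentials.service_account_email)
```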
Send a request to an endpoint
Make a request to the model's endpoint to get an online prediction:
curl
Follow these steps to make a `curl` request:
1. Create a JSON file named `request.json` for your request body. You must add and format your input for online prediction with the request body details that the target container requires. For a hypothetical sketch of a request body, see the example after these steps.
2. Make the request:

   ```
   curl -X POST -H "Content-Type: application/json; charset=utf-8" -H "Authorization: Bearer TOKEN" https://ENDPOINT:443/v1/model:predict -d @request.json
   ```
   Replace the following:

   - `TOKEN`: the authentication token you obtained.
   - `ENDPOINT`: your model endpoint for the online prediction request.
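The exact shape of `request.json` depends on the input format that your model's container requires. As a hypothetical sketch, a model that takes a numeric feature vector per instance might expect:

```json
{
  "instances": [
    [1.0, 2.0],
    [3.0, 4.0]
  ]
}
```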
If successful, you receive a JSON response to your online prediction request. The following output shows an example:

```json
{
  "predictions": [
    [-357.10849],
    [-171.621658]
  ]
}
```
For more information about responses, see Response body details.
Python
Follow these steps to use the Online Prediction service from a Python script:
1. Create a JSON file named `request.json` for your request body. You must add and format your input for online prediction with the request body details that the target container requires.
2. Install the latest version of the Vertex AI Platform client library.
3. Add the following code to the Python script you created:
   ```python
   import json
   import os
   from typing import Sequence

   import grpc
   from absl import app
   from absl import flags
   import google.auth  # Required so that google.auth.default() resolves.
   from google.auth.transport import requests
   from google.protobuf import json_format
   from google.protobuf.struct_pb2 import Value
   from google.cloud.aiplatform_v1.services import prediction_service

   _INPUT = flags.DEFINE_string("input", None, "input", required=True)
   _HOST = flags.DEFINE_string("host", None, "Prediction endpoint", required=True)
   _ENDPOINT_ID = flags.DEFINE_string("endpoint_id", None, "endpoint id", required=True)

   # Root CA certificate that gRPC uses to verify the endpoint's TLS certificate.
   os.environ["GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"] = "path-to-ca-cert-file.cert"

   # ENDPOINT_RESOURCE_NAME is a placeholder value that doesn't affect prediction behavior.
   ENDPOINT_RESOURCE_NAME = "projects/000000000000/locations/us-central1/endpoints/00000000000000"


   def get_sts_token(host):
       creds = None
       try:
           creds, _ = google.auth.default()
           creds = creds.with_gdch_audience(host + ":443")
           req = requests.Request()
           creds.refresh(req)
           print("Got token: ")
           print(creds.token)
       except Exception as e:
           print("Caught exception: " + str(e))
           raise e
       return creds.token


   # predict_client_secure builds a client that requires TLS.
   def predict_client_secure(host, token):
       with open(os.environ["GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"], "rb") as f:
           channel_creds = grpc.ssl_channel_credentials(f.read())
       call_creds = grpc.access_token_call_credentials(token)
       creds = grpc.composite_channel_credentials(
           channel_creds,
           call_creds,
       )
       client = prediction_service.PredictionServiceClient(
           transport=prediction_service.transports.grpc.PredictionServiceGrpcTransport(
               channel=grpc.secure_channel(target=host + ":443", credentials=creds)))
       return client


   def predict_func(client, instances):
       resp = client.predict(
           endpoint=ENDPOINT_RESOURCE_NAME,
           instances=instances,
           metadata=[("x-vertex-ai-endpoint-id", _ENDPOINT_ID.value)],
       )
       print(resp)


   def main(argv: Sequence[str]):
       del argv  # Unused.
       # Read the request body and convert each instance to a protobuf Value.
       with open(_INPUT.value) as json_file:
           data = json.load(json_file)
       instances = [json_format.ParseDict(s, Value()) for s in data["instances"]]
       token = get_sts_token(_HOST.value)
       client = predict_client_secure(_HOST.value, token)
       predict_func(client=client, instances=instances)


   if __name__ == "__main__":
       app.run(main)
   ```
4. Save the Python script with a name, such as `prediction.py`.
5. Make the request to the prediction server:

   ```
   python SCRIPT_NAME --input request.json \
       --host ENDPOINT \
       --endpoint_id ENDPOINT_ID
   ```
   Replace the following:

   - `SCRIPT_NAME`: the name of the Python script, such as `prediction.py`.
   - `ENDPOINT`: your model endpoint for the online prediction request.
   - `ENDPOINT_ID`: the value of the endpoint ID.
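   For example, with the hypothetical file and host names used earlier on this page and a hypothetical endpoint ID, the invocation might look like this:

   ```
   python prediction.py --input request.json \
       --host prediction-endpoint.mydomain.example.com \
       --endpoint_id 1234
   ```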
If successful, you receive a JSON response to your online prediction request. For more information about responses, see Response body details.