Get an online prediction

The Online Prediction service of Vertex AI lets you make synchronous requests to your own prediction model endpoint.

This page shows you how to send requests to your model so that it can serve online predictions with low latency.

Before you begin

Before you can start using the Online Prediction API, you must have a project and appropriate credentials.

Follow these steps before getting an online prediction:

  1. Set up a project for Vertex AI.
  2. To get the permissions that you need to access Online Prediction, ask your Project IAM Admin to grant you the Vertex AI Prediction User (vertex-ai-prediction-user) role.

    For information about this role, see Prepare IAM permissions.

  3. Create and train a prediction model targeting one of the supported containers.

  4. Create the prediction cluster and ensure your project allows incoming external traffic.

  5. Export your model artifacts for prediction.

  6. Deploy your model to an endpoint.

  7. Show details of the Endpoint custom resource of your prediction model:

    kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG get endpoint PREDICTION_ENDPOINT -n PROJECT_NAMESPACE -o jsonpath='{.status.endpointFQDN}'
    

    Replace the following:

    • PREDICTION_CLUSTER_KUBECONFIG: the path to the kubeconfig file of the prediction cluster.
    • PREDICTION_ENDPOINT: the name of the endpoint.
    • PROJECT_NAMESPACE: the name of the prediction project namespace.

    The output displays the fully qualified domain name of the endpoint in the endpointFQDN field of the status. Record this endpoint URL to use it in your requests.
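
    For example, the command might print a fully qualified domain name similar to the following. The value is illustrative; your endpoint's domain name differs:

    prediction-endpoint.example.com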

Set your environment variables

If you send requests to your model endpoint from a Python script, and you set up a service account in your project to make authorized API calls programmatically, you can define environment variables in the script so that it can access values such as the service account keys at run time.

Follow these steps to set the required environment variables in a Python script:

  1. Create a JupyterLab notebook to interact with the Online Prediction API.

  2. Create a Python script in the JupyterLab notebook.

  3. Add the following code to the Python script:

    import os
    
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "APPLICATION_DEFAULT_CREDENTIALS_FILENAME"
    

    Replace APPLICATION_DEFAULT_CREDENTIALS_FILENAME with the path to the JSON file that contains the service account keys you created in the project, such as my-service-key.json.

  4. Save the Python script with a name, such as prediction.py.

  5. Run the Python script to set the environment variables:

    python SCRIPT_NAME
    

    Replace SCRIPT_NAME with the name you gave to your Python script, such as prediction.py.
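
To confirm that the credentials load correctly, you can optionally resolve them with the google-auth library. The following is a minimal sketch, assuming the google-auth Python package is installed:

import google.auth

# Load the Application Default Credentials from the file that
# GOOGLE_APPLICATION_CREDENTIALS references. This call raises
# DefaultCredentialsError if the variable is unset or the file is invalid.
credentials, project_id = google.auth.default()
print("Loaded credentials for project:", project_id)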

Send a request to an endpoint

Make a request to the model's endpoint to get an online prediction:

curl

Follow these steps to make a curl request:

  1. Create a JSON file named request.json for your request body.

    You must add and format your input for online prediction with the request body details that the target container requires.
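
    For example, a model that accepts numeric feature vectors might use a request body similar to the following. The values and their shape are illustrative only; your container defines the required format:

    {
        "instances": [
            [1.0, 2.5, 3.0],
            [4.0, 5.5, 6.0]
        ]
    }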

  2. Get an authentication token.
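
    If you authenticate with a service account by using the google-auth Python library, you can mint a token the same way that the get_sts_token function in the Python tab does. The following is a minimal sketch; it assumes that GOOGLE_APPLICATION_CREDENTIALS is set and that ENDPOINT is your endpoint's fully qualified domain name:

    import google.auth
    from google.auth.transport import requests

    # Mint a bearer token whose audience is the prediction endpoint.
    credentials, _ = google.auth.default()
    credentials = credentials.with_gdch_audience("ENDPOINT:443")
    credentials.refresh(requests.Request())
    print(credentials.token)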

  3. Make the request:

    curl -X POST -H "Content-Type: application/json; charset=utf-8" -H "Authorization: Bearer TOKEN" \
    https://ENDPOINT:443/v1/model:predict -d @request.json
    

    Replace the following:

    • TOKEN: the authentication token that you obtained.
    • ENDPOINT: the fully qualified domain name of your model endpoint, which you obtained from the Endpoint custom resource.

If successful, you receive a JSON response to your online prediction request.

The following output shows an example:

{
    "predictions": [[-357.10849], [-171.621658]]
}

For more information about responses, see Response body details.
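
If you save the response to a file, for example by adding -o response.json to the curl command, you can extract the prediction values with a short Python script. The following is a minimal sketch that assumes the file name response.json:

import json

# Each entry in "predictions" is the prediction for the corresponding
# instance in request.json.
with open("response.json") as f:
    response = json.load(f)

for i, prediction in enumerate(response["predictions"]):
    print(f"instance {i}: {prediction}")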

Python

Follow these steps to use the Online Prediction service from a Python script:

  1. Create a JSON file named request.json for your request body.

    You must add and format your input for online prediction with the request body details that the target container requires.

  2. Install the latest version of the Vertex AI Platform client library.

  3. Set the required environment variables in the Python script.

  4. Authenticate your API request.

  5. Add the following code to the Python script you created:

    import json
    import os
    from typing import Sequence
    
    import grpc
    from absl import app
    from absl import flags
    
    import google.auth
    from google.auth.transport import requests
    from google.protobuf import json_format
    from google.protobuf.struct_pb2 import Value
    from google.cloud.aiplatform_v1.services import prediction_service
    
    _INPUT = flags.DEFINE_string("input", None, "input", required=True)
    _HOST = flags.DEFINE_string("host", None, "Prediction endpoint", required=True)
    _ENDPOINT_ID = flags.DEFINE_string("endpoint_id", None, "endpoint id", required=True)
    
    # Path to the CA certificate that the prediction endpoint's TLS certificate chains to.
    os.environ["GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"] = "path-to-ca-cert-file.cert"
    
    # ENDPOINT_RESOURCE_NAME is a placeholder value that doesn't affect prediction behavior.
    ENDPOINT_RESOURCE_NAME="projects/000000000000/locations/us-central1/endpoints/00000000000000"
    
    # Mint a short-lived access token for the endpoint's GDCH audience by using
    # the credentials that GOOGLE_APPLICATION_CREDENTIALS references.
    def get_sts_token(host):
      creds = None
      try:
        creds, _ = google.auth.default()
        creds = creds.with_gdch_audience(host+":443")
        req = requests.Request()
        creds.refresh(req)
        print("Got token: ")
        print(creds.token)
      except Exception as e:
        print("Caught exception" + str(e))
        raise e
      return creds.token
    
    # predict_client_secure builds a client that requires TLS
    def predict_client_secure(host, token):
      with open(os.environ["GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"], 'rb') as f:
        channel_creds = grpc.ssl_channel_credentials(f.read())
    
      call_creds = grpc.access_token_call_credentials(token)
    
      creds = grpc.composite_channel_credentials(
        channel_creds,
        call_creds,
      )
    
      client = prediction_service.PredictionServiceClient(
          transport=prediction_service.transports.grpc.PredictionServiceGrpcTransport(
           channel=grpc.secure_channel(target=host+":443", credentials=creds)))
    
      return client
    
    def predict_func(client, instances):
      resp = client.predict(
        endpoint=ENDPOINT_RESOURCE_NAME,
        instances=instances,
        # This header routes the request to your endpoint, so the endpoint
        # resource name above can stay a placeholder.
        metadata=[("x-vertex-ai-endpoint-id", _ENDPOINT_ID.value)]
      )
      print(resp)
    
    def main(argv: Sequence[str]):
      del argv  # Unused.
      with open(_INPUT.value) as json_file:
          data = json.load(json_file)
          instances = [json_format.ParseDict(s, Value()) for s in data["instances"]]
    
      token = get_sts_token(_HOST.value)
      client = predict_client_secure(_HOST.value, token)
      predict_func(client=client, instances=instances)
    
    if __name__ == "__main__":
      app.run(main)
    
  6. Save the Python script with a name, such as prediction.py.

  7. Make the request to the prediction server:

    python SCRIPT_NAME --input request.json \
        --host ENDPOINT \
        --endpoint_id ENDPOINT_ID
    

    Replace the following:

    • SCRIPT_NAME: the name of the Python script, such as prediction.py.
    • ENDPOINT: the fully qualified domain name of your model endpoint for the online prediction request.
    • ENDPOINT_ID: the value of the endpoint ID.

If successful, you receive a JSON response to your online prediction request. For more information about responses, see Response body details.