Try Optical Character Recognition (OCR)

This quickstart guides the Application Operator (AO) through the process of using the Vertex AI Optical Character Recognition (OCR) pre-trained API on Google Distributed Cloud (GDC) air-gapped.

Before you begin

Follow these steps before trying OCR:

Set up a project using the GDC console to group the Vertex AI services. For information about creating and using projects, see Create a project.
Ask your Project IAM Admin to grant you the AI OCR Developer (ai-ocr-developer) role in your project namespace.
Enable the OCR pre-trained API.
Download the gdcloud command-line interface (CLI).

Set up your service account

Set up your service account with the name of your service account, project ID, and service key. Replace the PROJECT_ID with your project.

  ${HOME}/gdcloud init  # set URI and project

  ${HOME}/gdcloud auth login

  ${HOME}/gdcloud iam service-accounts create SERVICE_ACCOUNT  --project=PROJECT_ID

  ${HOME}/gdcloud iam service-accounts keys create "SERVICE_KEY".json --project=PROJECT_ID --iam-account=SERVICE_ACCOUNT

Grant access to project resources

Grant access to the Translation API service account by providing your project ID, name of your service account, and the role ai-ocr-developer.

  ${HOME}/gdcloud iam service-accounts add-iam-policy-binding --project=PROJECT_ID --iam-account=SERVICE_ACCOUNT --role=role/ai-ocr-developer

Set your environment variables

Before running the OCR pre-trained service, set your environment variable.

  export GOOGLE_APPLICATION_CREDENTIALS="SERVICE_KEY".json

Authenticate the request

You must get a token to authenticate the requests to the OCR pre-trained service. Follow these steps:

gdcloud CLI

Export the identity token for the specified account to an environment variable:

export TOKEN="$($HOME/gdcloud auth print-identity-token --audiences=https://ENDPOINT)"

Replace ENDPOINT with the OCR endpoint. For more information, view service statuses and endpoints.

Python

Install the google-auth client library.
```
pip install google-auth
```

Save the following code to a Python script, and update the ENDPOINT to the OCR endpoint. For more information, see View service statuses and endpoints.

import google.auth
from google.auth.transport import requests

api_endpoint = "https://ENDPOINT"

creds, project_id = google.auth.default()
creds = creds.with_gdch_audience(api_endpoint)

def test_get_token():
  req = requests.Request()
  creds.refresh(req)
  print(creds.token)

if __name__=="__main__":
  test_get_token()

Run the script to fetch the token.

You must add the fetched token to the header of the curl requests as in the following example:

-H "Authorization: Bearer TOKEN"

Make the curl request:

curl

echo '{"requests": [{"image": {"content": "'iVBORw0KGgoAAAANSUhEUgAAAMgAAAArCAMAAAAKVjeAAAAAA3NCSVQICAjb4U/gAAAADFBMVEX///8AAABnZ2cMDAzMh6MLAAAAX3pUWHRSYXcgcHJvZmlsZSB0eXBlIEFQUDEAAAiZ40pPzUstykxWKCjKT8vMSeVSAANjEy4TSxNLo0QDAwMLAwgwNDAwNgSSRkC2OVQo0QAFmJibpQGhuVmymSmIzwUAT7oVaBst2IwAAAEjSURBVGiB7ZRBFsMgCEShvf+d+9o0VmAwxpCuZjZGkYGfaEQoiqIoiqIoiqKoG6Sqg6lbTqK1LfwWTpUjSJ0IMnIhyAXdDaL6mwSQPpg5hgeT9H7c5sG1FES/wiA2OgkSLUPfW7wSRNWUdSAuih19drTUFnCuiyBO+6ob7WBGTPJ5tZYDJ4NAJYgvEoesUgoC+8bntgikczALSXQGJLMcuj7nOfAduQbStkm3fQnkUQACP9EZkB3mCsgZ3QEiDkRQ0r9A4K55kHaswlUmyApIVsVH04oGxO1NSoDfbw2IujmI5hX7fNeeDkDaWAbSX/cIIjY4B+KTAoj5xaDelkAEWobooW2/xyZFkH0DTF4GsZ84HIejg4x7UWuAnlSzZIqiJvUCFxYEUadKypwAAAAASUVORK5CYII='" }, "features": [ { "type": "DOCUMENT_TEXT_DETECTION" } ] }] }' | curl --cacert CERTIFICATE_NAME --data-binary @- -H "Content-Type: application/json" -H "Authorization: Bearer TOKEN" -H "x-goog-user-project: projects/PROJECT_ID" https://ENDPOINT/v1/images:annotate

Run the OCR pre-trained API sample script

This example shows you how to interact with an OCR pre-trained API.

Check whether the client library for OCR is installed.
```
  pip freeze | grep vision
  # output example: google-cloud-vision==3.0.0
```
If the existing version doesn't match the client library in https://CONSOLE_ENDPOINT/.well-known/static/client-libraries, uninstall the client library.
```
  pip uninstall google-cloud-vision
```
Specify the console endpoint and the client library for OCR (provided in the example).
```
   wget https://CONSOLE_ENDPOINT/.well-known/static/client-libraries/google-cloud-vision
```
Note: If the error message, "x509: certificate signed by unknown authority", is displayed, your workstation doesn't trust the CA certificate used in Distributed Cloud. Follow your organization's procedure to check the trusted certification store for your workstation.
Warning: Using --login-config-cert with an unverified certificate makes your workstation vulnerable to man-in-the-middle attacks. Ensure that you rely only on your workstation's trust store instead of trusting a CA certificate from unknown sources.
Extract the tar file, and install it using pip. If errors are generated because something isn't found, install any missing dependencies.
```
tar -xvzf CLIENT_LIBRARY

pip install -r FOLDER/requirements.txt --no-index --find-links FOLDER
```
Use the OCR client library script to generate the token, and make requests to the OCR service.

Set up your environment variable.

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ""SERVICE_KEY".json"

Run the OCR sample

Replace the ENDPOINT with the OCR endpoint that you use for your organization.

from google.cloud import vision
import google.auth
from google.auth.transport import requests
from google.api_core.client_options import ClientOptions

audience = "https://ENDPOINT:443"
api_endpoint="ENDPOINT:443"

def vision_client(creds):
  opts = ClientOptions(api_endpoint=api_endpoint)
  """Create vision client."""
  return vision.ImageAnnotatorClient(credentials=creds, client_options=opts)

def main():
  creds = None
  try:
    creds, project_id = google.auth.default()
    creds = creds.with_gdch_audience(audience)
    req = requests.Request()
    creds.refresh(req)
    print("Got token: ")
    print(creds.token)
  except Exception as e:
    print("Caught exception" + str(e))
    raise e
  return creds

def vision_func(creds):
  vc = vision_client(creds)
  image = {"content": "iVBORw0KGgoAAAANSUhEUgAAAMgAAAArCAMAAAAKVjeAAAAAA3NCSVQICAjb4U/gAAAADFBMVEX///8AAABnZ2cMDAzMh6MLAAAAX3pUWHRSYXcgcHJvZmlsZSB0eXBlIEFQUDEAAAiZ40pPzUstykxWKCjKT8vMSeVSAANjEy4TSxNLo0QDAwMLAwgwNDAwNgSSRkC2OVQo0QAFmJibpQGhuVmymSmIzwUAT7oVaBst2IwAAAEjSURBVGiB7ZRBFsMgCEShvf+d+9o0VmAwxpCuZjZGkYGfaEQoiqIoiqIoiqKoG6Sqg6lbTqK1LfwWTpUjSJ0IMnIhyAXdDaL6mwSQPpg5hgeT9H7c5sG1FES/wiA2OgkSLUPfW7wSRNWUdSAuih19drTUFnCuiyBO+6ob7WBGTPJ5tZYDJ4NAJYgvEoesUgoC+8bntgikczALSXQGJLMcuj7nOfAduQbStkm3fQnkUQACP9EZkB3mCsgZ3QEiDkRQ0r9A4K55kHaswlUmyApIVsVH04oGxO1NSoDfbw2IujmI5hX7fNeeDkDaWAbSX/cIIjY4B+KTAoj5xaDelkAEWobooW2/xyZFkH0DTF4GsZ84HIejg4x7UWuAnlSzZIqiJvUCFxYEUadKypwAAAAASUVORK5CYII="}
  features = [{"type_": vision.Feature.Type.DOCUMENT_TEXT_DETECTION}]
  # Each requests element corresponds to a single image.  To annotate more
  # images, create a request element for each image and add it to
  # the array of requests
  req = {"image": image, "features": features}

  metadata = [("x-goog-user-project", "projects/PROJECT_ID")]

  resp = vc.annotate_image(req,metadata=metadata)

  print(resp)

if __name__=="__main__":
  creds = main()
  vision_func(creds)

Replace PROJECT_ID with the ID of the project that you want to use.

What's next

Learn more about how to Detect text in images.
Learn more about how to Detect text in images offline.