Generative models

This page lists Gemini models, self-deployed models, and models with managed APIs on Vertex AI that support RAG.

Gemini models

The following table lists the Gemini models and their versions that support Vertex AI RAG Engine:

Model                  Version
Gemini 1.5 Flash       gemini-1.5-flash-002
                       gemini-1.5-flash-001
Gemini 1.5 Pro         gemini-1.5-pro-002
                       gemini-1.5-pro-001
Gemini 1.0 Pro         gemini-1.0-pro-001
                       gemini-1.0-pro-002
Gemini 1.0 Pro Vision  gemini-1.0-pro-vision-001
Gemini                 gemini-experimental
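
To ground one of these models with Vertex AI RAG Engine, pass the version string as the model name together with a RAG retrieval tool. The following sketch assumes the vertexai SDK is already initialized and that RAG_CORPUS_RESOURCE is a placeholder for the full resource name of an existing corpus:

      from vertexai.preview import rag
      from vertexai.preview.generative_models import GenerativeModel, Tool

      # Build a retrieval tool from an existing RAG corpus.
      # RAG_CORPUS_RESOURCE is a placeholder, for example:
      # projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID
      rag_retrieval_tool = Tool.from_retrieval(
          retrieval=rag.Retrieval(
              source=rag.VertexRagStore(
                  rag_resources=[rag.RagResource(rag_corpus="RAG_CORPUS_RESOURCE")],
                  similarity_top_k=10,
              ),
          )
      )

      # Any Gemini version from the table works as the model name.
      rag_model = GenerativeModel("gemini-1.5-flash-002", tools=[rag_retrieval_tool])
      response = rag_model.generate_content("Summarize the indexed documents.")
      print(response.text)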

Self-deployed models

Vertex AI RAG Engine supports all models in Model Garden, so you can use it with your self-deployed open model endpoints.

Replace the variables used in the code sample:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region to process your request.
  • ENDPOINT_ID: Your endpoint ID.

      from vertexai.preview.generative_models import GenerativeModel

      # Create a model instance backed by your self-deployed open model endpoint.
      # rag_retrieval_tool is a retrieval Tool built from your RAG corpus.
      rag_model = GenerativeModel(
          "projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID",
          tools=[rag_retrieval_tool],
      )
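
Once the instance is created, it serves grounded requests like any other model. A minimal sketch, reusing the INPUT_PROMPT placeholder for your prompt text:

      response = rag_model.generate_content("INPUT_PROMPT")
      print(response.text)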
    

Models with managed APIs on Vertex AI

The models with managed APIs on Vertex AI that support Vertex AI RAG Engine include the following:

  • Llama 3.1

The following code sample demonstrates how to use the Gemini GenerateContent API to create a generative model instance. The model ID, publishers/meta/models/llama-3.1-405B-instruct-maas, can be found on the model card.

Replace the variables used in the code sample:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region to process your request.
  • RAG_RETRIEVAL_TOOL: Your RAG retrieval tool.

      from vertexai.preview.generative_models import GenerativeModel

      # Create a model instance that targets the Llama 3.1 MaaS endpoint.
      rag_model = GenerativeModel(
          "projects/PROJECT_ID/locations/LOCATION/publishers/meta/models/llama-3.1-405B-instruct-maas",
          tools=[RAG_RETRIEVAL_TOOL],
      )
    

The following code sample demonstrates how to use the OpenAI-compatible ChatCompletions API to generate a model response.

Replace the variables used in the code sample:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region to process your request.
  • MODEL_ID: LLM model for content generation. For example, meta/llama-3.1-405b-instruct-maas.
  • INPUT_PROMPT: The text sent to the LLM for content generation. Use a prompt relevant to the documents in your RAG corpus.
  • RAG_CORPUS_ID: The ID of the RAG corpus resource.

      import openai
      from google.auth import default
      from google.auth.transport import requests as google_requests

      # One way to authenticate: application-default credentials supply the
      # bearer token for the OpenAI-compatible Vertex AI endpoint.
      credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
      credentials.refresh(google_requests.Request())

      client = openai.OpenAI(
          base_url="https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/endpoints/openapi",
          api_key=credentials.token,
      )

      # Generate a response with the Llama 3.1 MaaS endpoint, grounded in the
      # RAG corpus through the nested extra_body payload.
      response = client.chat.completions.create(
          model="MODEL_ID",
          messages=[{"role": "user", "content": "INPUT_PROMPT"}],
          extra_body={
              "extra_body": {
                  "google": {
                      "vertex_rag_store": {
                          "rag_resources": {
                              "rag_corpus": "RAG_CORPUS_ID"
                          },
                          "similarity_top_k": 10
                      }
                  }
              }
          },
      )
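
The response follows the OpenAI Chat Completions schema, so the grounded answer is read from the first choice:

      print(response.choices[0].message.content)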
    

What's next