Generative models

This page lists Gemini models, self-deployed models, and models with managed APIs on Vertex AI that support RAG.

Gemini models

The following table lists the Gemini models and their versions that support Vertex AI RAG Engine:

Model                  Version
Gemini 1.5 Flash       gemini-1.5-flash-002
                       gemini-1.5-flash-001
Gemini 1.5 Pro         gemini-1.5-pro-002
                       gemini-1.5-pro-001
Gemini 1.0 Pro         gemini-1.0-pro-001
                       gemini-1.0-pro-002
Gemini 1.0 Pro Vision  gemini-1.0-pro-vision-001
Gemini                 gemini-experimental
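
To ground one of these models with Vertex AI RAG Engine, pass the version string as the model name together with a RAG retrieval tool. The following sketch assumes the vertexai SDK is already initialized and that RAG_CORPUS_RESOURCE is a placeholder for the full resource name of an existing corpus:

      from vertexai.preview import rag
      from vertexai.preview.generative_models import GenerativeModel, Tool

      # Build a retrieval tool from an existing RAG corpus.
      # RAG_CORPUS_RESOURCE is a placeholder, for example:
      # projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID
      rag_retrieval_tool = Tool.from_retrieval(
          retrieval=rag.Retrieval(
              source=rag.VertexRagStore(
                  rag_resources=[rag.RagResource(rag_corpus="RAG_CORPUS_RESOURCE")],
                  similarity_top_k=10,
              ),
          )
      )

      # Any Gemini version from the table works as the model name.
      rag_model = GenerativeModel("gemini-1.5-flash-002", tools=[rag_retrieval_tool])
      response = rag_model.generate_content("Summarize the indexed documents.")
      print(response.text)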

Self-deployed models

Vertex AI RAG Engine supports all models in Model Garden, so you can use it with your self-deployed open model endpoints.

Replace the variables used in the code sample:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region to process your request.
  • ENDPOINT_ID: Your endpoint ID.

      from vertexai.preview.generative_models import GenerativeModel

      # Create a model instance backed by your self-deployed open model endpoint.
      # rag_retrieval_tool is a retrieval Tool built from your RAG corpus.
      rag_model = GenerativeModel(
          "projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID",
          tools=[rag_retrieval_tool],
      )
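
Once the instance is created, it serves grounded requests like any other model. A minimal sketch, reusing the INPUT_PROMPT placeholder for your prompt text:

      response = rag_model.generate_content("INPUT_PROMPT")
      print(response.text)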
    

Models with managed APIs on Vertex AI

The models with managed APIs on Vertex AI that support Vertex AI RAG Engine include the following:

  • Llama 3.1

The following code sample demonstrates how to use the Gemini GenerateContent API to create a generative model instance. The model ID, publishers/meta/models/llama-3.1-405B-instruct-maas, can be found on the model card.

Replace the variables used in the code sample:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region to process your request.
  • RAG_RETRIEVAL_TOOL: Your RAG retrieval tool.

      from vertexai.preview.generative_models import GenerativeModel

      # Create a model instance that targets the Llama 3.1 MaaS endpoint.
      rag_model = GenerativeModel(
          "projects/PROJECT_ID/locations/LOCATION/publishers/meta/models/llama-3.1-405B-instruct-maas",
          tools=[RAG_RETRIEVAL_TOOL],
      )
    

The following code sample demonstrates how to use the OpenAI-compatible ChatCompletions API to generate a model response.

Replace the variables used in the code sample:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region to process your request.
  • MODEL_ID: LLM model for content generation. For example, meta/llama-3.1-405b-instruct-maas.
  • INPUT_PROMPT: The text sent to the LLM for content generation. Use a prompt relevant to the documents in your RAG corpus.
  • RAG_CORPUS_ID: The ID of the RAG corpus resource.

      import openai
      from google.auth import default
      from google.auth.transport import requests as google_requests

      # One way to authenticate: application-default credentials supply the
      # bearer token for the OpenAI-compatible Vertex AI endpoint.
      credentials, _ = default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
      credentials.refresh(google_requests.Request())

      client = openai.OpenAI(
          base_url="https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/endpoints/openapi",
          api_key=credentials.token,
      )

      # Generate a response with the Llama 3.1 MaaS endpoint, grounded in the
      # RAG corpus through the nested extra_body payload.
      response = client.chat.completions.create(
          model="MODEL_ID",
          messages=[{"role": "user", "content": "INPUT_PROMPT"}],
          extra_body={
              "extra_body": {
                  "google": {
                      "vertex_rag_store": {
                          "rag_resources": {
                              "rag_corpus": "RAG_CORPUS_ID"
                          },
                          "similarity_top_k": 10
                      }
                  }
              }
          },
      )
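
The response follows the OpenAI Chat Completions schema, so the grounded answer is read from the first choice:

      print(response.choices[0].message.content)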
    

What's next