Introduction to vector search
This document provides an overview of vector search in BigQuery. Vector search lets you search embeddings to identify semantically similar entities.
Embeddings are high-dimensional numerical vectors that represent a given entity, like a piece of text or an audio file. Machine learning (ML) models use embeddings to encode semantics about such entities to make it easier to reason about and compare them. For example, a common operation in clustering, classification, and recommendation models is to measure the distance between vectors in an embedding space to find items that are most semantically similar.
To perform a vector search, you use the VECTOR_SEARCH function and optionally a vector index. When a vector index is used, VECTOR_SEARCH uses the Approximate Nearest Neighbor search technique to help improve vector search performance, with the trade-off of reducing recall and therefore returning more approximate results. Brute force is used to return exact results when a vector index isn't available, and you can choose to use brute force to get exact results even when a vector index is available.
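For example, a workflow might look like the following sketch. This is a minimal illustration rather than a complete reference: the dataset, table, and column names (mydataset.products, mydataset.query_embeddings, embedding, query_id, product_id) are hypothetical, and the option values are only examples of the documented options.

```sql
-- Create a vector index on a hypothetical ARRAY<FLOAT64> embedding column.
-- The index is optional; without it, VECTOR_SEARCH uses brute force.
CREATE VECTOR INDEX my_index
ON mydataset.products(embedding)
OPTIONS (
  index_type = 'IVF',
  distance_type = 'COSINE');

-- Find the 5 nearest neighbors of each query embedding. With the index in
-- place, this uses approximate nearest neighbor search.
SELECT query.query_id, base.product_id, distance
FROM VECTOR_SEARCH(
  TABLE mydataset.products,
  'embedding',
  TABLE mydataset.query_embeddings,
  query_column_to_search => 'embedding',
  top_k => 5,
  distance_type => 'COSINE');

-- Request exact results with brute force even though the index exists.
SELECT query.query_id, base.product_id, distance
FROM VECTOR_SEARCH(
  TABLE mydataset.products,
  'embedding',
  TABLE mydataset.query_embeddings,
  query_column_to_search => 'embedding',
  top_k => 5,
  distance_type => 'COSINE',
  options => '{"use_brute_force": true}');
```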
Use cases
The combination of embedding generation and vector search enables many interesting use cases, with retrieval-augmented generation (RAG) being the canonical one. Some other possible use cases are as follows:
- Given a batch of new support cases, find several similar resolved cases for each (see the sketch after this list). Pass information about the resolved cases to a large language model (LLM) to use as context when summarizing and suggesting resolutions for the new support cases.
- Given an audit log entry, find the most closely matching entries in the past 30 days.
- Generate embeddings from patient profile data, then use vector search to find patients with similar profiles in order to explore successful treatment plans prescribed to that patient cohort.
- Given the embeddings representing pre-accident moments from all the sensors and cameras in a fleet of school buses, find similar moments from all other vehicles in the fleet for further analysis, tuning, and retraining of the models that govern the safety feature engagements.
- Given a picture, find the most closely related images in a BigQuery object table, and pass those images to a model to generate captions.
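To make the first use case concrete, the following sketch pairs each new support case with its three most similar resolved cases in a single batch query. All table and column names (support.new_cases, support.resolved_cases, case_id, embedding, resolution) are hypothetical; the matched resolutions could then be passed to an LLM as context.

```sql
-- Batch search: for every new case, return the 3 closest resolved cases.
SELECT
  query.case_id AS new_case_id,
  base.case_id AS resolved_case_id,
  base.resolution,
  distance
FROM VECTOR_SEARCH(
  TABLE support.resolved_cases,
  'embedding',
  TABLE support.new_cases,
  query_column_to_search => 'embedding',
  top_k => 3,
  distance_type => 'COSINE')
ORDER BY new_case_id, distance;
```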
Pricing
The CREATE VECTOR INDEX statement and the VECTOR_SEARCH function use BigQuery compute pricing. For the CREATE VECTOR INDEX statement, only the indexed column is considered in the bytes processed.
There is no charge for the processing required to build and refresh your vector indexes when the total size of indexed table data is below your per-organization limit. To support indexing beyond this limit, you must provide your own reservation for handling the index management jobs. Vector indexes incur storage costs when they are active. You can find the index storage size in the INFORMATION_SCHEMA.VECTOR_INDEXES view.
If the vector index is not yet at 100% coverage, you are still charged for all index storage that is reported in the INFORMATION_SCHEMA.VECTOR_INDEXES view.
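For example, a query along the following lines reports coverage and billed storage per index. The dataset name is a placeholder, and the column names (such as coverage_percentage and total_storage_bytes) are assumptions about the view's schema; verify them against the INFORMATION_SCHEMA.VECTOR_INDEXES reference page.

```sql
-- Inspect vector index coverage and the storage that incurs cost.
SELECT
  index_name,
  table_name,
  index_status,
  coverage_percentage,
  total_storage_bytes
FROM mydataset.INFORMATION_SCHEMA.VECTOR_INDEXES;
```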
Quotas and limits
For more information, see Vector index limits.
Limitations
- Queries that contain the VECTOR_SEARCH function aren't accelerated by BigQuery BI Engine.
- BigQuery data security and governance rules apply to the use of VECTOR_SEARCH. For more information, see the Limitations section in VECTOR_SEARCH. These rules don't apply to vector index generation.
What's next
- Learn more about creating a vector index.
- Try the Search embeddings with vector search tutorial to learn how to create a vector index, and then do a vector search for embeddings both with and without the index.
- Try the Generate and use text embeddings tutorial to learn how to do the following tasks (a minimal sketch of these steps follows this list):
  - Generate text embeddings.
  - Create a vector index on the embeddings.
  - Perform a vector search with the embeddings to search for similar text.
  - Perform retrieval-augmented generation (RAG) by using vector search results to augment the prompt input and improve results.
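For a rough preview of those tutorial steps, here is a minimal sketch. It assumes a remote embedding model named mydataset.embedding_model already exists and that mydataset.documents has a content column; the ML.GENERATE_EMBEDDING options and output column name shown here should be checked against the current documentation.

```sql
-- Step 1: generate text embeddings and store them alongside the source text.
CREATE OR REPLACE TABLE mydataset.doc_embeddings AS
SELECT
  content,
  ml_generate_embedding_result AS embedding
FROM ML.GENERATE_EMBEDDING(
  MODEL mydataset.embedding_model,
  (SELECT content FROM mydataset.documents),
  STRUCT(TRUE AS flatten_json_output));

-- Step 2 (optional): create a vector index on the stored embeddings.
CREATE VECTOR INDEX doc_index
ON mydataset.doc_embeddings(embedding)
OPTIONS (index_type = 'IVF', distance_type = 'COSINE');

-- Step 3: embed a question and search for the most similar documents.
SELECT base.content, distance
FROM VECTOR_SEARCH(
  TABLE mydataset.doc_embeddings,
  'embedding',
  (
    SELECT ml_generate_embedding_result AS embedding
    FROM ML.GENERATE_EMBEDDING(
      MODEL mydataset.embedding_model,
      (SELECT 'How do I reset my password?' AS content),
      STRUCT(TRUE AS flatten_json_output))
  ),
  top_k => 5);
```

The retrieved content rows could then be concatenated into a prompt for a text-generation model to complete the RAG pattern, as the tutorial describes.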