Numeric search indexes

In addition to indexing text, the search index provides an efficient way to index numbers. It's primarily used to augment full-text search queries with conditions on numeric fields. This page describes indexing numbers for equality and inequality queries, and indexing an array of numbers.

Tokenize numbers

You use the TOKENIZE_NUMBER function to create a numeric index. For details, see TOKENIZE_NUMBER.

Index numbers for equality and inequality queries

Spanner supports indexing numbers for equality and inequality. Equality searches match a number. Range and inequality searches match a number within a specific range. You set this value in the TOKENIZE_NUMBER comparison_type parameter:

  • Equality: comparison_type=>"equality"
  • Inequality and equality: comparison_type=>"all"

In both cases, the original number (either integer or floating point) undergoes a process of tokenization, which is conceptually similar to full-text tokenization. It produces a set of tokens that the query can then use to locate documents matching the number condition.

Equality indexing only produces one token, which represents the number. This mode is recommended if queries only have conditions in the form of field = @p in the WHERE clause.

Inequality and equality indexing can accelerate a wider range of conditions in the WHERE clause of the query. This includes field < @p, field <= @p, field > @p, field >= @p, field BETWEEN @p1 and @p2 and field <> @p in addition to equality conditions. To implement this type of indexing, Spanner produces tokens in the underlying search index. Spanner can produce many tokens for each indexed number, depending upon tuning parameters. The number of tokens depends on the parameters that are set for TOKENIZE_NUMBER, such as algorithm, min, max and granularity. It's therefore important to evaluate the tuning parameters carefully to ensure an appropriate balance between disk storage and lookup time.

Array tokenization

In addition to scalar values, TOKENIZE_NUMBER supports tokenization of an array of numbers.

When TOKENIZE_NUMBER is used with the ARRAY column, you must specify comparison_type=>"equality". Range queries aren't supported with an array of numbers.

For example, consider the following schema:

CREATE TABLE Albums (
  AlbumId STRING(MAX) NOT NULL,
  Ratings ARRAY<INT64>,
  Ratings_Tokens TOKENLIST
    AS (TOKENIZE_NUMBER(Ratings, comparison_type=>"equality")) HIDDEN
) PRIMARY KEY(AlbumId);

CREATE SEARCH INDEX AlbumsIndex ON Albums(Ratings_Tokens);

The following query finds all albums that have a rating of 1 or 2:

SELECT AlbumId
FROM Albums
WHERE ARRAY_INCLUDES_ANY(Ratings, [1, 2])

The following query finds all albums that were rated as 1 and as 5:

SELECT AlbumId
FROM Albums
WHERE ARRAY_INCLUDES_ALL(Ratings, [1, 5])

What's next