In addition to indexing text, the search index provides an efficient way to index numbers. It's primarily used to augment full-text search queries with conditions on numeric fields. This page describes indexing numbers for equality and inequality queries, and indexing an array of numbers.
Tokenize numbers
You use the TOKENIZE_NUMBER
function to create a numeric index. For details,
see TOKENIZE_NUMBER
.
Index numbers for equality and inequality queries
Spanner supports indexing numbers for equality and inequality.
Equality searches match a number. Range and inequality searches match a number
within a specific range. You set this value in the
TOKENIZE_NUMBER
comparison_type
parameter:
- Equality:
comparison_type=>"equality"
- Inequality and equality:
comparison_type=>"all"
In both cases, the original number (either integer or floating point) undergoes a process of tokenization, which is conceptually similar to full-text tokenization. It produces a set of tokens that the query can then use to locate documents matching the number condition.
Equality indexing only produces one token, which represents the number. This
mode is recommended if queries only have conditions in the form of field = @p
in the WHERE
clause.
Inequality and equality indexing can accelerate a wider range of conditions in
the WHERE
clause of the query. This includes field < @p
, field <= @p
,
field > @p
, field >= @p
, field BETWEEN @p1 and @p2
and field <> @p
in
addition to equality conditions. To implement this type of indexing,
Spanner produces tokens in the underlying search index.
Spanner can produce many tokens for each indexed number,
depending upon tuning parameters. The number of tokens depends on the parameters
that are set for TOKENIZE_NUMBER
, such as algorithm
, min
, max
and
granularity
. It's therefore important to evaluate the tuning parameters
carefully to ensure an appropriate balance between disk storage and lookup time.
Array tokenization
In addition to scalar values, TOKENIZE_NUMBER
supports tokenization of an
array of numbers.
When TOKENIZE_NUMBER
is used with the ARRAY
column, you must
specify comparison_type=>"equality"
. Range queries aren't supported with an
array of numbers.
For example, consider the following schema:
CREATE TABLE Albums (
AlbumId STRING(MAX) NOT NULL,
Ratings ARRAY<INT64>,
Ratings_Tokens TOKENLIST
AS (TOKENIZE_NUMBER(Ratings, comparison_type=>"equality")) HIDDEN
) PRIMARY KEY(AlbumId);
CREATE SEARCH INDEX AlbumsIndex ON Albums(Ratings_Tokens);
The following query finds all albums that have a rating of 1 or 2:
SELECT AlbumId
FROM Albums
WHERE ARRAY_INCLUDES_ANY(Ratings, [1, 2])
The following query finds all albums that were rated as 1 and as 5:
SELECT AlbumId
FROM Albums
WHERE ARRAY_INCLUDES_ALL(Ratings, [1, 5])
What's next
- Learn about tokenization and Spanner tokenizers.
- Learn about search indexes.
- Learn about index partitioning.