Many applications query a database to populate a single page in their applications. In such applications, the application doesn't need all of the matches, but only the top-k matches based on index sort order. Search indexes can implement this type of search very efficiently. This page describes how to create and search an index that has top-k matching.
Create search indexes for top-k matches
To configure a search index for top-k matching, use ORDER BY
to order the
search index by a specific column. Queries need to have an ORDER BY
clause
that exactly matches the search index sort order (including ascending versus
descending direction) and a LIMIT
clause that requests the query to stop after
finding k-matching rows.
You can also implement pagination using these clauses. For more information, see Paginate search queries.
For some use cases, it might make sense to maintain multiple search indexes sorted by different columns. Like partitioning, it's a trade-off between storage and write cost versus query latency.
For example, consider a table that uses the following schema:
CREATE TABLE Albums (
AlbumId STRING(MAX) NOT NULL,
RecordTimestamp INT64 NOT NULL,
ReleaseTimestamp INT64 NOT NULL,
ListenTimestamp INT64 NOT NULL,
AlbumTitle STRING(MAX),
AlbumTitle_Tokens TOKENLIST AS (TOKENIZE_FULLTEXT(AlbumTitle)) HIDDEN
) PRIMARY KEY(AlbumId);
CREATE SEARCH INDEX AlbumsRecordTimestampIndex
ON Albums(AlbumTitle_Tokens, SingerId_Tokens)
ORDER BY RecordTimestamp DESC
STORING ListenTimestamp
CREATE SEARCH INDEX AlbumsReleaseTimestampIndex
ON Albums(AlbumTitle_Tokens)
ORDER BY ReleaseTimestamp DESC
STORING ListenTimestamp
Query search indexes for top-k matches
As stated previously, queries need to have an ORDER BY
clause
that exactly matches the search index sort order (including ascending versus
descending direction) and a LIMIT
clause that requests the query to stop after
finding k-matching rows.
The following are some common queries:
The following query is very efficient. It selects the
AlbumsRecordTimestampIndex
index. Even if there are many albums with the word "happy", the query only scans a small number of rows:SELECT AlbumId FROM Albums WHERE SEARCH(AlbumTitle_Tokens, 'happy') ORDER BY RecordTimestamp DESC LIMIT 10
The same query, requesting sort order by
ReleaseTimestamp
in descending order, uses theAlbumsReleaseTimestampIndex
index and is equally efficient:SELECT AlbumId FROM Albums WHERE SEARCH(AlbumTitle_Tokens, 'happy') ORDER BY ReleaseTimestamp DESC LIMIT 10
At the same time, a query requesting sort order by
ListenTimestamp
doesn't execute a top-k query efficiently. It has to fetch all matching albums, sort them byListenTimestamp,
and return the top 10. Such a query uses more resources if there's a large number of documents that contain the term "happy".SELECT AlbumId FROM Albums WHERE SEARCH(AlbumTitle_Tokens, 'happy') ORDER BY ListenTimestamp DESC LIMIT 10`
Similarly, a query doesn't run efficiently if it requests that results are ordered using the
RecordTimestamp
column in ascending order. It scans all rows with the word "happy", despite having aLIMIT
.SELECT AlbumId FROM Albums WHERE SEARCH(AlbumTitle_Tokens, 'happy') ORDER BY RecordTimestamp ASC LIMIT 10
What's next
- Learn about full-text search queries.
- Learn how to rank search results.
- Learn how to paginate search results.
- Learn how to mix full-text and non-text queries.
- Learn how to search multiple columns.