The ML.NGRAMS function
This document describes the ML.NGRAMS function, which lets you create n-grams of the input values.
Syntax
ML.NGRAMS(array_input, range [, separator])
Arguments
ML.NGRAMS takes the following arguments:
- array_input: an ARRAY<STRING> value that represents the tokens to be merged.
- range: an ARRAY of two INT64 elements or a single INT64 value. If you specify an ARRAY value, the INT64 elements provide the range of n-gram sizes to return. Provide the numerical values in order, lower to higher. If you specify a single INT64 value of x, the range of n-gram sizes to return is [x, x].
- separator: an optional STRING value that specifies the separator used to connect two adjacent tokens in the output. The default value is whitespace.
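As a minimal sketch of the single-INT64 form of range, the following query passes 2 instead of an array and omits separator. Going by the argument descriptions above, this should be equivalent to passing [2, 2], with adjacent tokens joined by the default separator. The input array is illustrative only.

```sql
-- A single INT64 range of 2 is treated as the range [2, 2],
-- so only 2-token n-grams are requested.
-- separator is omitted, so the default separator is used.
SELECT ML.NGRAMS(['a', 'b', 'c'], 2) AS output;
```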
Output
ML.NGRAMS returns an ARRAY<STRING> value that contains the n-grams.
Example
The following example outputs all 2-token and 3-token n-grams, that is, sequences of adjacent tokens, for an array of three input strings:
SELECT ML.NGRAMS(['a', 'b', 'c'], [2,3], '#') AS output;
The output looks similar to the following:
+-----------------------+
| output                |
+-----------------------+
| ["a#b","a#b#c","b#c"] |
+-----------------------+
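In practice, the input tokens often come from splitting a text string rather than from a literal array. The following sketch assumes space-delimited text and uses the GoogleSQL SPLIT function to build the ARRAY<STRING> input. The sample sentence and the underscore separator are illustrative choices, not part of the original example.

```sql
-- SPLIT turns the sentence into an ARRAY<STRING> of tokens,
-- splitting on a single space.
-- The range [2, 3] requests 2-token and 3-token n-grams,
-- and '_' joins the adjacent tokens in each n-gram.
SELECT ML.NGRAMS(SPLIT('the quick brown fox', ' '), [2, 3], '_') AS output;
```

Real text usually needs more careful tokenization (punctuation, casing, repeated spaces) before n-grams are generated.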
What's next
- For information about feature preprocessing, see Feature preprocessing overview.
- For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model.