Query statements scan one or more tables or expressions and return the computed result rows. This topic describes the syntax for SQL queries in GoogleSQL for Bigtable.
SQL syntax notation rules
The GoogleSQL documentation commonly uses the following syntax notation rules:
- Square brackets
[ ]
: Optional clause. - Curly braces with vertical bars
{ a | b | c }
: LogicalOR
. Select one option. - Ellipsis
...
: Preceding item can repeat. - Double quotes
"
: Syntax wrapped in double quotes (""
) is required.
SQL syntax
query_statement: query_expr query_expr: { select | ( query_expr ) } [ ORDER BY expression [ ASC ] ] [ LIMIT count ] select: SELECT select_list [ FROM from_clause[, ...] ] [ WHERE bool_expression ]
SELECT
statement
SELECT select_list select_list: { select_all | select_expression } [, ...] select_all: [ expression. ]* [ EXCEPT ( column_name [, ...] ) ] [ REPLACE ( expression AS column_name [, ...] ) ] select_expression: expression [ [ AS ] alias ]
The SELECT
list defines the columns that the query will return. Expressions in
the SELECT
list can refer to columns in any of the from_item
s in its
corresponding FROM
clause.
Each item in the SELECT
list is one of:
*
expression
expression.*
SELECT *
SELECT *
, often referred to as select star, produces one output column for
each column that is visible after executing the full query.
SELECT * FROM example_table;
/*-------------------------------------------------------------------------------+
| _key | f1 | f2 | f3 |
+-------------------------------------------------------------------------------+
| a#01 | { | { | { |
| | "c1": "xyz", | "col1": "def", | "cl1": "{ |
| | "c2": "nop" | "col2": "125", | "n": "Def",|
| | } | "col3": "2008-12-25 15:30:00+00" | "g": 40, |
| | | } | "k": 3 |
| | | | }" |
| | | | } |
+-------------------------------------------------------------------------------+
| a#02 | { | { | { |
| | "c1": "zyx", | "col1": "cba", | "cl1": "{ |
| | } | "col2": "123", | "n": "Gih",|
| | | "col3": "2018-10-15 15:30:00+00" | "g": 20, |
| | | } | }" |
| | | | } |
+-------------------------------------------------------------------------------+
| a#03 | { | { | { |
| | "c1": "abc", | "col1": "cba", | "cl1": "{ |
| | "c2": "def" | "col2": "543", | "n": "Xyz",|
| | } | "col3": "2020-11-10 13:21:00+00" | "g": 15, |
| | | } | }" |
| | | | } |
+-------------------------------------------------------------------------------+
| b#01 | { | { | NULL |
| | "c1": "jkl", | "col1": "qrs", | |
| | "c2": "mno" | "col2": "654", | |
| | } | "col3": "2021-10-10 14:18:03+07" | |
| | | } | |
+-------------------------------------------------------------------------------+
| b#02 | { | { | NULL |
| | "c1": "hi", | "col1": "tu", | |
| | "c2": "no" | "col3": "2023-10-10 11:28:09+05",| |
| | } | } | |
+-------------------------------------------------------------------------------+
| c#03 | { | { | { |
| | "c1": "j", | "col1": "u", | "cl1": "{ |
| | "c2": "l" | } | "n": "T", |
| | } | } | "g": 22, |
| | | | }" |
| | | | } |
+-------------------------------------------------------------------------------*/
SELECT expression
Items in a SELECT
list can be expressions. These expressions evaluate to a
single value and produce one output column, with an optional explicit alias
.
If the expression doesn't have an explicit alias, it receives an implicit alias according to the rules for implicit aliases, if possible. Otherwise, the column is anonymous and you cannot refer to it by name elsewhere in the query.
In the following example, the first expression has an explicit alias
for f1
and the second expression has an implicit alias for f1
:
SELECT example_table.f1 FROM example_table;
SELECT f1 FROM example_table;
/*-------------------------------------+
| _key | f1 |
+-------------------------------------+
| a#01 | { "c1": "xyz", "c2": "nop" } |
| a#02 | { "c1": "zyx" } |
| a#03 | { "c1": "abc", "c2": "def" } |
| b#01 | { "c1": "jkl", "c2": "mno" } |
| b#02 | { "c1": "hi", "c2": "no" } |
| c#03 | { "c1": "j", "c2": "l" } |
+-------------------------------------*/
SELECT expression.*
An item in a SELECT
list can also take the form of expression.*
. This
produces one output column for each column or top-level field of expression
.
The expression must either be a table alias or evaluate to a single value of a
data type with fields, such as a STRUCT.
SELECT example_table.* FROM example_table;
/*-------------------------------------------------------------------------------+
| _key | f1 | f2 | f3 |
+-------------------------------------------------------------------------------+
| a#01 | { | { | { |
| | "c1": "xyz", | "col1": "def", | "cl1": "{ |
| | "c2": "nop" | "col2": "125", | "n": "Def",|
| | } | "col3": "2008-12-25 15:30:00+00" | "g": 40, |
| | | } | "k": 3 |
| | | | }" |
| | | | } |
+-------------------------------------------------------------------------------+
| a#02 | { | { | { |
| | "c1": "zyx", | "col1": "cba", | "cl1": "{ |
| | } | "col2": "123", | "n": "Gih",|
| | | "col3": "2018-10-15 15:30:00+00" | "g": 20, |
| | | } | }" |
| | | | } |
+-------------------------------------------------------------------------------+
| a#03 | { | { | { |
| | "c1": "abc", | "col1": "cba", | "cl1": "{ |
| | "c2": "def" | "col2": "543", | "n": "Xyz",|
| | } | "col3": "2020-11-10 13:21:00+00" | "g": 15, |
| | | } | }" |
| | | | } |
+-------------------------------------------------------------------------------+
| b#01 | { | { | NULL |
| | "c1": "jkl", | "col1": "qrs", | |
| | "c2": "mno" | "col2": "654", | |
| | } | "col3": "2021-10-10 14:18:03+07" | |
| | | } | |
+-------------------------------------------------------------------------------+
| b#02 | { | { | NULL |
| | "c1": "hi", | "col1": "tu", | |
| | "c2": "no" | "col3": "2023-10-10 11:28:09+05",| |
| | } | } | |
+-------------------------------------------------------------------------------+
| c#03 | { | { | { |
| | "c1": "j", | "col1": "u", | "cl1": "{ |
| | "c2": "l" | } | "n": "T", |
| | } | } | "g": 22, |
| | | | }" |
| | | | } |
+-------------------------------------------------------------------------------*/
SELECT * EXCEPT
A SELECT * EXCEPT
statement specifies the names of one or more columns to
exclude from the result. All matching column names are omitted from the output.
SELECT * EXCEPT (f2) FROM example_table;
/*---------------------------------------------------------------------------------+
| _key | f1 | f3 |
+---------------------------------------------------------------------------------+
| a#01 | { "c1": "xyz", "c2": "nop" } | { "cl1": "{"n": "Def", "g": 40,"k": 3}" } |
| a#02 | { "c1": "zyx" } | { "cl1": "{"n": "Gih", "g": 20}" } |
| a#03 | { "c1": "abc", "c2": "def" } | { "cl1": "{"n": "Xyz", "g": 15}" } |
| b#01 | { "c1": "jkl", "c2": "mno" } | NULL |
| b#02 | { "c1": "hi", "c2": "no" } | NULL |
| c#03 | { "c1": "j", "c2": "l" } | { "cl1": "{"n": "T", "g": 22}" } |
+---------------------------------------------------------------------------------*/
SELECT * REPLACE
A SELECT * REPLACE
statement specifies one or more
expression AS identifier
clauses. Each identifier must match a column name
from the SELECT *
statement. In the output column list, the column that
matches the identifier in a REPLACE
clause is replaced by the expression in
that REPLACE
clause.
A SELECT * REPLACE
statement doesn't change the names or order of columns.
However, it can change the value and the value type.
SELECT * REPLACE (f1 AS f2) FROM sample1;
/*----------------------------------------------------------------------------------------------------------------+
| _key | f1 | f2 | f3 |
+----------------------------------------------------------------------------------------------------------------+
| a#01 | { "c1": "xyz", "c2": "nop" } | { "c1": "xyz", "c2": "nop" } | { "cl1": "{"n": "Def", "g": 40,"k": 3}" } |
| a#02 | { "c1": "zyx" } | { "c1": "zyx" } | { "cl1": "{"n": "Gih", "g": 20}" } |
| a#03 | { "c1": "abc", "c2": "def" } | { "c1": "abc", "c2": "def" } | { "cl1": "{"n": "Xyz", "g": 15}" } |
| b#01 | { "c1": "jkl", "c2": "mno" } | { "c1": "jkl", "c2": "mno" } | NULL |
| b#02 | { "c1": "hi", "c2": "no" } | { "c1": "hi", "c2": "no" } | NULL |
| c#03 | { "c1": "j", "c2": "l" } | { "c1": "j", "c2": "l" } | { "cl1": "{"n": "T", "g": 22}" } |
+----------------------------------------------------------------------------------------------------------------*/
FROM
clause
FROM from_clause[, ...] from_clause: from_item from_item: table_name [ as_alias ] as_alias: [ AS ] alias
The FROM
clause indicates the table or tables from which to retrieve rows,
and specifies how to join those rows together to produce a single stream of
rows for processing in the rest of the query.
table_name
The name of an existing table.
SELECT * FROM Roster;
WHERE
clause
WHERE bool_expression
The WHERE
clause filters the results of the FROM
clause.
Only rows whose bool_expression
evaluates to TRUE
are included. Rows
whose bool_expression
evaluates to NULL
or FALSE
are
discarded.
The evaluation of a query with a WHERE
clause is typically completed in this
order:
FROM
WHERE
ORDER BY
LIMIT
Evaluation order doesn't always match syntax order.
The WHERE
clause only references columns available via the FROM
clause;
it cannot reference SELECT
list aliases.
Examples
This query returns returns all rows from the example_table
table
where f2['col2']
starts with 1
:
SELECT f2 FROM example_table
WHERE STARTS_WITH(CAST(f2['col2'] AS STRING), "1");
/*--------------------------------------------------------------------+
| f2 |
+--------------------------------------------------------------------+
| { "col1": "def", "col2": "125", "col3": "2008-12-25 15:30:00+00" } |
| { "col1": "cba", "col2": "123", "col3": "2018-10-15 15:30:00+00" } |
+--------------------------------------------------------------------*/
The bool_expression
can contain multiple sub-conditions:
SELECT f2 FROM example_table
WHERE
STARTS_WITH(CAST(f2['col1'] AS STRING), "c") OR
STARTS_WITH(CAST(f2['col2'] AS STRING), "1");
/*--------------------------------------------------------------------+
| f2 |
+--------------------------------------------------------------------+
| { "col1": "def", "col2": "125", "col3": "2008-12-25 15:30:00+00" } |
| { "col1": "cba", "col2": "123", "col3": "2018-10-15 15:30:00+00" } |
| { "col1": "cba", "col2": "543", "col3": "2020-11-10 13:21:00+00" } |
+--------------------------------------------------------------------*/
ORDER BY
clause
ORDER BY _key [ ASC ]
The ORDER BY
clause specifies a column or expression as the sort criterion for
the result set. If an ORDER BY
clause is not present, the order of the results
of a query is not defined. Only the
_key
generated column can be used with the ORDER BY
clause.
Optional Clauses
ASC
: Sort the results in ascending order of_key
values.ASC
is the default value.
Examples
Use the default sort order (ascending).
SELECT _key, f1 FROM example_table ORDER BY _key;
SELECT _key, f1 FROM example_table ORDER BY _key ASC;
/*-------------------------------------+
| _key | f1 |
+-------------------------------------+
| a#01 | { "c1": "xyz", "c2": "nop" } |
| a#02 | { "c1": "zyx" } |
| a#03 | { "c1": "abc", "c2": "def" } |
| b#01 | { "c1": "jkl", "c2": "mno" } |
| b#02 | { "c1": "hi", "c2": "no" } |
| c#03 | { "c1": "j", "c2": "l" } |
+-------------------------------------*/
LIMIT
clause
LIMIT count
Limits the number of rows to return in a query.
Definitions
LIMIT
: Limits the number of rows to produce.count
is anINT64
constant expression that represents the non-negative, non-NULL
limit. No more thancount
rows are produced.LIMIT 0
returns 0 rows.
Details
The rows that are returned by LIMIT
have undefined order unless
these clauses are used after ORDER BY
.
A constant expression can be represented by a general expression, literal, or parameter value.
Examples
SELECT f1 FROM example_table LIMIT 2;
/*-------------------------------------+
| _key | f1 |
+-------------------------------------+
| a#01 | { "c1": "xyz", "c2": "nop" } |
| a#02 | { "c1": "zyx" } |
+-------------------------------------*/
Using aliases
An alias is a temporary name given to a table, column, or expression present in
a query. You can introduce explicit aliases in the SELECT
list or FROM
clause, or GoogleSQL will infer an implicit alias for some expressions.
Expressions with neither an explicit nor implicit alias are anonymous and the
query cannot reference them by name.
Explicit aliases
You can introduce explicit aliases in either the FROM
clause or the SELECT
list.
In a FROM
clause, you can introduce explicit aliases for any item, including
tables, arrays, using [AS] alias
. The AS
keyword is optional.
Example:
SELECT bar.f1
FROM example_table AS bar;
You can introduce explicit aliases for any expression in the SELECT
list using
[AS] alias
. The AS
keyword is optional.
Example:
SELECT f1 AS foo
FROM example_table;
Implicit aliases
In the SELECT
list, if there is an expression that doesn't have an explicit
alias, GoogleSQL assigns an implicit alias according to the following
rules. There can be multiple columns with the same alias in the SELECT
list.
- For identifiers, the alias is the identifier. For example,
SELECT abc
impliesAS abc
. - For path expressions, the alias is the last identifier in the path. For
example,
SELECT abc.def.ghi
impliesAS ghi
. - For field access using the "dot" member field access operator, the alias is
the field name. For example,
SELECT (struct_function()).fname
impliesAS fname
.
In all other cases, there is no implicit alias, so the column is anonymous and cannot be referenced by name. The data from that column will still be returned and the displayed query results may have a generated label for that column, but the label cannot be used like an alias.
In a FROM
clause, from_item
s are not required to have an alias. The
following rules apply:
-
If there is an expression that doesn't have an explicit alias,
GoogleSQL assigns an implicit alias in these cases:
-
For identifiers, the alias is the identifier. For example,
FROM abc
impliesAS abc
. -
For path expressions, the alias is the last identifier in the path. For
example,
FROM abc.def.ghi
impliesAS ghi
-
The column produced using
WITH OFFSET
has the implicit aliasoffset
.
-
For identifiers, the alias is the identifier. For example,
Alias visibility
After you introduce an explicit alias in a query, there are restrictions on where else in the query you can reference that alias. These restrictions on alias visibility are the result of GoogleSQL name scoping rules.
Visibility in the FROM
clause
GoogleSQL processes aliases in a FROM
clause from left to right,
and aliases are visible only to subsequent path expressions in a FROM
clause.
You can use any column name from a table in the FROM
as an alias anywhere in
the query, with or without qualification with the table name.
Example:
SELECT f1, s.f2
FROM example_table AS s
WHERE STARTS_WITH(CAST(f2['col1'] AS STRING), "1");
If the FROM
clause contains an explicit alias, you must use the explicit alias
instead of the implicit alias for the remainder of the query (see
Implicit Aliases). A table alias is useful for brevity or
to eliminate ambiguity in cases such as self-joins, where the same table is
scanned multiple times during query processing.
Example:
SELECT * FROM example_table AS s,
ORDER BY s._key
Invalid — ORDER BY
doesn't use the table alias:
SELECT * FROM example_table AS s
ORDER BY example_table._key; // INVALID.
Visibility in the SELECT
list
Aliases in the SELECT
list are not visible to any clauses.
Duplicate aliases
A SELECT
list containing multiple explicit or implicit aliases
of the same name is allowed, as long as the alias name is not referenced
elsewhere in the query, since the reference would be
ambiguous.
Example:
SELECT 1 AS a, 2 AS a;
/*---+---*
| a | a |
+---+---+
| 1 | 2 |
*---+---*/
Ambiguous aliases
GoogleSQL provides an error if accessing a name is ambiguous, meaning it can resolve to more than one unique object in the query or in a table schema, including the schema of a destination table.
A name is not ambiguous in ORDER BY
if it is both
a column name and a SELECT
list alias, as long as the name resolves to the
same underlying object.
Range variables
In GoogleSQL, a range variable is a table expression alias in the
FROM
clause. Sometimes a range variable is known as a table alias
. A
range variable lets you reference rows being scanned from a table expression.
A table expression represents an item in the FROM
clause that returns a table.
Common items that this expression can represent include
tables.
In general, a range variable provides a reference to the rows of a table
expression. A range variable can be used to qualify a column reference and
unambiguously identify the related table, for example range_variable.column_1
.
When referencing a range variable on its own without a specified column suffix, the result of a table expression is the row type of the related table. Value tables have explicit row types, so for range variables related to value tables, the result type is the value table's row type. Other tables don't have explicit row types, and for those tables, the range variable type is a dynamically defined struct that includes all of the columns in the table.
Examples
The following example selects column f1
from range variable bar
,
which in effect selects column f1
from table example_table
.
SELECT bar.f1 FROM example_table AS bar;
Appendix A: examples with sample data
These examples include statements which perform queries on the
Roster
and TeamMascot
,
and PlayerStats
tables.
Sample table with column families
The following table called example_table
is used to illustrate the
behavior of different query clauses in this reference:
/*-------------------------------------------------+
| example_table |
+-------------------------------------------------+
| column family name | column names (BYTES) |
+-------------------------------------------------+
| f1 | { "c1", "c2" } |
| f2 | { "col1", "col2", "col3" } |
| f3 | { "cl1" } |
+-------------------------------------------------*/
When queried, example_table
produces the following results:
/*-------------------------------------------------------------------------------+
| _key | f1 | f2 | f3 |
+-------------------------------------------------------------------------------+
| a#01 | { | { | { |
| | "c1": "xyz", | "col1": "def", | "cl1": "{ |
| | "c2": "nop" | "col2": "125", | "n": "Def",|
| | } | "col3": "2008-12-25 15:30:00+00" | "g": 40, |
| | | } | "k": 3 |
| | | | }" |
| | | | } |
+-------------------------------------------------------------------------------+
| a#02 | { | { | { |
| | "c1": "zyx", | "col1": "cba", | "cl1": "{ |
| | } | "col2": "123", | "n": "Gih",|
| | | "col3": "2018-10-15 15:30:00+00" | "g": 20, |
| | | } | }" |
| | | | } |
+-------------------------------------------------------------------------------+
| a#03 | { | { | { |
| | "c1": "abc", | "col1": "cba", | "cl1": "{ |
| | "c2": "def" | "col2": "543", | "n": "Xyz",|
| | } | "col3": "2020-11-10 13:21:00+00" | "g": 15, |
| | | } | }" |
| | | | } |
+-------------------------------------------------------------------------------+
| b#01 | { | { | NULL |
| | "c1": "jkl", | "col1": "qrs", | |
| | "c2": "mno" | "col2": "654", | |
| | } | "col3": "2021-10-10 14:18:03+07" | |
| | | } | |
+-------------------------------------------------------------------------------+
| b#02 | { | { | NULL |
| | "c1": "hi", | "col1": "tu", | |
| | "c2": "no" | "col3": "2023-10-10 11:28:09+05",| |
| | } | } | |
+-------------------------------------------------------------------------------+
| c#03 | { | { | { |
| | "c1": "j", | "col1": "u", | "cl1": "{ |
| | "c2": "l" | } | "n": "T", |
| | } | } | "g": 22, |
| | | | }" |
| | | | } |
+-------------------------------------------------------------------------------*/
In the preceding results, f1
, f2
, and f3
each represent a
column family, and _key
represents a key assigned to a group of
column families. Each column family contains its own columns. For example,
for key a#01
, the column family f1
contains the columns c1
and c2
. c1
contains the value "xyz"
and c2
contains the value "nop"
.