Export data to Spanner (reverse ETL)
To provide feedback or request support for this feature, send email to bq-cloud-spanner-federation-preview@google.com.
This document describes how you can set up a reverse extract, transform, and load
(reverse ETL) workflow from BigQuery to Spanner. You can do this by using the
EXPORT DATA
statement
to export data from a BigQuery table to a
Spanner table.
This reverse ETL workflow combines analytic capabilities in BigQuery with low latency and high throughput in Spanner. This workflow lets you serve data to application users without exhausting quotas and limits on BigQuery.
Before you begin
- Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document.
Required roles
To get the permissions that you need to export BigQuery data to Spanner, ask your administrator to grant you the following IAM roles on your project:
-
Export data from a BigQuery table:
BigQuery Data Viewer (
roles/bigquery.dataViewer
) -
Run an export job:
BigQuery User (
roles/bigquery.user
) -
Write data to a Spanner table:
Cloud Spanner Database User (
roles/spanner.databaseUser
)
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
Limitations
The following BigQuery data types don't have equivalents in Spanner and are not supported:
Spanner database dialect | Unsupported BigQuery types |
---|---|
All dialects |
|
GoogleSQL |
|
The maximum size of an exported row cannot exceed 1 MiB.
Configure exports with spanner_options
option
You can use the spanner_options
option to specify a destination Spanner
database and table. The configuration is expressed in the form of a JSON string,
as the following example shows:
EXPORT DATA OPTIONS( uri="https://spanner.googleapis.com/projects/PROJECT_ID
/instances/INSTANCE_ID
/databases/DATABASE_ID
", format='CLOUD_SPANNER', spanner_options = """{ "table": "TABLE_NAME
", "priority": "PRIORITY
", "tag": "TAG
", }""" )
Replace the following:
PROJECT_ID
: the name of your Google Cloud project.INSTANCE_ID
: the name of your database instance.DATABASE_ID
: the name of your database.TABLE_NAME
: the name of an existing destination table.PRIORITY
(optional): priority of the write requests. Allowed values:LOW
,MEDIUM
,HIGH
. Default value:MEDIUM
.TAG
(optional): request tag to help identify exporter traffic in Spanner monitoring. Default value:bq_export
.
Export query requirements
To export query results to Spanner, the results must meet the following requirements:
- All columns in the result set must exist in the destination table, and their types must match.
- The result set must contain all
NOT NULL
columns for the destination table. - Column values must not exceed Spanner data size limits within tables.
- Any unsupported column types must be converted to one of the supported types before exporting to Spanner.
Export data
You can use the EXPORT DATA
statement
to export data from a BigQuery table into a Spanner table.
The following example exports selected fields from a table that's named
mydataset.table1
:
EXPORT DATA OPTIONS ( uri="https://spanner.googleapis.com/projects/PROJECT_ID
/instances/INSTANCE_ID
/databases/DATABASE_ID
", format='CLOUD_SPANNER', spanner_options="""{ "table": "TABLE_NAME" }""" ) AS SELECT * FROM mydataset.table1;
Replace the following:
PROJECT_ID
: the name of your Google Cloud projectINSTANCE_ID
: the name of your database instanceDATABASE_ID
: the name of your databaseTABLE_NAME
: the name of an existing destination table
Export optimization
To optimize the export of records from BigQuery to Spanner, you can try the following:
Increasing the number of nodes in the Spanner destination instance. The exporter automatically scales the number of BigQuery export slots in proportion to the number of Spanner nodes and processing units. For additional details about maximizing write throughput, see Performance overview.
Avoid ordering the query results. If the result set contains all primary key columns, then the exporter automatically sorts the primary keys of the destination table to streamline writes and minimize contention.
If the destination table's primary key includes generated columns, then you should add the generated columns' expressions to the query to ensure that the exported data is sorted properly.
Pricing
For information on data export pricing, see BigQuery pricing.
After the data is exported, you're charged for storing the data in Spanner. For more information, see Spanner pricing.