This page explains transactions in Spanner and includes sample code for executing transactions.
Introduction
A transaction in Spanner is a set of reads and writes that execute atomically at a single logical point in time across columns, rows, and tables in a database.
Spanner supports these transaction modes:
Locking read-write. These transactions rely on pessimistic locking and, if necessary, two-phase commit. Locking read-write transactions may abort, requiring the application to retry.
Read-only. This transaction type provides guaranteed consistency across several reads, but does not allow writes. By default, read-only transactions execute at a system-chosen timestamp that guarantees external consistency, but they can also be configured to read at a timestamp in the past. Read-only transactions do not need to be committed and do not take locks. In addition, read-only transactions might wait for in-progress writes to complete before executing.
Partitioned DML. This transaction type executes a Data Manipulation Language (DML) statement as Partitioned DML. Partitioned DML is designed for bulk updates and deletes, particularly periodic cleanup and backfilling. If you need to commit a large number of blind writes, but don't require an atomic transaction, you can bulk modify your Spanner tables using batch write. For more information, see Modify data using batch writes.
This page describes the general properties and semantics of transactions in Spanner and introduces the read-write, read-only, and Partitioned DML transaction interfaces in Spanner.
Read-write transactions
Here are scenarios in which you should use a locking read-write transaction:
- If you do a write that depends on the result of one or
more reads, you should do that write and the read(s) in the same
read-write transaction.
- Example: double the balance of bank account A. The read of A's balance should be in the same transaction as the write to replace the balance with the doubled value.
- If you do one or more writes that need to be committed atomically, you
should do those writes in the same read-write transaction.
- Example: transfer $200 from account A to account B. Both of the writes (one to decrease A by $200 and one to increase B by $200) and the reads of initial account balances should be in the same transaction.
- If you might do one or more writes, depending on the results of
one or more reads, you should do those writes and reads in the same
read-write transaction, even if the write(s) don't end up executing.
- Example: transfer $200 from bank account A to bank account B if A's current balance is greater than $500. Your transaction should contain a read of A's balance and a conditional statement that contains the writes.
Here is a scenario in which you should not use a locking read-write transaction:
- If you are only doing reads, and you can express your read using a single read method, you should use that single read method or a read-only transaction. Single reads do not lock, unlike read-write transactions.
Properties
A read-write transaction in Spanner executes a set of reads and writes atomically at a single logical point in time. Furthermore, the timestamp at which read-write transactions execute matches wall clock time, and the serialization order matches the timestamp order.
Why use a read-write transaction? Read-write transactions provide the ACID properties of relational databases (In fact, Spanner read-write transactions offer even stronger guarantees than traditional ACID; see the Semantics section below.).
Isolation
The following are isolation properties for read-write and read-only transactions.
Transactions that read and write
Here are the isolation properties that you get after successfully committing a transaction that contains a series of reads (or queries) and writes:
- All reads within the transaction returned values that reflect a consistent snapshot taken at the commit timestamp of the transaction.
- Empty rows or ranges remained so at commit time.
- All writes within the transaction were committed at the transaction's commit timestamp.
- Writes were not visible to any transaction until after the transaction committed.
Certain Spanner client drivers contain transaction retry logic to mask transient errors, which they do by re-running the transaction and validating the client-observed data.
The effect is that all reads and writes appear to have occurred at a single point in time, both from the perspective of the transaction itself and from the perspective of other readers and writers to the Spanner database. In other words, the reads and the writes end up occurring at the same timestamp (see an illustration of this in the Serializability and external consistency section below).
Transactions that only read
The guarantees for a read-write transaction that only reads are similar: all reads within that transaction return data from the same timestamp, even for row non-existence. One difference is that if you read data, and later commit the read-write transaction without any writes, there is no guarantee that the data did not change in the database after the read and before the commit. If you want to know whether data has changed since you read it last, the best approach is to read it again (either in a read-write transaction, or using a strong read.) Also, for efficiency, if you know in advance that you'll only be reading and not writing, you should use a read-only transaction instead of a read-write transaction.
Atomicity, Consistency, Durability
In addition to the Isolation property, Spanner provides Atomicity (if any of the writes in the transaction commit, they all commit), Consistency (the database remains in a consistent state after the transaction) and Durability (committed data stays committed.)
Benefits of these properties
Because of these properties, as an application developer, you can focus on the correctness of each transaction on its own, without worrying about how to protect its execution from other transactions that might execute at the same time.
Interface
The Spanner client libraries provide an interface for executing a body of work in the context of a read-write transaction, with retries for transaction aborts. Here's a bit of context to explain this point: a Spanner transaction may have to be tried multiple times before it commits. For example, if two transactions attempt to work on data at the same time in a way that might cause deadlock, Spanner aborts one of them so that the other transaction can make progress. (More rarely, transient events within Spanner may result in some transactions aborting.) Since transactions are atomic, an aborted transaction has no visible effect on the database. Therefore, transactions should be executed by retrying them until they succeed.
When you use a transaction in a Spanner client library, you define the body of a transaction (i.e., the reads and writes to perform on one or more tables in a database) in the form of a function object. Under the hood, the Spanner client library runs the function repeatedly until the transaction commits or a non-retryable error is encountered.
Example
Assume you added a MarketingBudget
column to the
Albums
table shown in the
Schema and Data Model page:
CREATE TABLE Albums ( SingerId INT64 NOT NULL, AlbumId INT64 NOT NULL, AlbumTitle STRING(MAX), MarketingBudget INT64 ) PRIMARY KEY (SingerId, AlbumId);
Your marketing department decides to do a marketing push for the album keyed by
Albums (1, 1)
and has asked you to move $200,000 from the budget of Albums
(2, 2)
, but only if the money is available in that album's budget. You should
use a locking read-write transaction for this operation, because the transaction
might do writes depending on the result of a read.
The following shows how to execute a read-write transaction:
C++
C#
Go
Java
Node.js
PHP
Python
Ruby
Semantics
Serializability and external consistency
Spanner provides 'serializability', which means that all transactions appear as if they executed in a serial order, even if some of the reads, writes, and other operations of distinct transactions actually occurred in parallel. Spanner assigns commit timestamps that reflect the order of committed transactions to implement this property. In fact, Spanner offers a stronger guarantee than serializability called external consistency: transactions commit in an order that is reflected in their commit timestamps, and these commit timestamps reflect real time so you can compare them to your watch. Reads in a transaction see everything that has been committed before the transaction commits, and writes are seen by everything that starts after the transaction is committed.
For example, consider the execution of two transactions as illustrated in the diagram below:
Transaction Txn1
in blue reads some data A
, buffers a write to A
, then
successfully commits. Transaction Txn2
in green starts after Txn1
, reads
some data B
, then reads the data A
. Since Txn2
reads the value of A
after Txn1
committed its write to A
, Txn2
sees the effect of Txn1
's
write to A
, even though Txn2
started before Txn1
completed.
Even though there is some overlap in time in which Txn1
and Txn2
are both
executing, their commit timestamps c1
and c2
respect a linear transaction
order, which means that all effects of the reads and writes of Txn1
appear
to have occurred at a single point of time (c1
), and all effects of the
reads and writes of Txn2
appear to have occurred at a single point of time
(c2
). Furthermore, c1 < c2
(which is guaranteed because both Txn1
and
Txn2
committed writes; this is true even if the writes happened on different
machines), which respects the order of Txn1
happening before Txn2
.
(However, if Txn2
only did reads in the transaction, then c1 <= c2
).
Reads observe a prefix of the commit history; if a read sees the effect of
Txn2
, it also sees the effect of Txn1
. All transactions that commit
successfully have this property.
Read and write guarantees
If a call to run a transaction fails, the read and write guarantees you have depend on what error the underlying commit call failed with.
For example, an error such as "Row Not Found" or "Row Already Exists" means that writing the buffered mutations encountered some error, e.g. a row that the client is trying to update doesn't exist. In that case, the reads are guaranteed consistent, the writes are not applied, and the non-existence of the row is guaranteed to be consistent with the reads as well.
Cancelling transaction operations
Asynchronous read operations may be cancelled any time by the user (e.g., when a higher level operation is cancelled or you decide to stop a read based on the initial results received from the read) without affecting any other existing operations within the transaction.
However, even if you have attempted to cancel the read, Spanner does not guarantee that the read is actually cancelled. After you request the cancellation of a read, that read can still successfully complete or fail with some other reason (e.g. Abort). Furthermore, that cancelled read might actually return some results to you, and those possibly incomplete results will be validated as part of the transaction Commit.
Note that unlike reads, cancelling a transaction Commit operation will result in aborting the transaction (unless the transaction has already Committed or failed with another reason).
Performance
Locking
Spanner allows multiple clients to simultaneously interact with the same database. In order to ensure the consistency of multiple concurrent transactions, Spanner uses a combination of shared locks and exclusive locks to control access to the data. When you perform a read as part of a transaction, Spanner acquires shared read locks, which allows other reads to still access the data until your transaction is ready to commit. When your transaction is committing and writes are being applied, the transaction attempts to upgrade to an exclusive lock. It blocks new shared read locks on the data, waits for existing shared read locks to clear, then places an exclusive lock for exclusive access to the data.
Notes about locks:
- Locks are taken at the granularity of row-and-column. If transaction T1 has locked column "A" of row "foo", and transaction T2 wants to write column "B" of row "foo" then there is no conflict.
- Writes to a data item that don't also read the data being written (aka "blind writes") don't conflict with other blind writers of the same item (the commit timestamp of each write determines the order in which it is applied to the database). A consequence of this is that Spanner only needs to upgrade to an exclusive lock if you have read the data you are writing. Otherwise Spanner uses a shared lock called a writer shared lock.
- When performing row lookups inside a read-write transaction, use secondary indexes to limit the rows scanned to a smaller range. This causes Spanner to lock a fewer number of rows in the table, allowing concurrent modification to rows outside of the range.
Locks should not be used to ensure exclusive access to a resource outside of Spanner. Transactions can be aborted for several reasons by Spanner such as, for example, when allowing data to move around the instance's compute resources. If a transaction is retried, whether explicitly by application code or implicitly by client code such as the Spanner JDBC driver, it is only guaranteed that the locks were held during the attempt that actually committed.
You can use the Lock statistics introspection tool to investigate lock conflicts in your database.
Deadlock detection
Spanner detects when multiple transactions might be deadlocked, and
forces all but one of the transactions to abort. For example, consider the
following scenario: transaction Txn1
holds a lock on record A
and is waiting
for a lock on record B
, and Txn2
holds a lock on record B
and is waiting
for a lock on record A
. The only way to make progress in this situation is to
abort one of the transactions so it releases its lock, allowing the other
transaction to proceed.
Spanner uses the standard "wound-wait" algorithm to handle deadlock detection. Under the hood, Spanner keeps track of the age of each transaction that requests conflicting locks. It also allows older transactions to abort younger transactions (where "older" means that the transaction's earliest read, query, or commit happened sooner).
By giving priority to older transactions, Spanner ensures that every transaction has a chance to acquire locks eventually, after it becomes old enough to have higher priority than other transactions. For example, a transaction that acquires a reader shared lock can be aborted by an older transaction that needs a writer shared lock.
Distributed execution
Spanner can run transactions on data that spans multiple servers. This power comes at a performance cost compared to single-server transactions.
What types of transactions might be distributed? Under the hood, Spanner can divide responsibility for rows in the database across many servers. A row and the corresponding rows in interleaved tables are usually served by the same server, as are two rows in the same table with nearby keys. Spanner can perform transactions across rows on different servers; however, as a rule of thumb, transactions that affect many co-located rows are faster and cheaper than transactions that affect many rows scattered throughout the database, or throughout a large table.
The most efficient transactions in Spanner include only the reads and writes that should be applied atomically. Transactions are fastest when all reads and writes access data in the same part of the key space.
Read-only transactions
In addition to locking read-write transactions, Spanner offers read-only transactions.
Use a read-only transaction when you need to execute more than one read at the same timestamp. If you can express your read using one of Spanner's single read methods, you should use that single read method instead. The performance of using such a single read call should be comparable to the performance of a single read done in a read-only transaction.
If you are reading a large amount of data, consider using partitions to read the data in parallel.
Because read-only transactions don't write, they don't hold locks and they don't block other transactions. Read-only transactions observe a consistent prefix of the transaction commit history, so your application always gets consistent data.
Properties
A Spanner read-only transaction executes a set of reads at a single logical point in time, both from the perspective of the read-only transaction itself and from the perspective of other readers and writers to the Spanner database. This means that read-only transactions always observe a consistent state of the database at a chosen point in the transaction history.
Interface
Spanner provides an interface for executing a body of work in the context of a read-only transaction, with retries for transaction aborts.
Example
The following shows how to use a read-only transaction to get consistent data for two reads at the same timestamp:
C++
C#
Go
Java
Node.js
PHP
Python
Ruby
Partitioned DML transactions
Using Partitioned Data Manipulation Language
(Partitioned DML), you can execute large-scale UPDATE
and DELETE
statements
without running into transaction limits or locking an entire table.
Spanner partitions the key space and executes the DML statements on each
partition in a separate read-write transaction.
You run DML statements in read-write transactions that you explicitly create in your code. For more information, see Using DML.
Properties
You can execute only one Partitioned DML statement at a time, whether you are using a client library method or the Google Cloud CLI.
Partitioned transactions do not support commit or rollback. Spanner executes and applies the DML statement immediately. If you cancel the operation, or the operation fails, then Spanner cancels all the executing partitions and doesn't start any of the remaining partitions. Spanner does not rollback any partitions that have already executed.
Interface
Spanner provides an interface for executing a single Partitioned DML statement.
Examples
The following code example updates the MarketingBudget
column of the Albums
table.
C++
You use the ExecutePartitionedDml()
function to execute a Partitioned DML statement.
C#
You use the ExecutePartitionedUpdateAsync()
method to execute a Partitioned DML statement.
Go
You use the PartitionedUpdate()
method to execute a Partitioned DML statement.
Java
You use the executePartitionedUpdate()
method to execute a Partitioned DML statement.
Node.js
You use the runPartitionedUpdate()
method to execute a Partitioned DML statement.
PHP
You use the executePartitionedUpdate()
method to execute a Partitioned DML statement.
Python
You use the execute_partitioned_dml()
method to execute a Partitioned DML statement.
Ruby
You use the execute_partitioned_update()
method to execute a Partitioned DML statement.
The following code example deletes rows from the Singers
table, based on the
SingerId
column.