This document compares Apache Cassandra and Spanner concepts and practices. It assumes you're familiar with Cassandra and want to migrate existing applications or design new applications while using Spanner as your database.
Cassandra and Spanner are both large-scale distributed databases built for applications requiring high scalability and low latency. While both databases can support demanding NoSQL workloads, Spanner provides advanced features for data modeling, querying, and transactional operations. For more information about how Spanner meets NoSQL database criteria, see Spanner for non-relational workloads.
Migrate from Cassandra to Spanner
To migrate from Cassandra to Spanner, you can use the Cassandra to Spanner Proxy Adapter. This open source tool lets you migrate workloads from Cassandra or DataStax Enterprise (DSE) to Spanner without any changes to your application logic.
Core concepts
This section compares key Cassandra and Spanner concepts.
Terminology
| Cassandra | Spanner |
|---|---|
| Cluster | Instance. A Cassandra cluster is equivalent to a Spanner instance: a collection of servers and storage resources. Because Spanner is a managed service, you don't have to configure the underlying hardware or software. You only need to specify the number of nodes you want to reserve for your instance, or choose autoscaling to scale the instance automatically. An instance acts as a container for databases, and the data replication topology (regional, dual-region, or multi-region) is chosen at the instance level. |
| Keyspace | Database. A Cassandra keyspace is equivalent to a Spanner database, which is a collection of tables and other schema elements (for example, indexes and roles). Unlike a keyspace, a database doesn't require you to configure a replication factor. Spanner automatically replicates your data to the regions designated in your instance configuration. |
| Table | Table. In both Cassandra and Spanner, tables are collections of rows identified by a primary key specified in the table schema. |
| Partition | Split. Both Cassandra and Spanner scale by sharding data. In Cassandra, each shard is called a partition; in Spanner, each shard is called a split. Cassandra uses hash partitioning, which means that each row is independently assigned to a storage node based on a hash of the primary key. Spanner is range-sharded, which means that rows that are contiguous in primary key space are contiguous in storage as well (except at split boundaries). Spanner takes care of splitting and merging based on load and storage, and this is transparent to the application. The key implication is that, unlike in Cassandra, range scans over a prefix of the primary key are efficient in Spanner. |
| Row | Row. In both Cassandra and Spanner, a row is a collection of columns identified uniquely by a primary key. Like Cassandra, Spanner supports composite primary keys. Unlike Cassandra, Spanner doesn't distinguish between partition key and sort key, because data is range-sharded. You can think of Spanner as only having sort keys, with partitioning managed behind the scenes. |
| Column | Column. In both Cassandra and Spanner, a column is a set of data values of the same type, with one value for each row of a table. For a comparison of Cassandra column types with Spanner types, see Data types. |
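To make the range-scan point concrete, here is a minimal sketch in GoogleSQL, using the `user_items` table defined later in this document: because rows are stored sorted by the full primary key, filtering on a key prefix reads one contiguous key range.

```sql
-- Efficient in Spanner: all rows for user_id = 1 are contiguous in
-- primary key space, so this reads a single key range, not the whole table.
SELECT user_id, item_id
FROM user_items
WHERE user_id = 1 AND item_id >= 100 AND item_id < 200;
```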
Architecture
A Cassandra cluster consists of a set of servers and storage colocated with those servers. A hash function maps rows from a partition key space to a virtual node (vnode). A set of vnodes is then randomly assigned to each server to serve a portion of the cluster key space. Storage for the vnodes is locally attached to the serving node. Client drivers connect directly to the serving nodes and handle load balancing and query routing.
A Spanner instance consists of a set of servers in a replication topology. Spanner dynamically shards each table into row ranges based on CPU and disk usage. Shards are assigned to compute nodes for serving. Data is physically stored on Colossus, Google's distributed file system, separate from the compute nodes. Client drivers connect to Spanner's frontend servers, which perform request routing and load balancing. To learn more, see the Life of Spanner Reads & Writes whitepaper.
At a high level, both architectures scale as resources are added to the underlying cluster. Spanner's separation of compute and storage allows faster rebalancing of load between compute nodes in response to workload changes. Unlike in Cassandra, shard moves don't involve data moves, because the data stays on Colossus. Moreover, Spanner's range-based partitioning might be more natural for applications that expect data to be sorted by partition key. The flip side of range-based partitioning is that workloads that write to one end of the key space (for example, tables keyed by current timestamp) might face hotspotting without additional schema design consideration. For more information about techniques for overcoming hotspotting, see Schema design best practices.
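As an illustration of one such technique, the following is a minimal sketch assuming a hypothetical `events` table: a generated `shard_id` column prefixes the key so that timestamp-ordered inserts are spread across 16 key ranges instead of all landing on the last split.

```sql
-- Hypothetical example: the computed shard prefix distributes
-- sequential-timestamp writes across 16 key ranges.
CREATE TABLE events (
  shard_id INT64 NOT NULL AS (ABS(MOD(FARM_FINGERPRINT(event_id), 16))) STORED,
  event_ts TIMESTAMP NOT NULL,
  event_id STRING(36) NOT NULL,
) PRIMARY KEY (shard_id, event_ts, event_id);
```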
Consistency
With Cassandra, you must specify a consistency level for each operation. If you use the quorum consistency level, a replica node majority must respond to the coordinator node for the operation to be considered successful. If you use a consistency level of one, Cassandra needs a single replica node to respond for the operation to be considered successful.
Spanner provides strong consistency. The Spanner API doesn't expose replicas to the client. Spanner clients interact with Spanner as if it were a single-machine database. A write is always written to a majority of replicas before being acknowledged to the user, and any subsequent read reflects the newly written data. Applications can choose to read a snapshot of the database at a time in the past, which might have performance benefits over strong reads. For more information about the consistency properties of Spanner, see the Transactions overview.
Spanner was built to support the consistency and availability needed in large scale applications. Spanner provides strong consistency at scale and with high performance. For use cases that require it, Spanner supports snapshot reads that relax freshness requirements.
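For illustration, a snapshot read at a fixed staleness looks like the following minimal sketch in Go, using the `users` table defined later in this document:

```go
// Read a snapshot exactly 15 seconds in the past. Stale reads can be
// served by a nearby replica without contacting the leader, trading
// freshness for latency.
ro := client.Single().WithTimestampBound(spanner.ExactStaleness(15 * time.Second))
row, err := ro.ReadRow(ctx, "users", spanner.Key{1}, []string{"first_name"})
```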
Data modeling
This section compares Cassandra and Spanner data models.
Table declaration
Table declaration syntax is fairly similar across Cassandra and Spanner. You specify the table name, column names and types, and the primary key which uniquely identifies a row. The key difference is that Cassandra is hash-partitioned and makes a distinction between partition key and sort key, whereas Spanner is range-partitioned. Spanner can be thought of as only having sort keys, with partitions automatically maintained behind the scenes. Like Cassandra, Spanner supports composite primary keys.
Single primary key part
The difference between Cassandra and Spanner is in the type names and the location of the primary key clause.
Cassandra example

CQL

```sql
CREATE TABLE users (
  user_id bigint,
  first_name text,
  last_name text,
  PRIMARY KEY (user_id)
)
```

Spanner example

GoogleSQL

```sql
CREATE TABLE users (
  user_id INT64,
  first_name STRING(MAX),
  last_name STRING(MAX),
) PRIMARY KEY (user_id)
```
Multiple primary key parts
For Cassandra, the first primary key part is the "partition key" and the subsequent primary key parts are "sort keys". For Spanner, there is no separate partition key. Data is stored sorted by the entire composite primary key.
Cassandra example

CQL

```sql
CREATE TABLE user_items (
  user_id bigint,
  item_id bigint,
  first_name text,
  last_name text,
  PRIMARY KEY (user_id, item_id)
)
```

Spanner example

GoogleSQL

```sql
CREATE TABLE user_items (
  user_id INT64,
  item_id INT64,
  first_name STRING(MAX),
  last_name STRING(MAX),
) PRIMARY KEY (user_id, item_id)
```
Composite partition key
For Cassandra, partition keys can be a composite. There is no separate partition key in Spanner. Data is stored sorted by the entire composite primary key.
Cassandra example

CQL

```sql
CREATE TABLE user_category_items (
  user_id bigint,
  category_id bigint,
  item_id bigint,
  first_name text,
  last_name text,
  PRIMARY KEY ((user_id, category_id), item_id)
)
```

Spanner example

GoogleSQL

```sql
CREATE TABLE user_category_items (
  user_id INT64,
  category_id INT64,
  item_id INT64,
  first_name STRING(MAX),
  last_name STRING(MAX),
) PRIMARY KEY (user_id, category_id, item_id)
```
Data types
This section compares Cassandra and Spanner data types. For more information about Spanner types, see Data types in GoogleSQL.
| Type category | Cassandra | Spanner |
|---|---|---|
| Numeric types | Standard integers: `bigint` (64-bit signed integer), `int` (32-bit signed integer), `smallint` (16-bit signed integer), `tinyint` (8-bit signed integer) | `INT64` (64-bit signed integer). Spanner supports a single 64-bit wide data type for signed integers. |
| | Standard floating point: `double` (64-bit IEEE 754 floating point), `float` (32-bit IEEE 754 floating point) | `FLOAT64` (64-bit IEEE 754 floating point), `FLOAT32` (32-bit IEEE 754 floating point) |
| | Variable precision numbers: `varint` (variable precision integer), `decimal` (variable precision decimal) | For fixed-precision decimal numbers, use `NUMERIC` (precision 38, scale 9). Otherwise, use `STRING` together with an application-layer variable precision integer library. |
| String types | `text`, `varchar` | `STRING(MAX)`. Both `text` and `varchar` store and validate UTF-8 strings. In Spanner, string columns must specify a maximum length (there is no impact on storage; the length is used for validation). |
| | `blob` | `BYTES(MAX)`. To store binary data, use the `BYTES` data type. |
| Date and time types | `date` | `DATE` |
| | `duration` | `INT64`. Spanner doesn't support a dedicated duration data type. Use `INT64` to store a duration in nanoseconds. |
| | `time` | `INT64`. Spanner doesn't support a dedicated time-within-day data type. Use `INT64` to store a nanosecond offset within a day. |
| | `timestamp` | `TIMESTAMP` |
| Container types | User-defined types | `JSON` or `PROTO` |
| | `list` | `ARRAY`. Use `ARRAY` to store a list of typed objects. |
| | `map` | `JSON` or `PROTO`. Spanner doesn't support a dedicated map type. Use `JSON` or `PROTO` columns to represent maps. For more information, see Store large maps as interleaved tables. |
| | `set` | `ARRAY`. Spanner doesn't support a dedicated set type. Use `ARRAY` columns to represent a set, with the application managing set uniqueness. For more information, see Store large maps as interleaved tables, which can also be used to store large sets. |
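To make the container-type mapping concrete, the following is a minimal sketch of a hypothetical table that uses an `ARRAY` column in place of a Cassandra `list` or `set`, and a `JSON` column in place of a `map`:

```sql
-- Hypothetical schema illustrating container-type columns in GoogleSQL.
CREATE TABLE user_profiles (
  user_id INT64,
  nicknames ARRAY<STRING(MAX)>,  -- stands in for a Cassandra list/set
  preferences JSON,              -- stands in for a Cassandra map
) PRIMARY KEY (user_id);
```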
Basic usage patterns
The following code examples show the difference between Cassandra and Spanner client code in Go. For more information, see Spanner client libraries.
Client initialization
In Cassandra clients, you create a cluster object representing the underlying Cassandra cluster, instantiate a session object which abstracts a connection to the cluster, and issue queries on the session. In Spanner, you create a client object bound to a specific database, and issue database requests on the client object.
Cassandra example
Go
import "github.com/gocql/gocql" ... cluster := gocql.NewCluster("<address>") cluster.Keyspace = "<keyspace>" session, err := cluster.CreateSession() if err != nil { return err } defer session.Close() // session.Query(...)
Spanner example
Go
import "cloud.google.com/go/spanner" ... client, err := spanner.NewClient(ctx, fmt.Sprintf("projects/%s/instances/%s/databases/%s", project, instance, database)) defer client.Close() // client.Apply(...)
Read data
Reads in Spanner can be performed through both a key-value style API and a query API. As a Cassandra user, you might find the query API more familiar. A key difference in the query API is that Spanner requires named arguments, unlike the positional `?` arguments in Cassandra. The name of an argument in a Spanner query must be prefixed with an `@`.
Cassandra example
Go
```go
stmt := `SELECT user_id, first_name, last_name FROM users WHERE user_id = ?`

var (
    userID    int
    firstName string
    lastName  string
)
err := session.Query(stmt, 1).Scan(&userID, &firstName, &lastName)
```
Spanner example
Go
```go
stmt := spanner.Statement{
    SQL:    `SELECT user_id, first_name, last_name FROM users WHERE user_id = @user_id`,
    Params: map[string]any{"user_id": 1},
}

var (
    userID    int64
    firstName string
    lastName  string
)
err := client.Single().Query(ctx, stmt).Do(func(row *spanner.Row) error {
    return row.Columns(&userID, &firstName, &lastName)
})
```
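The key-value style read API mentioned above can serve the same lookup without SQL. A minimal sketch:

```go
// Key-value style read: fetch one row by primary key, reading only
// the listed columns.
row, err := client.Single().ReadRow(ctx, "users",
    spanner.Key{1}, []string{"user_id", "first_name", "last_name"})
if err != nil {
    return err
}
var (
    userID    int64
    firstName string
    lastName  string
)
err = row.Columns(&userID, &firstName, &lastName)
```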
Insert data
A Cassandra `INSERT` is equivalent to a Spanner `INSERT OR UPDATE`. You must specify the full primary key for an insert. Spanner supports both DML and a key-value style mutation API. The key-value style mutation API is recommended for trivial writes because of its lower latency. The Spanner DML API has more features, because it supports the full SQL surface (including the use of expressions in DML statements).
Cassandra example
Go
```go
stmt := `INSERT INTO users (user_id, first_name, last_name) VALUES (?, ?, ?)`
err := session.Query(stmt, 1, "John", "Doe").Exec()
```
Spanner example
Go
```go
_, err := client.Apply(ctx, []*spanner.Mutation{
    spanner.InsertOrUpdateMap("users", map[string]any{
        "user_id":    1,
        "first_name": "John",
        "last_name":  "Doe",
    }),
})
```
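For comparison, here is a minimal sketch of the same write through the DML API, inside a read-write transaction (`INSERT OR UPDATE` DML keeps the Cassandra upsert semantics):

```go
// DML alternative: execute an upsert statement in a read-write transaction.
_, err := client.ReadWriteTransaction(ctx,
    func(ctx context.Context, txn *spanner.ReadWriteTransaction) error {
        stmt := spanner.Statement{
            SQL: `INSERT OR UPDATE INTO users (user_id, first_name, last_name)
                  VALUES (@user_id, @first_name, @last_name)`,
            Params: map[string]any{
                "user_id":    1,
                "first_name": "John",
                "last_name":  "Doe",
            },
        }
        _, err := txn.Update(ctx, stmt) // returns the affected row count
        return err
    })
```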
Batch insert data
In Cassandra, you can insert multiple rows using a batch statement. In Spanner, a commit operation can contain multiple mutations, and Spanner applies these mutations to the database atomically.
Cassandra example
Go
```go
stmt := `INSERT INTO users (user_id, first_name, last_name) VALUES (?, ?, ?)`

b := session.NewBatch(gocql.UnloggedBatch)
b.Entries = []gocql.BatchEntry{
    {Stmt: stmt, Args: []any{1, "John", "Doe"}},
    {Stmt: stmt, Args: []any{2, "Mary", "Poppins"}},
}
err = session.ExecuteBatch(b)
```
Spanner example
Go
```go
_, err := client.Apply(ctx, []*spanner.Mutation{
    spanner.InsertOrUpdateMap("users", map[string]any{
        "user_id":    1,
        "first_name": "John",
        "last_name":  "Doe",
    }),
    spanner.InsertOrUpdateMap("users", map[string]any{
        "user_id":    2,
        "first_name": "Mary",
        "last_name":  "Poppins",
    }),
})
```
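If you use DML rather than mutations, several statements can be sent in one round trip with `BatchUpdate`. A minimal sketch, reusing the upsert statement from the previous section:

```go
// Batched DML: all statements execute within a single transaction.
_, err := client.ReadWriteTransaction(ctx,
    func(ctx context.Context, txn *spanner.ReadWriteTransaction) error {
        sql := `INSERT OR UPDATE INTO users (user_id, first_name, last_name)
                VALUES (@user_id, @first_name, @last_name)`
        stmts := []spanner.Statement{
            {SQL: sql, Params: map[string]any{"user_id": 1, "first_name": "John", "last_name": "Doe"}},
            {SQL: sql, Params: map[string]any{"user_id": 2, "first_name": "Mary", "last_name": "Poppins"}},
        }
        _, err := txn.BatchUpdate(ctx, stmts)
        return err
    })
```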
Delete data
Cassandra deletes require specifying the primary key of the rows to be deleted. This is similar to the `Delete` mutation in Spanner.
Cassandra example
Go
```go
stmt := `DELETE FROM users WHERE user_id = ?`
err := session.Query(stmt, 1).Exec()
```
Spanner example
Go
```go
_, err := client.Apply(ctx, []*spanner.Mutation{
    spanner.Delete("users", spanner.Key{1}),
})
```
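Because Spanner is range-sharded, the `Delete` mutation also accepts a key range, which has no direct Cassandra equivalent. A minimal sketch:

```go
// Delete a contiguous range of primary keys in one mutation:
// removes users with 1 <= user_id < 100.
_, err := client.Apply(ctx, []*spanner.Mutation{
    spanner.Delete("users", spanner.KeyRange{
        Start: spanner.Key{1},
        End:   spanner.Key{100},
        Kind:  spanner.ClosedOpen,
    }),
})
```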
Advanced topics
This section contains information on how to use more advanced Cassandra features in Spanner.
Write timestamp
Cassandra allows mutations to explicitly specify a write timestamp for a particular cell using the `USING TIMESTAMP` clause. Typically, this feature is used to manipulate Cassandra's last-writer-wins semantics.

Spanner doesn't allow clients to specify the timestamp of each write. Each cell is internally marked with the TrueTime timestamp at which the cell value was committed. Because Spanner provides a strongly consistent and strictly serializable interface, most applications don't need the functionality of `USING TIMESTAMP`.

If you rely on Cassandra's `USING TIMESTAMP` for application-specific logic, you can add an extra `TIMESTAMP` column to your Spanner schema to track modification time at the application level. Updates to a row can then be wrapped in a read-write transaction. For example:
Cassandra example
Go
```go
stmt := `INSERT INTO users (user_id, first_name, last_name) VALUES (?, ?, ?) USING TIMESTAMP ?`
err := session.Query(stmt, 1, "John", "Doe", ts).Exec()
```
Spanner example
Create schema with an explicit update timestamp column.
GoogleSQL
```sql
CREATE TABLE users (
  user_id INT64,
  first_name STRING(MAX),
  last_name STRING(MAX),
  update_ts TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp=true),
) PRIMARY KEY (user_id)
```
Customize logic to update the row and include a timestamp.
Go
```go
func ShouldUpdateRow(ctx context.Context, txn *spanner.ReadWriteTransaction, updateTs time.Time) (bool, error) {
    // Read the existing commit timestamp.
    row, err := txn.ReadRow(ctx, "users", spanner.Key{1}, []string{"update_ts"})
    // Treat a non-existent row as a NULL timestamp: the row should be updated.
    if spanner.ErrCode(err) == codes.NotFound {
        return true, nil
    }
    // Propagate unexpected errors.
    if err != nil {
        return false, err
    }
    var committedTs *time.Time
    if err := row.Columns(&committedTs); err != nil {
        return false, err
    }
    // Skip the update if the committed timestamp is not older than the
    // update timestamp (last-writer-wins).
    if committedTs != nil && !committedTs.Before(updateTs) {
        return false, nil
    }
    // The committed timestamp is older than the update timestamp, so the
    // row should be updated.
    return true, nil
}
```
Check custom condition before updating the row.
Go
```go
_, err := client.ReadWriteTransaction(ctx, func(ctx context.Context, txn *spanner.ReadWriteTransaction) error {
    // Check if the row should be updated.
    ok, err := ShouldUpdateRow(ctx, txn, time.Now())
    if err != nil {
        return err
    }
    if !ok {
        return nil
    }
    // Update the row.
    return txn.BufferWrite([]*spanner.Mutation{
        spanner.InsertOrUpdateMap("users", map[string]any{
            "user_id":    1,
            "first_name": "John",
            "last_name":  "Doe",
            "update_ts":  spanner.CommitTimestamp,
        }),
    })
})
```
Conditional mutations
The `INSERT ... IF NOT EXISTS` statement in Cassandra is equivalent to the `INSERT` statement in Spanner. In both cases, the insert fails if the row already exists.

In Cassandra, you can also create DML statements that specify a condition, and the statement fails if the condition evaluates to false. In Spanner, you can implement a conditional update in a read-write transaction by reading the row, checking the condition, and buffering the write only if the condition holds. For example, to update a row only if a particular condition is met:
Cassandra example
Go
```go
stmt := `UPDATE users SET last_name = ? WHERE user_id = ? IF first_name = ?`
err := session.Query(stmt, "Smith", 1, "John").Exec()
```
Spanner example
Customize logic to update the row and include a condition.
Go
```go
func ShouldUpdateRow(ctx context.Context, txn *spanner.ReadWriteTransaction) (bool, error) {
    row, err := txn.ReadRow(ctx, "users", spanner.Key{1}, []string{"first_name"})
    if err != nil {
        return false, err
    }
    var firstName *string
    if err := row.Columns(&firstName); err != nil {
        return false, err
    }
    // Update only if the current first_name matches the expected value.
    if firstName != nil && *firstName == "John" {
        return true, nil
    }
    return false, nil
}
```
Check custom condition before updating the row.
Go
```go
_, err := client.ReadWriteTransaction(ctx, func(ctx context.Context, txn *spanner.ReadWriteTransaction) error {
    ok, err := ShouldUpdateRow(ctx, txn)
    if err != nil {
        return err
    }
    if !ok {
        return nil
    }
    return txn.BufferWrite([]*spanner.Mutation{
        spanner.InsertOrUpdateMap("users", map[string]any{
            "user_id":   1,
            "last_name": "Smith",
            "update_ts": spanner.CommitTimestamp,
        }),
    })
})
```
TTL
Cassandra supports setting a time to live (TTL) value at the row or column level. In Spanner, TTL is configured at the row level, and you designate a named column as the expiration time for the row. For more information, see the Time to live (TTL) overview.
Cassandra example
Go
```go
stmt := `INSERT INTO users (user_id, first_name, last_name) VALUES (?, ?, ?) USING TTL 86400`
err := session.Query(stmt, 1, "John", "Doe").Exec()
```
Spanner example
Create a schema with an update timestamp column and a row deletion policy.
GoogleSQL
```sql
CREATE TABLE users (
  user_id INT64,
  first_name STRING(MAX),
  last_name STRING(MAX),
  update_ts TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp=true),
) PRIMARY KEY (user_id),
ROW DELETION POLICY (OLDER_THAN(update_ts, INTERVAL 1 DAY));
```
Insert rows with a commit timestamp.
Go
```go
_, err := client.Apply(ctx, []*spanner.Mutation{
    spanner.InsertOrUpdateMap("users", map[string]any{
        "user_id":    1,
        "first_name": "John",
        "last_name":  "Doe",
        "update_ts":  spanner.CommitTimestamp,
    }),
})
```
Store large maps as interleaved tables
Cassandra supports the `map` type for storing ordered key-value pairs. To store `map` types that contain a small amount of data in Spanner, you can use the `JSON` or `PROTO` types, which let you store semi-structured and structured data, respectively. Updates to such columns require the entire column value to be rewritten. If you have a use case where a large amount of data is stored in a Cassandra `map` and only a small portion of the map needs to be updated, interleaved tables might be a good fit. For example, to associate a large amount of key-value data with a particular user:
Cassandra example
```sql
CREATE TABLE users (
  user_id bigint,
  attachments map<text, text>,
  PRIMARY KEY (user_id)
)
```
Spanner example
```sql
CREATE TABLE users (
  user_id INT64,
) PRIMARY KEY (user_id);

CREATE TABLE user_attachments (
  user_id INT64,
  attachment_key STRING(MAX),
  attachment_val STRING(MAX),
) PRIMARY KEY (user_id, attachment_key),
  INTERLEAVE IN PARENT users ON DELETE CASCADE;
```
In this case, each `user_attachments` row is stored colocated with the corresponding `users` row, and can be retrieved and updated efficiently along with it. You can use the read-write APIs in Spanner to interact with interleaved tables. For more information on interleaving, see Create parent and child tables.
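For example, a minimal sketch of updating a single map entry under this schema: writing one interleaved row replaces one key-value pair, instead of rewriting an entire `JSON` or `PROTO` column (the `avatar_url` key below is illustrative).

```go
// Upsert a single key-value pair as one interleaved row.
_, err := client.Apply(ctx, []*spanner.Mutation{
    spanner.InsertOrUpdateMap("user_attachments", map[string]any{
        "user_id":        1,
        "attachment_key": "avatar_url", // hypothetical map key
        "attachment_val": "https://example.com/avatar.png",
    }),
})
```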
Developer experience
This section compares Spanner and Cassandra developer tools.
Local development
You can run Cassandra locally for development and unit testing. Spanner provides a similar environment for local development through the Spanner emulator. The emulator provides a high fidelity environment for interactive development and unit tests. For more information, see Emulate Spanner locally.
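For example, one way to start the emulator and point client libraries at it (assuming the gcloud CLI is installed; the emulator is also available as a Docker image):

```sh
# Start the emulator locally.
gcloud emulators spanner start

# Route Spanner client libraries to the emulator's default gRPC port.
export SPANNER_EMULATOR_HOST=localhost:9010
```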
Command line
The Spanner equivalent of Cassandra's `nodetool` is the Google Cloud CLI. You can perform control plane and data plane operations using `gcloud spanner` commands. For more information, see the Google Cloud CLI Spanner reference guide.
If you need a REPL interface for issuing queries to Spanner, similar to `cqlsh`, you can use the `spanner-cli` tool. To install and run `spanner-cli` with Go:

```sh
go install github.com/cloudspannerecosystem/spanner-cli@latest
$(go env GOPATH)/bin/spanner-cli
```
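For example, to connect to a specific database (the project, instance, and database names below are placeholders):

```sh
spanner-cli -p my-project -i my-instance -d my-database
```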
For more information, see the spanner-cli GitHub repository.