Cassandra backup and recovery

This section discusses how to configure data backup and recovery for the Apache Cassandra database ring installed in the Apigee hybrid runtime plane. See also Cassandra datastore.

What you need to know about Cassandra backups

Cassandra is a replicated database that is configured to have at least three copies of your data in each region or data center. Cassandra uses streaming replication and read repairs to maintain the data replicas in each region or data center at any given point.

In hybrid, Cassandra backups are not enabled by default. It's a good practice, however, to enable Cassandra backups in case your data is accidentally deleted.

What is backed up?

The backup configuration described in this topic backs up the following entities:

  • Cassandra schema including the user schema (Apigee keyspace definitions)
  • Cassandra partition token information per node
  • A snapshot of the Cassandra data

Where is backup data stored?

Backed up data is stored in a Google Cloud Storage bucket that you must create. Bucket creation and configuration are covered in this topic.

Scheduling Cassandra backups

Backups are scheduled as cron jobs in the runtime plane. To schedule Cassandra backups:

  1. Run the following create-service-account command to create a Google Cloud service account (SA) with the standard roles/storage.objectAdmin role. This SA role allows you to write backup data to Cloud Storage. Execute the following command in the hybrid installation root directory:
    ./tools/create-service-account apigee-cassandra OUTPUT_DIR
    For example:
    ./tools/create-service-account apigee-cassandra ./service-accounts
    For more information about Google Cloud service accounts, see Creating and managing service accounts.
  2. The create-service-account command saves a JSON file containing the service account private key. The file is saved in the output directory you specified (OUTPUT_DIR). You will need the path to this file in the following steps.
  3. Create a Cloud Storage bucket. Specify a reasonable data retention policy for the bucket. Apigee recommends a data retention policy of 15 days.
  4. Open your overrides.yaml file.
  5. Add the following cassandra.backup properties to enable backup. Do not remove any of the properties that are already configured.

    Parameters

    cassandra:
      ...
    
      backup:
        enabled: true
        serviceAccountPath: SA_JSON_FILE_PATH
        dbStorageBucket: CLOUD_STORAGE_BUCKET_PATH
        schedule: BACKUP_SCHEDULE_CODE
    
      ...
      

    Example

    ...
    
    cassandra:
      storage:
        type: gcepd
        capacity: 50Gi
        gcepd:
          replicationType: regional-pd
      sslRootCAPath: "/Users/myhome/ssh/cassandra.crt"
      sslCertPath: "/Users/myhome/ssh/cassandra.crt"
      sslKeyPath: "/Users/myhome/ssh/cassandra.key"
      auth:
        default:
          password: "abc123"
        admin:
          password: "abc234"
        ddl:
          password: "abc345"
        dml:
          password: "abc456"
      nodeSelector:
        key: cloud.google.com/gke-nodepool
        value: apigee-data
      backup:
        enabled: true
        serviceAccountPath: "/Users/myhome/.ssh/my_cassandra_backup.json"
        dbStorageBucket: "gs://myname-cassandra-backup"
        schedule: "45 23 * * 6"
    
      ... 
  6. Where:
    Property                    Description
    backup:enabled              Backup is disabled by default. You must set this property to true.
    backup:serviceAccountPath   SA_JSON_FILE_PATH: The path on your filesystem to the service account JSON file that was saved when you ran ./tools/create-service-account.
    backup:dbStorageBucket      CLOUD_STORAGE_BUCKET_PATH: The Cloud Storage bucket path in this format: gs://BUCKET_NAME. The gs:// is required.
    backup:schedule             BACKUP_SCHEDULE_CODE: The time when the backup starts, specified in standard crontab syntax. Default: 0 2 * * *. For example, "45 23 * * 6" starts the backup at 23:45 every Saturday.

  7. Apply the configuration changes to the cluster. For example:
    ./apigeectl apply -f overrides.yaml
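
Because the gs:// prefix is required in dbStorageBucket, it can help to sanity-check the value before you apply the configuration. The following is a minimal shell sketch, not part of the Apigee tooling. The function name is_valid_bucket_path is hypothetical, and the check covers only the form of the string, not whether the bucket exists.

```shell
# Hypothetical helper (not an Apigee tool): returns 0 if the argument has the
# form gs://BUCKET_NAME, and 1 otherwise. Only the string format is checked.
is_valid_bucket_path() {
  case "$1" in
    gs://?*) return 0 ;;   # gs:// followed by at least one character
    *)       return 1 ;;   # missing prefix or empty bucket name
  esac
}
```

For example, is_valid_bucket_path "gs://myname-cassandra-backup" succeeds, while is_valid_bucket_path "myname-cassandra-backup" fails because the prefix is missing.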

Restoring backups

Restoration takes your data from the backup location and restores the data into a new Cassandra cluster with the same number of nodes. No data is taken from the old Cassandra cluster.

The restoration instructions below are for single region deployments that use Google Cloud Storage for backups. For multi-region deployments, see Multi-region deployment on GKE and GKE on-prem.

To restore Cassandra backups:

  1. Create a new namespace within the existing Kubernetes cluster to hold the restored hybrid runtime deployment. Do not reuse the original namespace or its name; the restoration must run in the new namespace.
  2. In the root hybrid installation directory, create a new overrides-restore.yaml file.
  3. Copy the complete Cassandra configuration from your original overrides.yaml file into the new overrides-restore.yaml file. See the following command for an example.
    cp ./overrides.yaml ./overrides-restore.yaml
  4. Add a namespace element and the cassandra:restore properties to the new overrides-restore.yaml file. Do not use the same namespace that was used for your original cluster.

    Parameters

    namespace: YOUR_RESTORE_NAMESPACE
    cassandra:
      ...
      restore:
        enabled: true
        snapshotTimestamp: TIMESTAMP
        serviceAccountPath: SA_JSON_FILE_PATH
        dbStorageBucket: CLOUD_STORAGE_BUCKET_PATH
        image:
          pullPolicy: Always
      ...

    Example

    ...
        namespace: cassandra-restore
        cassandra:
          storage:
            type: gcepd
            capacity: 50Gi
            gcepd:
              replicationType: regional-pd
          sslRootCAPath: "/Users/myhome/ssh/cassandra.crt"
          sslCertPath: "/Users/myhome/ssh/cassandra.crt"
          sslKeyPath: "/Users/myhome/ssh/cassandra.key"
          auth:
            default:
              password: "abc123"
            admin:
              password: "abc234"
            ddl:
              password: "abc345"
            dml:
              password: "abc456"
          nodeSelector:
            key: cloud.google.com/gke-nodepool
            value: apigee-data
    
          restore:
            enabled: true
            snapshotTimestamp: "20210203213003"
            serviceAccountPath: "/Users/myhome/.ssh/my_cassandra_backup.json"
            dbStorageBucket: "gs://myname-cassandra-backup"
            image:
              pullPolicy: Always
        ...
    

    Where:

    Property                    Description
    namespace                   YOUR_RESTORE_NAMESPACE: The name of the new namespace you created in step 1 for the new Cassandra cluster. Do not use the same namespace you used for your original cluster.
    restore:enabled             Restore is disabled by default. You must set this property to true.
    restore:snapshotTimestamp   TIMESTAMP: The timestamp of the backup snapshot to restore. To check what timestamps can be used, go to the dbStorageBucket and look at the files that are present in the bucket. Each file name contains a timestamp value. For example, in the file name backup_20210203213003_apigee-cassandra-default-0.tgz, 20210203213003 is the snapshotTimestamp value you would use to restore the backups created at that point in time.
    restore:serviceAccountPath  SA_JSON_FILE_PATH: The path on your filesystem to the JSON file for the service account you created for the backup.
    restore:dbStorageBucket     CLOUD_STORAGE_BUCKET_PATH: The Cloud Storage bucket path where your backup data is stored, in this format: gs://BUCKET_NAME. The gs:// is required.

  5. Change the app label on any Cassandra nodes in the old namespace by executing the following command:
    kubectl label pods --overwrite --namespace=OLD_NAMESPACE -l app=apigee-cassandra app=apigee-cassandra-old
    
  6. Create a new hybrid runtime deployment. This will create a new Cassandra cluster and begin restoring the backup data into the cluster:
    ./apigeectl init  -f ../overrides-restore.yaml
    
    ./apigeectl apply  -f ../overrides-restore.yaml
    
  7. Once the restoration is complete, the traffic must be switched to use the Cassandra cluster in the new namespace. Run the following commands to switch the traffic:

    kubectl get rs -n OLD_NAMESPACE # look for the 'apigee-connect' replicaset
    
    kubectl patch rs -n OLD_NAMESPACE APIGEE_CONNECT_RS_NAME -p '{"spec":{"replicas" : 0}}'
    
  8. Once the traffic switch is complete, you can reconfigure backups on the restored cluster by removing the restore configuration and adding the backup configuration to the overrides-restore.yaml file. Replace YOUR_RESTORE_NAMESPACE with the new namespace name created in step 1.
    namespace: YOUR_RESTORE_NAMESPACE
    cassandra:
      ...
      backup:
        enabled: true
        serviceAccountPath: SA_JSON_FILE_PATH
        dbStorageBucket: CLOUD_STORAGE_BUCKET_PATH
        schedule: BACKUP_SCHEDULE_CODE
      ...

    Then apply the backup configuration with the following command:

    ./apigeectl apply  -f ../overrides-restore.yaml
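
Step 4 above takes the snapshotTimestamp value from a backup file name. As a minimal sketch, the following shell function extracts that value, assuming the file name follows the backup_TIMESTAMP_... pattern shown in this topic. The function name snapshot_timestamp is hypothetical and is not part of the Apigee tooling.

```shell
# Hypothetical helper (not an Apigee tool): prints the timestamp embedded in a
# backup file name of the form backup_TIMESTAMP_REST.tgz. Assumes that naming
# pattern; any leading bucket path is ignored.
snapshot_timestamp() {
  name=${1##*/}                 # drop a leading gs://bucket/path/ if present
  name=${name#backup_}          # drop the backup_ prefix
  printf '%s\n' "${name%%_*}"   # keep everything before the next underscore
}
```

For example, snapshot_timestamp "backup_20210203213003_apigee-cassandra-default-0.tgz" prints 20210203213003. You could feed it file names taken from a bucket listing such as gsutil ls -r CLOUD_STORAGE_BUCKET_PATH.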
    

Viewing the restore logs

Check the restore job logs, and use grep to search them for the string "error" to confirm that the restore completed without errors.

Verify the restore completed

Use the following command to check if the restore operation completed:

kubectl get pods

The output is similar to the following:

NAME                           READY     STATUS      RESTARTS   AGE
apigee-cassandra-default-0     1/1       Running     0          1h
apigee-cassandra-default-1     1/1       Running     0          1h
apigee-cassandra-default-2     1/1       Running     0          59m
apigee-cassandra-restore-b4lgf 0/1       Completed   0          51m
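
If you want to script this check, one possible approach is to scan the pod listing for a restore pod whose status is Completed. The following is a minimal sketch that assumes the column layout and the apigee-cassandra-restore- pod name prefix shown in the sample output above. The function name restore_completed is hypothetical.

```shell
# Hypothetical helper (not an Apigee tool): reads `kubectl get pods` output on
# stdin and returns 0 if a pod whose name starts with apigee-cassandra-restore-
# has Completed in the STATUS (third) column, and 1 otherwise.
restore_completed() {
  awk '$1 ~ /^apigee-cassandra-restore-/ && $3 == "Completed" { found = 1 }
       END { exit(found ? 0 : 1) }'
}
```

For example, assuming the restore runs in the namespace you created for it:

kubectl get pods -n YOUR_RESTORE_NAMESPACE | restore_completed && echo "restore job completed"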

View the restore logs

Use the following command to view the restore logs:

kubectl logs -f apigee-cassandra-restore-b4lgf

The output is similar to the following:

Restore Logs:

Activated service account credentials for: [apigee-cassandra-backup-svc@gce-myusername.iam.gserviceaccount.com]
to download file gs://gce-myusername-apigee-cassandra-backup/apigeecluster/dc-1/backup_20190405011309_schema.tgz
INFO: download successfully extracted the backup files from gs://gce-myusername-apigee-cassandra-backup/apigeecluster/dc-1
finished downloading schema.cql
to create schema from 10.32.0.28

Warnings :
dclocal_read_repair_chance table option has been deprecated and will be removed in version 4.0

dclocal_read_repair_chance table option has been deprecated and will be removed in version 4.0


Warnings :
dclocal_read_repair_chance table option has been deprecated and will be removed in version 4.0

dclocal_read_repair_chance table option has been deprecated and will be removed in version 4.0

INFO: the schema has been restored
starting apigee-cassandra-default-0 in default
starting apigee-cassandra-default-1 in default
starting apigee-cassandra-default-2 in default
84 95 106
waiting on waiting nodes $pid to finish  84
Activated service account credentials for: [apigee-cassandra-backup-svc@gce-myusername.iam.gserviceaccount.com]
Activated service account credentials for: [apigee-cassandra-backup-svc@gce-myusername.iam.gserviceaccount.com]
Activated service account credentials for: [apigee-cassandra-backup-svc@gce-myusername.iam.gserviceaccount.com]
INFO: restore downloaded  tarball and extracted the file from  gs://gce-myusername-apigee-cassandra-backup/apigeecluster/dc-1
INFO: restore downloaded  tarball and extracted the file from  gs://gce-myusername-apigee-cassandra-backup/apigeecluster/dc-1
INFO: restore downloaded  tarball and extracted the file from  gs://gce-myusername-apigee-cassandra-backup/apigeecluster/dc-1
INFO  12:02:28 Configuration location: file:/etc/cassandra/cassandra.yaml
...

INFO  12:02:41 [Stream #e013ee80-5863-11e9-8458-353e9e3cb7f9] All sessions completed

Summary statistics:
   Connections per host    : 3
   Total files transferred : 2
   Total bytes transferred : 0.378KiB
   Total duration          : 5048 ms
   Average transfer rate   : 0.074KiB/s
   Peak transfer rate      : 0.075KiB/s

progress: [/10.32.1.155]0:1/1 100% 1:1/1 100% [/10.32.0.28]1:1/1 100% 0:1/1 100% [/10.32.3.220]0:1/1 100% 1:1/1 100% total: 100% 0.000KiB/s (avg: 0.074KiB/s)
INFO  12:02:41 [Stream #e013ee80-5863-11e9-8458-353e9e3cb7f9] All sessions completed
progress: [/10.32.1.155]0:1/1 100% 1:1/1 100% [/10.32.0.28]1:1/1 100% 0:1/1 100% [/10.32.3.220]0:1/1 100% 1:1/1 100% total: 100% 0.000KiB/s (avg: 0.074KiB/s)
INFO  12:02:41 [Stream #e013ee80-5863-11e9-8458-353e9e3cb7f9] All sessions completed
INFO  12:02:41 [Stream #e013ee80-5863-11e9-8458-353e9e3cb7f9] All sessions completed
INFO: ./apigee/data/cassandra/data/ks1/user-9fbae960571411e99652c7b15b2db6cc restored successfully
INFO: Restore 20190405011309 completed
INFO: ./apigee/data/cassandra/data/ks1/user-9fbae960571411e99652c7b15b2db6cc restored successfully
INFO: Restore 20190405011309 completed
waiting on waiting nodes $pid to finish  106
Restore finished
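
The grep check for errors mentioned earlier can be sketched as a small shell function that counts matching log lines. This is an illustration under the assumption that failures are logged with the word "error" in some letter case. The function name count_error_lines is hypothetical.

```shell
# Hypothetical helper (not an Apigee tool): reads log text on stdin and prints
# the number of lines that contain "error" in any letter case.
count_error_lines() {
  grep -ci 'error' || true   # grep exits non-zero when the count is 0
}
```

For example, kubectl logs apigee-cassandra-restore-b4lgf | count_error_lines prints 0 when no log line mentions an error. Deprecation warnings such as those in the sample output are not counted.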

Verify backup job

After the backup cron job is scheduled and a backup run has started, you can verify the backup job. Use the following command to list the pods:

kubectl get pods

The output is similar to the following:

NAME                                       READY     STATUS      RESTARTS   AGE
apigee-cassandra-default-0                 1/1       Running     0          2h
apigee-cassandra-default-1                 1/1       Running     0          2h
apigee-cassandra-default-2                 1/1       Running     0          2h
apigee-cassandra-backup-1554515580-pff6s   0/1       Running     0          54s

Check the backup logs

The backup job:

  • Creates a schema.cql file.
  • Uploads it to your storage bucket.
  • Echoes the node to back up the data and uploads it at the same time.
  • Waits until all of the data is uploaded.

Use the following command to view the backup logs:

kubectl logs -f apigee-cassandra-backup-1554515580-pff6s

The output is similar to the following:

starting apigee-cassandra-default-0 in default
starting apigee-cassandra-default-1 in default
starting apigee-cassandra-default-2 in default
35 46 57
waiting on process  35
Activated service account credentials for: [apigee-cassandra-backup-svc@gce-myusername.iam.gserviceaccount.com]
Activated service account credentials for: [apigee-cassandra-backup-svc@gce-myusername.iam.gserviceaccount.com]
Activated service account credentials for: [apigee-cassandra-backup-svc@gce-myusername.iam.gserviceaccount.com]
Requested creating snapshot(s) for [all keyspaces] with snapshot name [20190406190808] and options {skipFlush=false}
Snapshot directory: 20190406190808
INFO: backup created cassandra snapshot 20190406190808
tar: Removing leading `/' from member names
/apigee/data/cassandra/data/ks1/mytest3-37bc2df0587811e98e8d875b0ed64754/snapshots/
/apigee/data/cassandra/data/ks1/mytest3-37bc2df0587811e98e8d875b0ed64754/snapshots/20190406190808/
/apigee/data/cassandra/data/ks1/mytest3-37bc2df0587811e98e8d875b0ed64754/snapshots/20190406190808/mc-1-big-Data.db
Requested creating snapshot(s) for [all keyspaces] with snapshot name [20190406190808] and options {skipFlush=false}
Requested creating snapshot(s) for [all keyspaces] with snapshot name [20190406190808] and options {skipFlush=false}
Snapshot directory: 20190406190808
INFO: backup created cassandra snapshot 20190406190808
tar: Removing leading `/' from member names
/apigee/data/cassandra/data/system/hints-2666e20573ef38b390fefecf96e8f0c7/snapshots/
/apigee/data/cassandra/data/system/hints-2666e20573ef38b390fefecf96e8f0c7/snapshots/20190406190808/
/apigee/data/cassandra/data/system/hints-2666e20573ef38b390fefecf96e8f0c7/snapshots/20190406190808/manifest.json
/apigee/data/cassandra/data/system/prepared_statements-18a9c2576a0c3841ba718cd529849fef/snapshots/
/apigee/data/cassandra/data/system/prepared_statements-18a9c2576a0c3841ba718cd529849fef/snapshots/20190406190808/
/apigee/data/cassandra/data/system/prepared_statements-18a9c2576a0c3841ba718cd529849fef/snapshots/20190406190808/manifest.json
/apigee/data/cassandra/data/system/range_xfers-55d764384e553f8b9f6e676d4af3976d/snapshots/
/apigee/data/cassandra/data/system/range_xfers-55d764384e553f8b9f6e676d4af3976d/snapshots/20190406190808/
/apigee/data/cassandra/data/system/range_xfers-55d764384e553f8b9f6e676d4af3976d/snapshots/20190406190808/manifest.json
/apigee/data/cassandra/data/system/peer_events-59dfeaea8db2334191ef109974d81484/snapshots/
/apigee/data/cassandra/data/system/peer_events-59dfeaea8db2334191ef109974d81484/snapshots/20190406190808/
/apigee/data/cassandra/data/system/peer_events-59dfeaea8db2334191ef109974d81484/snapshots/20190406190808/manifest.json
/apigee/data/cassandra/data/system/built_views-4b3c50a9ea873d7691016dbc9c38494a/snapshots/
/apigee/data/cassandra/data/system/built_views-4b3c50a9ea873d7691016dbc9c38494a/snapshots/20190406190808/
/apigee/data/cassandra/data/system/built_views-4b3c50a9ea873d7691016dbc9c38494a/snapshots/20190406190808/manifest.json
……
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-3-big-Filter.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-2-big-CompressionInfo.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-2-big-Index.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-3-big-Statistics.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-2-big-Data.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-1-big-Index.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-1-big-Statistics.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-2-big-TOC.txt
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-2-big-Statistics.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-1-big-Summary.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-1-big-Filter.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-2-big-Summary.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-3-big-Index.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/manifest.json
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-2-big-Filter.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-2-big-Digest.crc32
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-3-big-Summary.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-3-big-Data.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-1-big-TOC.txt
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/schema.cql
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-3-big-CompressionInfo.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-1-big-Digest.crc32
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-3-big-TOC.txt
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-1-big-Data.db
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-3-big-Digest.crc32
/apigee/data/cassandra/data/ks2/user-d6d39d70586311e98e8d875b0ed64754/snapshots/20190406190808/mc-1-big-CompressionInfo.db
……
/tmp/tokens.txt
/ [1 files][    0.0 B/    0.0 B]
Operation completed over 1 objects.
/ [1 files][    0.0 B/    0.0 B]
Operation completed over 1 objects.
INFO: backup created tarball and transferred the file to gs://gce-myusername-apigee-cassandra-backup/apigeecluster/dc-1
INFO: removing cassandra snapshot
INFO: backup created tarball and transferred the file to gs://gce-myusername-apigee-cassandra-backup/apigeecluster/dc-1
INFO: removing cassandra snapshot
Requested clearing snapshot(s) for [all keyspaces]
INFO: Backup 20190406190808 completed
waiting on process  46
Requested clearing snapshot(s) for [all keyspaces]
INFO: Backup 20190406190808 completed
Requested clearing snapshot(s) for [all keyspaces]
waiting on process  57
INFO: Backup 20190406190808 completed
waiting result
to get schema from 10.32.0.28
INFO: /tmp/schema.cql has been generated
Activated service account credentials for: [apigee-cassandra-backup-svc@gce-myusername.iam.gserviceaccount.com]
tar: removing leading '/' from member names
tmp/schema.cql
Copying from <STDIN>...
/ [1 files][    0.0 B/    0.0 B]
Operation completed over 1 objects.
INFO: backup created tarball and transferred the file to gs://gce-myusername-apigee-cassandra-backup/apigeecluster/dc-1
finished uploading schema.cql
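
To confirm from the log that every node finished, one option is to count the "INFO: Backup TIMESTAMP completed" lines. The following is a minimal sketch that assumes each Cassandra pod logs one such line, as in the sample output above. The function name count_completed_backups is hypothetical.

```shell
# Hypothetical helper (not an Apigee tool): reads backup job log text on stdin
# and prints how many "INFO: Backup TIMESTAMP completed" lines it contains.
count_completed_backups() {
  grep -cE '^INFO: Backup [0-9]+ completed$' || true   # prints 0 if none match
}
```

For example, kubectl logs apigee-cassandra-backup-1554515580-pff6s | count_completed_backups should print the number of Cassandra pods once the backup job has finished, assuming the log format matches the sample.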