Monitor instances with Cloud Monitoring

This document describes how to use the Cloud Monitoring console to monitor your Spanner instances.

The Cloud Monitoring console provides several monitoring tools for Spanner:

A curated dashboard, which shows pre-made charts for your Spanner resources
Custom charts, including ad-hoc charts in the Metrics Explorer as well as charts in custom dashboards
Alerts, which notify you if a metric exceeds a threshold that you specify

If you prefer to monitor Spanner programmatically, use the Cloud Client Libraries for Cloud Monitoring to retrieve metrics.
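As a rough illustration of the programmatic route, the following Python sketch builds a Monitoring filter for one Spanner metric and passes it to `MetricServiceClient.list_time_series` from the `google-cloud-monitoring` package. The helper names, the project ID, and the instance ID are hypothetical placeholders, and the sketch assumes the package is installed and that credentials are already configured.

```python
from datetime import datetime, timedelta, timezone

# Metric type taken from this document; any Spanner metric type can be used.
CPU_UTILIZATION = "spanner.googleapis.com/instance/cpu/utilization"


def build_spanner_filter(metric_type: str, instance_id: str) -> str:
    """Build a Monitoring filter for one metric on one Spanner instance."""
    return (
        f'metric.type = "{metric_type}" '
        f'AND resource.type = "spanner_instance" '
        f'AND resource.labels.instance_id = "{instance_id}"'
    )


def fetch_cpu_utilization(project_id: str, instance_id: str, minutes: int = 60):
    """Return the CPU utilization time series for the last `minutes` minutes.

    Hypothetical helper: requires google-cloud-monitoring and valid credentials.
    """
    # Imported lazily so the filter helper works without the library installed.
    from google.cloud import monitoring_v3

    client = monitoring_v3.MetricServiceClient()
    end = datetime.now(timezone.utc)
    start = end - timedelta(minutes=minutes)
    request = {
        "name": f"projects/{project_id}",
        "filter": build_spanner_filter(CPU_UTILIZATION, instance_id),
        "interval": {
            "start_time": {"seconds": int(start.timestamp())},
            "end_time": {"seconds": int(end.timestamp())},
        },
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
    return list(client.list_time_series(request=request))


if __name__ == "__main__":
    # Printing the filter needs no network access or credentials.
    print(build_spanner_filter(CPU_UTILIZATION, "my-instance"))
```

Calling `fetch_cpu_utilization("my-project", "my-instance")` would return a list of time series objects, one per combination of label values.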

Use the Cloud Monitoring curated dashboard

Cloud Monitoring provides you with a curated dashboard that summarizes key information about your Spanner instances, including:

Incidents: User-created monitoring alerts that are open, active, or resolved
Events: A list of Spanner audit logs (if enabled and available)
Instances: A high-level summary of your Spanner instances, including compute capacity, database count, and instance health
Aggregated charts of throughput and storage use

To view the Spanner dashboard, do the following:

In the Google Cloud console, select Monitoring, or use the following button:

Go to Monitoring
If Resources is shown in the navigation pane, then select Resources and then select Cloud Spanner. Otherwise, select Dashboards and then select the dashboard named Cloud Spanner.

View instance and database details

When you open the curated dashboard for Spanner, it shows aggregated data for all of your instances. You can view more details about a specific instance by clicking the instance's name under Instances.

The dashboard displays information such as instance metadata, databases in the instance, and charts of various metrics broken down by region.

From the instance dashboard page, you can also see charts for a specific database in the instance:

On the right-hand side, above the instance metrics charts, click Database metrics.
In the Select a breakdown drop-down list, select the database that you want to examine.

The Cloud Monitoring console displays charts for the database.

Create custom charts for Spanner metrics

Cloud Monitoring lets you create custom charts for Spanner metrics. You can create temporary, ad-hoc charts in the Metrics Explorer, or you can create charts that appear on custom dashboards.

In particular, Cloud Monitoring lets you create a custom chart that shows whether two or more metrics are correlated with each other. For example, you can check for a correlation between CPU utilization and latency in a Spanner instance, which might indicate that your instance needs more compute capacity or that some of your queries are causing high CPU utilization.

To get started with this example, follow these steps:

In the Google Cloud console, select Monitoring, or use the following button:

Go to Monitoring
If Metrics Explorer is shown in the navigation pane, select it. Otherwise, select Resources and then select Metrics Explorer.
Click the View options tab, then select the Log scale on Y-axis checkbox. This option helps you compare multiple metrics when one metric has much larger values than the others.
In the drop-down list above the right pane, select Line.
Click the Metrics tab. You can now add metrics to the chart.

To add latency metrics to the chart, follow these steps:

In the Find resource type and metric box, enter the value spanner.googleapis.com/api/request_latencies, then click the row that appears below the box.
In the Filter box, enter the value instance_id, then enter the instance ID you want to examine and click Apply.
In the Aggregator drop-down list, click max.
Optional: Change the latency percentile:
1. Click Show advanced options.
2. Click the Aligner drop-down list, then click the latency percentile that you want to view.
  
  In most cases, you should look at either the 50th percentile latency, to understand the typical amount of latency, or the 99th percentile latency, to understand the latency for the slowest 1% of requests.
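To make the difference between the two percentiles concrete, here is a small Python sketch that computes nearest-rank percentiles over a list of latency samples. The sample values are invented for illustration, and the function is a simplification: Cloud Monitoring estimates percentiles from distribution data rather than from raw samples.

```python
import math


def nearest_rank_percentile(samples, percentile):
    """Return the nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in the range (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank method: the smallest value with at least `percentile`
    # percent of the samples at or below it.
    rank = math.ceil(percentile / 100 * len(ordered))
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds, with one slow outlier.
latencies_ms = [12, 15, 11, 14, 13, 250, 12, 16, 13, 14]

print("p50:", nearest_rank_percentile(latencies_ms, 50))
print("p99:", nearest_rank_percentile(latencies_ms, 99))
```

With these samples the 50th percentile stays close to the typical request, while the 99th percentile is dominated by the single slow request.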

To add CPU utilization metrics to the chart, follow these steps:

Click Add metric.
In the Find resource type and metric box, enter the value spanner.googleapis.com/instance/cpu/utilization, then click the row that appears below the box.
In the Filter box, enter the value instance_id, then enter the instance ID you want to examine and click Apply.
In the Aggregator drop-down list, click max.

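You now have a chart that shows the CPU utilization and latency metrics for a Spanner instance. If both metrics are higher than expected at the same time, the instance might need more compute capacity, or some of your queries might be causing high CPU utilization. Investigate both possibilities to correct the issue.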
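If you export the two series, you can also quantify what the chart suggests visually. The following Python sketch computes the Pearson correlation coefficient between a CPU utilization series and a latency series. The data points are made up for illustration, and the sketch assumes that both series were sampled at the same timestamps.

```python
import math


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    if variance_x == 0 or variance_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(variance_x * variance_y)


# Hypothetical samples taken at the same timestamps.
cpu_utilization = [0.30, 0.35, 0.50, 0.70, 0.85]
latency_ms = [20.0, 22.0, 31.0, 55.0, 90.0]

print(round(pearson_correlation(cpu_utilization, latency_ms), 3))
```

A coefficient close to 1 indicates that the two metrics rise and fall together. It does not by itself show which one causes the other.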

For more information about creating custom charts, see the Cloud Monitoring documentation.

Create alerts for Spanner metrics

When you create a Spanner instance, you choose the compute capacity for the instance. As the instance's workload changes, Spanner does not automatically adjust the instance's compute capacity. As a result, you need to set up several alerts to ensure that the instance stays within the recommended maximums for CPU utilization and the recommended limit for storage.

The following examples show how to set up alerting policies for some Spanner metrics. For a full list of available metrics, see the metrics list for Spanner.

High-priority CPU

To create an alerting policy that triggers when high-priority CPU utilization for Spanner is above a recommended threshold, use the following settings.

To create an alerting policy, do the following:

In the Google Cloud console, go to the Alerting page:
Go to Alerting

If you use the search bar to find this page, then select the result whose subheading is Monitoring.
If you haven't created your notification channels and if you want to be notified, then click Edit Notification Channels and add your notification channels. Return to the Alerting page after you add your channels.
From the Alerting page, select Create policy.
To select the resource, metric, and filters, expand the Select a metric menu and then use the values in the New condition table:
1. Optional: To limit the menu to relevant entries, enter the resource or metric name in the filter bar.
2. Select a Resource type. For example, select VM instance.
3. Select a Metric category. For example, select instance.
4. Select a Metric. For example, select CPU Utilization.
5. Select Apply.
Click Next and then configure the alerting policy trigger. To complete these fields, use the values in the Configure alert trigger table.
Click Next.
Optional: To add notifications to your alerting policy, click Notification channels. In the dialog, select one or more notification channels from the menu, and then click OK.

To be notified when incidents are opened and closed, check Notify on incident closure. By default, notifications are sent only when incidents are opened.
Optional: Update the Incident autoclose duration. This field determines when Monitoring closes incidents in the absence of metric data.
Optional: Click Documentation, and then add any information that you want included in a notification message.
Click Alert name and enter a name for the alerting policy.
Click Create Policy.

New condition
Field	Value
Resource and Metric	In the Resources menu, select Spanner Instance. In the Metric categories menu, select Instance. In the Metrics menu, select CPU Utilization by priority. (The metric.type is `spanner.googleapis.com/instance/cpu/utilization_by_priority`).
Filter	`instance_id = YOUR_INSTANCE_ID` `priority = high`
Across time series Time series group by	`location` for multi-region instances; leave it blank for regional instances.
Across time series Time series aggregation	`sum`
Rolling window	`10 m`
Rolling window function	`mean`

Configure alert trigger
Field	Value
Condition type	`Threshold`
Alert trigger	`Any time series violates`
Threshold position	`Above threshold`
Threshold value	`45%` for multi-region instances; `65%` for regional instances.
Retest window	`10 minutes`
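The same settings can be written as an alerting policy body. The following Python sketch builds a dictionary shaped like the Monitoring API's AlertPolicy REST resource from the values in the two tables above. The function name and display names are hypothetical, and the thresholds are written as fractions (0.45 and 0.65) on the assumption that the metric reports utilization between 0 and 1; check the metric's unit before you rely on that.

```python
import json

METRIC_TYPE = "spanner.googleapis.com/instance/cpu/utilization_by_priority"


def high_priority_cpu_policy(instance_id: str, multi_region: bool) -> dict:
    """Build an AlertPolicy-shaped dict for high-priority CPU utilization."""
    # 45% for multi-region instances, 65% for regional instances,
    # expressed as fractions (assumption about the metric's unit).
    threshold = 0.45 if multi_region else 0.65
    aggregation = {
        "alignmentPeriod": "600s",        # Rolling window: 10 m
        "perSeriesAligner": "ALIGN_MEAN",  # Rolling window function: mean
        "crossSeriesReducer": "REDUCE_SUM",
    }
    if multi_region:
        # Group by location only for multi-region instances.
        aggregation["groupByFields"] = ["resource.labels.location"]
    metric_filter = (
        f'metric.type = "{METRIC_TYPE}" '
        f'AND resource.type = "spanner_instance" '
        f'AND resource.labels.instance_id = "{instance_id}" '
        f'AND metric.labels.priority = "high"'
    )
    return {
        "displayName": f"High-priority CPU: {instance_id}",
        "combiner": "OR",
        "conditions": [
            {
                "displayName": "High-priority CPU above threshold",
                "conditionThreshold": {
                    "filter": metric_filter,
                    "aggregations": [aggregation],
                    "comparison": "COMPARISON_GT",
                    "thresholdValue": threshold,
                    "duration": "600s",  # Retest window: 10 minutes
                    "trigger": {"count": 1},
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(high_priority_cpu_policy("my-instance", multi_region=False), indent=2))
```

The resulting JSON could be sent to the Monitoring API's `projects.alertPolicies.create` method. Notification channels are omitted here and would need to be added separately.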

24-hour rolling average CPU

To create an alerting policy that triggers when the 24-hour rolling average of your CPU utilization for Spanner is above a recommended threshold, use the following settings.

To create an alerting policy, do the following:

In the Google Cloud console, go to the Alerting page:
Go to Alerting

If you use the search bar to find this page, then select the result whose subheading is Monitoring.
If you haven't created your notification channels and if you want to be notified, then click Edit Notification Channels and add your notification channels. Return to the Alerting page after you add your channels.
From the Alerting page, select Create policy.
To select the resource, metric, and filters, expand the Select a metric menu and then use the values in the New condition table:
1. Optional: To limit the menu to relevant entries, enter the resource or metric name in the filter bar.
2. Select a Resource type. For example, select VM instance.
3. Select a Metric category. For example, select instance.
4. Select a Metric. For example, select CPU Utilization.
5. Select Apply.
Click Next and then configure the alerting policy trigger. To complete these fields, use the values in the Configure alert trigger table.
Click Next.
Optional: To add notifications to your alerting policy, click Notification channels. In the dialog, select one or more notification channels from the menu, and then click OK.

To be notified when incidents are opened and closed, check Notify on incident closure. By default, notifications are sent only when incidents are opened.
Optional: Update the Incident autoclose duration. This field determines when Monitoring closes incidents in the absence of metric data.
Optional: Click Documentation, and then add any information that you want included in a notification message.
Click Alert name and enter a name for the alerting policy.
Click Create Policy.

New condition
Field	Value
Resource and Metric	In the Resources menu, select Spanner Instance. In the Metric categories menu, select Instance. In the Metrics menu, select Smoothed CPU utilization. (The metric.type is `spanner.googleapis.com/instance/cpu/smoothed_utilization`).
Filter	`instance_id = YOUR_INSTANCE_ID`
Across time series Time series aggregation	`sum`
Rolling window	`10 m`
Rolling window function	`mean`

Configure alert trigger
Field	Value
Condition type	`Threshold`
Alert trigger	`Any time series violates`
Threshold position	`Above threshold`
Threshold value	`90%`
Retest window	`10 minutes`
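To see how a rolling window, a mean function, and a threshold interact, consider the following Python sketch. It averages the points that fall inside a 10-minute window and compares the result to the 90% threshold from the table above. This is only an illustration of the idea, not a description of how Cloud Monitoring evaluates conditions internally; the function names and sample points are hypothetical, and utilization is written as a fraction between 0 and 1.

```python
def rolling_mean(points, window_seconds, now):
    """Return the mean of values whose timestamp is in (now - window, now].

    `points` is a list of (timestamp_seconds, value) tuples.
    Returns None when no point falls inside the window.
    """
    recent = [value for ts, value in points if now - window_seconds < ts <= now]
    if not recent:
        return None
    return sum(recent) / len(recent)


def violates_threshold(points, threshold, now, window_seconds=600):
    """Return True if the rolling mean is above the threshold."""
    mean = rolling_mean(points, window_seconds, now)
    return mean is not None and mean > threshold


# Hypothetical smoothed CPU utilization samples: (seconds, fraction).
samples = [(0, 0.50), (300, 0.95), (600, 0.97)]

# Only the samples at 300 s and 600 s fall inside the 10-minute window.
print(violates_threshold(samples, threshold=0.90, now=600))
```

Using a mean over the window keeps a single short spike from opening an incident, at the cost of reacting a little later to a sustained increase.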

Storage

To create an alerting policy that triggers when the storage used by your Spanner instance is above a recommended threshold, use the following settings.

To create an alerting policy, do the following:

In the Google Cloud console, go to the Alerting page:
Go to Alerting

If you use the search bar to find this page, then select the result whose subheading is Monitoring.
If you haven't created your notification channels and if you want to be notified, then click Edit Notification Channels and add your notification channels. Return to the Alerting page after you add your channels.
From the Alerting page, select Create policy.
To select the resource, metric, and filters, expand the Select a metric menu and then use the values in the New condition table:
1. Optional: To limit the menu to relevant entries, enter the resource or metric name in the filter bar.
2. Select a Resource type. For example, select VM instance.
3. Select a Metric category. For example, select instance.
4. Select a Metric. For example, select CPU Utilization.
5. Select Apply.
Click Next and then configure the alerting policy trigger. To complete these fields, use the values in the Configure alert trigger table.
Click Next.
Optional: To add notifications to your alerting policy, click Notification channels. In the dialog, select one or more notification channels from the menu, and then click OK.

To be notified when incidents are opened and closed, check Notify on incident closure. By default, notifications are sent only when incidents are opened.
Optional: Update the Incident autoclose duration. This field determines when Monitoring closes incidents in the absence of metric data.
Optional: Click Documentation, and then add any information that you want included in a notification message.
Click Alert name and enter a name for the alerting policy.
Click Create Policy.

New condition
Field	Value
Resource and Metric	In the Resources menu, select Spanner Instance. In the Metric categories menu, select Instance. In the Metrics menu, select Storage used. (The metric.type is `spanner.googleapis.com/instance/storage/used_bytes`).
Filter	`instance_id = YOUR_INSTANCE_ID`
Across time series Time series aggregation	`sum`
Rolling window	`10 m`
Rolling window function	`max`

Configure alert trigger
Field	Value
Condition type	`Threshold`
Condition triggers if	`Any time series violates`
Threshold position	`Above threshold`
Threshold value	You don't need to set a specific threshold for the maximum storage per node. However, we recommend that you set up an alert for when you are approaching the maximum storage limit. To learn more, see Storage utilization metrics.
Retest window	`10 minutes`
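Because the threshold depends on your instance's storage limit, one way to choose a value is to alert at a fixed percentage of that limit. The following Python sketch shows the arithmetic. The limit and the percentage in the example are hypothetical placeholders, not recommended values; look up the actual storage limit for your instance's compute capacity.

```python
def storage_alert_threshold_bytes(limit_bytes: int, percent: int) -> int:
    """Return the byte count at `percent` percent of the storage limit."""
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    # Integer arithmetic avoids floating-point rounding on large byte counts.
    return limit_bytes * percent // 100


# Hypothetical limit of 1 TiB and a hypothetical alert level of 80%.
ONE_TIB = 1024 ** 4
print(storage_alert_threshold_bytes(ONE_TIB, 80))
```

You would enter the resulting number of bytes as the threshold value for the storage alerting policy.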

What's next

Understand the CPU utilization and latency metrics for Spanner.
Use the Google Cloud console to get a quick view of the most important metrics for your instance.
Learn more about Cloud Monitoring.