Creating dashboards

This document shows how to create a set of recommended custom dashboards for monitoring your clusters.

Permissions for creating dashboards

To create dashboards, your Google Account must have the following permissions:

  • monitoring.dashboards.create
  • monitoring.dashboards.delete
  • monitoring.dashboards.update

You can check your permissions in the Google Cloud console. You have these permissions if your account has one of the following roles:

  • roles/monitoring.dashboardEditor
  • roles/monitoring.editor
  • Project editor
  • Project owner
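If your account lacks these permissions, someone who can change the project's IAM policy can grant one of the roles above. The following is a minimal sketch that wraps gcloud projects add-iam-policy-binding to grant roles/monitoring.dashboardEditor. The function name grant_dashboard_editor is hypothetical, and the project ID and member in the usage example are placeholders.

```shell
# Hypothetical helper: grants roles/monitoring.dashboardEditor on a project.
# Usage: grant_dashboard_editor PROJECT_ID MEMBER
#   MEMBER uses IAM member syntax, for example user:alice@example.com
grant_dashboard_editor() {
  local project_id="$1"
  local member="$2"
  if [ -z "$project_id" ] || [ -z "$member" ]; then
    echo "usage: grant_dashboard_editor PROJECT_ID MEMBER" >&2
    return 2
  fi
  gcloud projects add-iam-policy-binding "$project_id" \
    --member="$member" \
    --role="roles/monitoring.dashboardEditor"
}
```

For example: grant_dashboard_editor my-project user:alice@example.com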

In addition, to use the gcloud CLI to create dashboards, your Google Account must have the serviceusage.services.use permission.

Your account will have this permission if it has one of the following roles:

  • roles/serviceusage.serviceUsageConsumer
  • roles/serviceusage.serviceUsageAdmin
  • roles/owner
  • roles/editor
  • Project editor
  • Project owner
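To see whether an account already holds one of these roles, you can flatten the project's IAM policy with gcloud projects get-iam-policy and compare the role names. This is a hypothetical helper (has_service_usage_role), sketched under two assumptions: it inspects only bindings made directly on the project, so roles inherited from a folder or organization or granted through a group are not detected, and the ":" filter operator is a pattern match rather than a strict equality test.

```shell
# Hypothetical helper: returns 0 if MEMBER has, directly on PROJECT_ID, one of
# the roles listed above that include serviceusage.services.use; else 1.
# Usage: has_service_usage_role PROJECT_ID MEMBER   (MEMBER like user:alice@example.com)
has_service_usage_role() {
  local project_id="$1"
  local member="$2"
  local roles role
  # One role name per line for every binding whose members match MEMBER.
  roles=$(gcloud projects get-iam-policy "$project_id" \
    --flatten="bindings[].members" \
    --filter="bindings.members:$member" \
    --format="value(bindings.role)") || return 1
  for role in $roles; do
    case "$role" in
      roles/serviceusage.serviceUsageConsumer|roles/serviceusage.serviceUsageAdmin|roles/owner|roles/editor)
        return 0 ;;
    esac
  done
  return 1
}
```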

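After you create a cluster (admin or user), a best practice is to create the following dashboards with Cloud Monitoring so that your Google Distributed Cloud operations team can monitor cluster health:

  • Control plane uptime dashboard
  • Pod status dashboard
  • Node status dashboard
  • VM health status dashboard
  • Node utilization dashboard
  • Anthos Utilization Metering dashboard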

If your cluster also runs Windows Server OS nodes, you can create the following dashboards to monitor the status of Windows nodes and Pods:

  • Windows node status dashboard
  • Windows pod status dashboard

The dashboards are automatically created during admin cluster installation if Cloud Monitoring is enabled.

This section describes how to create these dashboards. For more information about the dashboard creation process described in the following sections, see Managing dashboards by API.
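If you download all six configuration files into one directory, the per-dashboard commands in the following sections can be run in a single loop. This is a minimal sketch: the helper name create_dashboards and the single-directory layout are assumptions, and the Windows dashboards are left out because their configuration file names are not given here. gcloud monitoring dashboards create makes a new dashboard on every run, so rerunning the loop, or running it in a project where the dashboards were already created during admin cluster installation, produces duplicates.

```shell
# Hypothetical helper: creates each recommended dashboard from files in DIR.
# Usage: create_dashboards DIR   (defaults to the current directory)
create_dashboards() {
  local dir="${1:-.}"
  local file failed=0
  for file in control-plane-uptime.json pod-status.json node-status.json \
              vm-health-status.json node-utilization.json anthos-utilization.json; do
    if [ ! -f "$dir/$file" ]; then
      echo "skipping $file: not found in $dir" >&2
      failed=1
      continue
    fi
    # Each call creates a new dashboard, even if one with the same name exists.
    gcloud monitoring dashboards create --config-from-file="$dir/$file" || failed=1
  done
  # Non-zero if any file was missing or any create call failed.
  return "$failed"
}
```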

Create a control plane uptime dashboard

The Google Distributed Cloud control plane consists of the API server, scheduler, controller manager, and etcd. To monitor the status of the control plane, create a dashboard that monitors the state of these components.

  1. Download the dashboard configuration: control-plane-uptime.json.

  2. Create a custom dashboard with the configuration file by running the following command:

    gcloud monitoring dashboards create --config-from-file=control-plane-uptime.json

  3. In the Google Cloud console, select Monitoring.

  4. Select Resources > Dashboards and view the dashboard named GKE on-prem control plane uptime. The control plane uptime of each user cluster is collected from a separate namespace in the admin cluster, and the namespace_name field contains the user cluster name.

  5. Optionally create alerting policies.
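To confirm from the command line that the dashboard exists, and to get its ID for later describe or delete commands, you can list dashboards and filter on the display name. A minimal sketch; dashboard_id_by_name is a hypothetical helper. It prints one ID per matching dashboard, so more than one line suggests that duplicates exist.

```shell
# Hypothetical helper: prints the ID of each dashboard whose display name
# matches NAME, one per line.
# Usage: dashboard_id_by_name "GKE on-prem control plane uptime"
dashboard_id_by_name() {
  local name="$1" full_name
  gcloud monitoring dashboards list \
    --filter="displayName=\"$name\"" \
    --format="value(name)" |
    while IFS= read -r full_name; do
      # full_name looks like projects/PROJECT_NUMBER/dashboards/DASHBOARD_ID;
      # print only the part after the last slash.
      if [ -n "$full_name" ]; then
        echo "${full_name##*/}"
      fi
    done
}
```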

Create a Pod status dashboard

To create a dashboard that shows the phase of each Pod and the restart count and resource usage of each container, perform the following steps.

  1. Download the dashboard configuration: pod-status.json.

  2. Create a custom dashboard with the configuration file by running the following command:

    gcloud monitoring dashboards create --config-from-file=pod-status.json

  3. In the Google Cloud console, select Monitoring.

  4. Select Resources > Dashboards and view the dashboard named GKE on-prem pod status.

  5. Optionally create alerting policies.
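Before you customize a dashboard in the console, you might want to keep a copy of its current configuration. The sketch below uses gcloud monitoring dashboards describe with JSON output. export_dashboard is a hypothetical helper, and the dashboard ID is assumed to come from gcloud monitoring dashboards list. The exported file can include server-set fields such as name and etag, which you might want to remove before reusing the file with create.

```shell
# Hypothetical helper: writes the JSON configuration of DASHBOARD_ID to OUT_FILE.
# Usage: export_dashboard DASHBOARD_ID OUT_FILE
export_dashboard() {
  local dashboard_id="$1"
  local out_file="$2"
  local tmp="$out_file.tmp"
  # Write to a temporary file first so a failed call does not truncate OUT_FILE.
  if gcloud monitoring dashboards describe "$dashboard_id" --format=json > "$tmp"; then
    mv "$tmp" "$out_file"
  else
    rm -f "$tmp"
    return 1
  fi
}
```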

Create a node status dashboard

To create a node status dashboard that monitors node condition and CPU, memory, and disk usage, perform the following steps:

  1. Download the dashboard configuration: node-status.json.

  2. Create a custom dashboard with the configuration file by running the following command:

    gcloud monitoring dashboards create --config-from-file=node-status.json

  3. In the Google Cloud console, select Monitoring.

  4. Select Resources > Dashboards and view the dashboard named GKE on-prem node status.

  5. Optionally create alerting policies.
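Because a dashboard might already exist, for example when it was created automatically during admin cluster installation, you might prefer to create it only when no dashboard with the same display name is found. A minimal sketch; create_if_missing is a hypothetical helper, and you pass the display name yourself, for example GKE on-prem node status for node-status.json.

```shell
# Hypothetical helper: creates a dashboard from CONFIG_FILE unless a dashboard
# whose display name matches DISPLAY_NAME already exists.
# Usage: create_if_missing CONFIG_FILE DISPLAY_NAME
create_if_missing() {
  local config_file="$1"
  local display_name="$2"
  local existing
  existing=$(gcloud monitoring dashboards list \
    --filter="displayName=\"$display_name\"" \
    --format="value(name)") || return 1
  if [ -n "$existing" ]; then
    echo "exists: $display_name" >&2
    return 0
  fi
  gcloud monitoring dashboards create --config-from-file="$config_file"
}
```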

Create a VM health status dashboard

A VM health status dashboard monitors CPU, memory, and disk resource contention signals for VMs in the admin cluster and user clusters.

To create a VM health status dashboard:

  1. Make sure stackdriver.disableVsphereResourceMetrics is set to false. See User cluster configuration file.

  2. Download the dashboard configuration: vm-health-status.json.

  3. Create a custom dashboard with the configuration file by running the following command:

    gcloud monitoring dashboards create --config-from-file=vm-health-status.json

  4. In the Google Cloud console, select Monitoring.

  5. Select Resources > Dashboards and view the dashboard named GKE on-prem VM health status.

  6. Optionally create alerting policies.
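For step 1, a quick way to see how stackdriver.disableVsphereResourceMetrics is set in a cluster configuration file is to search for the key. This is a naive sketch that uses grep and sed rather than a YAML parser, so it assumes the key appears on its own line and only once; disable_vsphere_metrics_value is a hypothetical helper.

```shell
# Hypothetical helper (naive text search, not a YAML parser): prints the value
# of the first line that sets disableVsphereResourceMetrics in CONFIG_FILE,
# or "unset" if no such line is found.
# Usage: disable_vsphere_metrics_value CONFIG_FILE
disable_vsphere_metrics_value() {
  local config_file="$1"
  local value
  # Keep only what follows the colon, then drop trailing spaces and comments.
  value=$(grep -E '^[[:space:]]*disableVsphereResourceMetrics:' "$config_file" |
    head -n 1 |
    sed -E 's/^[^:]*:[[:space:]]*//; s/[[:space:]]*(#.*)?$//')
  echo "${value:-unset}"
}
```

The dashboard needs the value false. If the helper prints unset, check User cluster configuration file for how an omitted field is treated.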

Create a node utilization dashboard

A node utilization dashboard shows the following utilization metrics for the cluster:

  • Node CPU allocation ratio
  • Available vCPUs for scheduling Kubernetes workloads
  • Node memory allocation ratio
  • Available memory for scheduling Kubernetes workloads
  • Node disk utilization ratio

To create a node utilization dashboard:

  1. Download the dashboard configuration: node-utilization.json.

  2. Use this configuration file to create a custom dashboard by running the following command:

    gcloud monitoring dashboards create --config-from-file=node-utilization.json

  3. In the Google Cloud console, select Monitoring.

  4. Select Resources > Dashboards and view the dashboard named GKE on-prem node utilization.

  5. Optionally create alerting policies.

Create an Anthos Utilization Metering dashboard

An Anthos Utilization Metering dashboard shows CPU and memory utilization in the clusters by namespace and Pod labels.

To create an Anthos Utilization Metering dashboard:

  1. Download the dashboard configuration: anthos-utilization.json.

  2. Use this configuration file to create a custom dashboard by running the following command:

    gcloud monitoring dashboards create --config-from-file=anthos-utilization.json

  3. In the Google Cloud console, select Monitoring.

  4. Select Resources > Dashboards and view the dashboard named Anthos Utilization Metering.

  5. Optionally create alerting policies.
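After creating the dashboards, you can check that each one named on this page exists in the project. A minimal sketch, assuming the configuration files set exactly the display names shown in the console steps above; report_missing_dashboards is a hypothetical helper, and it does not cover the Windows dashboards.

```shell
# Hypothetical helper: prints each expected display name that is not found
# among the project's dashboards; returns 1 if any are missing.
report_missing_dashboards() {
  local names expected missing=0
  names=$(gcloud monitoring dashboards list --format="value(displayName)") || return 1
  # Read expected names one per line, because they contain spaces.
  while IFS= read -r expected; do
    if ! printf '%s\n' "$names" | grep -Fxq -- "$expected"; then
      echo "missing: $expected"
      missing=1
    fi
  done <<EOF
GKE on-prem control plane uptime
GKE on-prem pod status
GKE on-prem node status
GKE on-prem VM health status
GKE on-prem node utilization
Anthos Utilization Metering
EOF
  return "$missing"
}
```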