This documentation is for the current version of GKE on AWS, released in November 2021. See the Release notes for more information.

Diagnose cluster issues

The health check feature regularly monitors the health of the cluster control plane and several critical components, and helps you detect and diagnose potential problems with your clusters.

If you need additional assistance, reach out to Cloud Customer Care.

Issues detected

The cluster health checker detects and alerts you to the following issues in a cluster:

kube-scheduler health on control plane nodes: If the kube-scheduler is unhealthy, this suggests that the cluster is having trouble assigning Pods to nodes. To investigate further, you can examine the kube-scheduler Pod log.
kube-controller-manager health on control plane nodes: The kube-controller-manager runs various controllers, such as the ReplicaSet, Deployment, and Namespace controllers. If the kube-controller-manager is unhealthy, one or more of the controllers it manages might not be working properly. To pinpoint the issue, examine the kube-controller-manager Pod log, which might provide more information about the malfunctioning controller.
Root volume capacity: The health checker checks for sufficient capacity on the root volume of each control plane node. If the available capacity falls below 512 MB, the health checker alerts you to the risk of running out of disk space.
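To make the root volume threshold concrete, the following is a minimal sketch of a comparable check that you could run on any machine you have shell access to. It is not the health checker's actual implementation. The function names available_mb and is_low_on_space are hypothetical, and the only value taken from this page is the 512 MB threshold.

```shell
#!/bin/sh
# Hypothetical helpers, not part of GKE on AWS.
# Threshold taken from the root volume capacity description above.
THRESHOLD_MB=512

# Print the space available on a mount point, in whole MB.
# df -Pk emits POSIX-format output in 1024-byte blocks; the fourth
# column of the second line is the available space.
available_mb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Succeed when the available space ($1, in MB) is below the threshold ($2, in MB).
is_low_on_space() {
  [ "$1" -lt "$2" ]
}

avail="$(available_mb /)"
if is_low_on_space "$avail" "$THRESHOLD_MB"; then
  echo "WARNING: only ${avail} MB left on root volume"
else
  echo "OK: ${avail} MB available on root volume"
fi
```

The comparison is kept in its own function so that the threshold logic can be exercised with fixed numbers, independently of the disk state of the machine it runs on.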

View health check events

To view alerts from the health checker for a specific cluster, run the following command:

gcloud container aws clusters describe CLUSTER_NAME \
    --location GOOGLE_CLOUD_LOCATION

Replace the following:

CLUSTER_NAME: your cluster's name
GOOGLE_CLOUD_LOCATION: the name of the Google Cloud location that manages the cluster

Here's an excerpt of the kind of output you can expect:

{
  "name": "some-cluster-name",
  "description": "test-cluster",
  ...
  "errors": [
    {
      "message": "Replica (replica-name): kube-controller-manager is unhealthy"
    },
    {
      "message": "Replica (replica-name): not enough disk space on root volume, only 9 MB left"
    }
  ],
  ...
}

In this example, the error messages indicate that a kube-controller-manager component is unhealthy and that the available capacity on a control plane node's root volume is running low.
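If you only want the health check alerts rather than the full cluster description, you can narrow the output with the global gcloud --format flag. The following is a sketch, not an official procedure: the wrapper function print_health_errors is a hypothetical name, and the errors field is assumed to appear as it does in the excerpt above.

```shell
#!/bin/sh
# Hypothetical wrapper: print only the "errors" field of a cluster description.
# Arguments:
#   $1: CLUSTER_NAME, your cluster's name
#   $2: GOOGLE_CLOUD_LOCATION, the Google Cloud location that manages the cluster
print_health_errors() {
  # json(errors) is a gcloud format projection that keeps only the
  # "errors" key of the resource, as seen in the sample output above.
  gcloud container aws clusters describe "$1" \
      --location "$2" \
      --format="json(errors)"
}
```

For example, print_health_errors CLUSTER_NAME GOOGLE_CLOUD_LOCATION runs the same describe command as before, restricted to the health check messages.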

What's next

If you need additional assistance, reach out to Cloud Customer Care.