The health check feature regularly monitors the health of the cluster control plane and several critical components, and helps you detect and diagnose potential problems with your clusters.
If you need additional assistance, reach out to Cloud Customer Care.
Issues detected
The cluster health checker detects and alerts you to the following issues in a cluster:
kube-scheduler health on control plane nodes: If the kube-scheduler is unhealthy, this suggests that the cluster is having trouble assigning Pods to nodes. To investigate further, you can examine the kube-scheduler Pod log.
kube-controller-manager health on control plane nodes: The kube-controller-manager monitors various controllers, such as the ReplicaSet, Deployment, and Namespace controllers, among others. If the kube-controller-manager is deemed unhealthy, this suggests that one or more of the controllers it manages might not be working properly. To determine the precise issue, you can examine the kube-controller-manager Pod log, which might provide more information about the malfunctioning controller(s).
Root volume capacity: The health checker checks for sufficient capacity on the root volume of each control plane node. If the available capacity falls below 512 MB, the health checker alerts you to the potential risk of running out of disk space.
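As an illustration of the root volume check described above, the following is a minimal Python sketch of a free-space threshold test. It is not the health checker's actual implementation: the function names are hypothetical, the message wording only mimics the sample output on this page, and it assumes that 512 MB means 512 × 1024 × 1024 bytes.

```python
import shutil

# Threshold from the text above. Treating "512 MB" as 512 * 1024 * 1024
# bytes is an assumption; the page does not say which unit is meant.
MIN_FREE_BYTES = 512 * 1024 * 1024


def root_volume_low(free_bytes: int, threshold: int = MIN_FREE_BYTES) -> bool:
    """Return True when the free space is under the threshold."""
    return free_bytes < threshold


def check_root_volume(path: str = "/") -> str:
    """Describe the free space on the volume that contains `path`."""
    free = shutil.disk_usage(path).free
    free_mb = free // (1024 * 1024)
    if root_volume_low(free):
        # Hypothetical wording, modeled on the sample alert on this page.
        return f"not enough disk space on root volume, only {free_mb} MB left"
    return f"root volume OK, {free_mb} MB free"
```

Running check_root_volume() on a control plane node would report on the volume mounted at /; pass a different path to inspect another volume.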
View health check events
To view alerts from the health checker for a specific cluster, run the following command:
gcloud container aws clusters describe CLUSTER_NAME \
--location GOOGLE_CLOUD_LOCATION
Replace the following:
CLUSTER_NAME: your cluster's name
GOOGLE_CLOUD_LOCATION: the name of the Google Cloud location that manages the cluster
Here's an excerpt of the kind of output you can expect:
{
  "name": "some-cluster-name",
  "description": "test-cluster",
  ...
  "errors": [
    {
      "message": "Replica (replica-name): kube-controller-manager is unhealthy"
    },
    {
      "message": "Replica (replica-name): not enough disk space on root volume, only 9 MB left"
    }
  ]
  ...
}
In this example, the error messages indicate that a kube-controller-manager
component is unhealthy, and that the capacity on a control plane node's root
volume is getting low.
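If you want to process these alerts in a script, one option is to request JSON output with gcloud's general --format=json flag and read the errors field. The sketch below is a hedged example, not an official client: the health_errors helper is hypothetical, and the field names (errors, message) are assumed from the excerpt above rather than from a documented schema.

```python
import json


def health_errors(describe_json: str) -> list[str]:
    """Return the messages in the `errors` list, or [] if there are none."""
    cluster = json.loads(describe_json)
    # Assumed shape: {"errors": [{"message": "..."}, ...]}, as in the excerpt.
    return [entry.get("message", "") for entry in cluster.get("errors", [])]


# Sample input modeled on the excerpt above, without the elided fields.
sample = """
{
  "name": "some-cluster-name",
  "description": "test-cluster",
  "errors": [
    {"message": "Replica (replica-name): kube-controller-manager is unhealthy"},
    {"message": "Replica (replica-name): not enough disk space on root volume, only 9 MB left"}
  ]
}
"""

for message in health_errors(sample):
    print(message)
```

A cluster with no reported problems might omit the errors field entirely; in that case the helper returns an empty list instead of raising an exception.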