This page explains how to enable permissive mode on a backup plan.
During backup execution, if Backup for GKE detects conditions that are likely to cause a restore to fail, the backup itself fails. The reason for the failure is provided in the backup's state_reason field. In the Google Cloud console, this field is labeled Status reason.
If you enable permissive mode, the description of the issue is still provided in the Status reason field, but the backup won't fail. You can enable this behavior if you are aware of the issue and are prepared to employ a workaround at restore time.
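To review the reported issue before you decide, you can inspect the failed backup with the gcloud CLI. The following is a minimal sketch; BACKUP is a placeholder for the name of the failed backup, and BACKUP_PLAN, PROJECT_ID, and LOCATION are the same placeholders used in the commands later on this page. The describe output includes the backup's state and state reason:

gcloud beta container backup-restore backups describe BACKUP \
    --project=PROJECT_ID \
    --location=LOCATION \
    --backup-plan=BACKUP_PLAN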
The following is an example of an error message that you might see in the backup's Status reason field that suggests enabling permissive mode:

"If you cannot implement the recommended fix, you may create a new backup with Permissive Mode enabled."
gcloud
Enable permissive mode:
gcloud beta container backup-restore backup-plans update BACKUP_PLAN \
    --project=PROJECT_ID \
    --location=LOCATION \
    --permissive-mode
Replace the following:

BACKUP_PLAN: the name of the backup plan that you want to update.
PROJECT_ID: the ID of your Google Cloud project.
LOCATION: the compute region for the resource, for example us-central1. See About resource locations.

For a full list of options, refer to the gcloud beta container backup-restore backup-plans update documentation.
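To verify the change, you can describe the backup plan and confirm that permissive mode is enabled in its backup configuration, for example:

gcloud beta container backup-restore backup-plans describe BACKUP_PLAN \
    --project=PROJECT_ID \
    --location=LOCATION

In the output, check the backup configuration section for the permissive mode setting.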
Console
Use the following instructions to enable permissive mode in the Google Cloud console:

1. In the Google Cloud console, go to the Google Kubernetes Engine page.
2. In the navigation menu, click Backup for GKE.
3. Click the Backup plans tab.
4. Expand the cluster and click the plan name.
5. Click the Details tab to edit the plan details.
6. Click Edit to edit the section that contains Backup mode.
7. Select the Permissive mode checkbox, and then click Save changes.
Terraform
Update the existing google_gke_backup_backup_plan resource:

resource "google_gke_backup_backup_plan" "NAME" {
  ...
  backup_config {
    permissive_mode = true
    ...
  }
}
Replace the following:

NAME: the name of the google_gke_backup_backup_plan that you want to update.
For more information, see gke_backup_backup_plan.
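After you update the resource definition, apply the change with your usual Terraform workflow, for example:

# Preview the change; the plan should show permissive_mode being updated in place
terraform plan

# Apply the updated backup plan configuration
terraform apply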
Troubleshoot backup failures
The following table provides explanations and recommended actions for various backup failure messages displayed in the backup's Status reason field.
| Message description and failure reason | Recommended action |
|---|---|
| Description: A Custom Resource Definition (CRD) in the cluster was originally applied as apiextensions.k8s.io/v1beta1 and lacks the structural schema required in apiextensions.k8s.io/v1. Reason: Backup for GKE cannot automatically define the structural schema. Restoring the CRD in Kubernetes v1.22+ clusters, where apiextensions.k8s.io/v1beta1 is no longer available, causes the restore to fail. This failure happens when restoring custom resources defined by the CRD. | We recommend one of the following options. When permissive mode is enabled, the CRD without a structural schema isn't backed up in a Kubernetes v1.22+ cluster. To successfully restore such a backup, either exclude the resources served by the CRD from the restore or create the CRD in the target cluster before starting the restore. |
| Description: In the source cluster, a PVC is bound to a PV that is not a Persistent Disk volume. Reason: Backup for GKE only supports backing up Persistent Disk volume data. Non-Persistent Disk PVCs restored using the "Provision new volumes and restore volume data from backup" policy won't have any volume data restored. However, the "Reuse existing volumes containing your data" policy allows PVCs to be reconnected to the original volume handle, which is useful for volume types that are backed by an external server, such as NFS. | Enable permissive mode with an understanding of the available restore options for the non-Persistent Disk volumes in the source cluster. For backing up Filestore volumes, see Handle Filestore volumes with Backup for GKE. When permissive mode is enabled, the PVC configuration is backed up, but the volume data is not. |
| Description: A PVC in the cluster is not bound to a PV. Reason: Backup for GKE can back up the PVC, but there is no volume data to back up. This situation might indicate a misconfiguration or a mismatch between requested and available storage. | Check whether the unbound PVC is in an acceptable condition. If it is, enable permissive mode, and be aware of the implications for backup behavior: when permissive mode is enabled, the PVC configuration is backed up, but there is no volume data to back up. |
| Description: An API service in the cluster is misconfigured, which causes requests to the API path to return "Failed to query API resources." The underlying service may not exist or may not be ready yet. Reason: Backup for GKE is unable to back up any resources served by the unavailable API. | Check the underlying service in the API service's spec.service to make sure it is ready. When permissive mode is enabled, resources from the API groups that failed to load won't be backed up. |
| Description: In Kubernetes v1.23 and earlier, service accounts automatically generate a token backed by a secret. In later versions, Kubernetes removed this auto-generated token feature. A Pod in the cluster might have mounted the secret volume into its containers' file systems. Reason: If Backup for GKE attempts to restore a service account along with its auto-generated secret and a Pod that mounts the secret volume, the restore appears to be successful. However, Kubernetes removes the secret, which causes the Pod to get stuck in container creation and fail to start. | Define the spec.serviceAccountName field in the Pod. This ensures that the token is automatically mounted at /var/run/secrets/kubernetes.io/serviceaccount in the containers. For more information, see the Configure Service Accounts for Pods documentation. When permissive mode is enabled, the secret is backed up but can't be mounted in Pods in Kubernetes v1.24+ clusters. |
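Before enabling permissive mode, you can check the source cluster for some of these conditions from the command line. The following commands are a sketch that assumes kubectl access to the source cluster:

# Aggregated API services; entries with AVAILABLE set to False indicate a missing or unready backing service
kubectl get apiservices

# PVCs that are not bound to a PersistentVolume (this also filters out the header line)
kubectl get pvc --all-namespaces | grep -v Bound

# CRDs flagged with the NonStructuralSchema condition; "True" means the CRD lacks a structural schema
kubectl get crds -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="NonStructuralSchema")].status}{"\n"}{end}'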
Common CRDs with issues and recommended actions
Here are some common CRDs that have backup issues and the actions we recommend to address the issues:
capacityrequests.internal.autoscaling.k8s.io: This CRD was used temporarily in v1.21 clusters. Run kubectl delete crd capacityrequests.internal.autoscaling.k8s.io to remove the CRD.

scalingpolicies.scalingpolicy.kope.io: This CRD was used to control fluentd resources, but GKE has migrated to using fluentbit. Run kubectl delete crd scalingpolicies.scalingpolicy.kope.io to remove the CRD.

memberships.hub.gke.io: Run kubectl delete crd memberships.hub.gke.io to remove the CRD if there are no membership resources. Enable permissive mode if there are membership resources (see the check after this list).

applications.app.k8s.io: Enable permissive mode with an understanding of restore behavior.
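For the memberships.hub.gke.io case, one quick check for existing membership resources before deleting the CRD (assuming kubectl access to the cluster) is:

# If this returns no resources, it is safe to delete the CRD; otherwise, enable permissive mode instead
kubectl get memberships.hub.gke.io --all-namespaces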