This page explains how to enable permissive mode on a backup plan.
During backup execution, if Backup for GKE detects conditions that are likely to cause a restore to fail, the backup itself fails. The reason for the failure is provided in the backup's state_reason field. In the Google Cloud console, this field is labeled Status reason.
If you enable permissive mode, the description of the issue is still provided in the Status reason field, but the backup won't fail. You can enable this behavior if you are aware of the issue and are prepared to employ a workaround at restore time.
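To review the reported issue before you decide, you can inspect the failed backup with the gcloud CLI. The following is a minimal sketch; BACKUP is a placeholder for the name of the failed backup, and BACKUP_PLAN, PROJECT_ID, and LOCATION are the same placeholders used in the commands later on this page. The describe output includes the backup's state and state reason:

gcloud beta container backup-restore backups describe BACKUP \
    --project=PROJECT_ID \
    --location=LOCATION \
    --backup-plan=BACKUP_PLAN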
The following is an example of an error message that you might see in the backup's Status reason field that suggests enabling permissive mode:

"If you cannot implement the recommended fix, you may create a new backup with Permissive Mode enabled."
gcloud
Enable permissive mode:
gcloud beta container backup-restore backup-plans update BACKUP_PLAN \
    --project=PROJECT_ID \
    --location=LOCATION \
    --permissive-mode
Replace the following:

BACKUP_PLAN: the name of the backup plan that you want to update.
PROJECT_ID: the ID of your Google Cloud project.
LOCATION: the compute region for the resource, for example us-central1. See About resource locations.

For a full list of options, refer to the gcloud beta container backup-restore backup-plans update documentation.
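To verify the change, you can describe the backup plan and confirm that permissive mode is enabled in its backup configuration, for example:

gcloud beta container backup-restore backup-plans describe BACKUP_PLAN \
    --project=PROJECT_ID \
    --location=LOCATION

In the output, check the backup configuration section for the permissive mode setting.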
Console
Use the following instructions to enable permissive mode in the Google Cloud console:

1. In the Google Cloud console, go to the Google Kubernetes Engine page.
2. In the navigation menu, click Backup for GKE.
3. Click the Backup plans tab.
4. Expand the cluster and click the plan name.
5. Click the Details tab to edit the plan details.
6. Click Edit to edit the section that contains Backup mode.
7. Select the Permissive mode checkbox, and then click Save changes.
Terraform
Update the existing google_gke_backup_backup_plan resource:

resource "google_gke_backup_backup_plan" "NAME" {
  ...
  backup_config {
    permissive_mode = true
    ...
  }
}
Replace the following:

NAME: the name of the google_gke_backup_backup_plan that you want to update.
For more information, see gke_backup_backup_plan.
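After you update the resource definition, apply the change with your usual Terraform workflow, for example:

# Preview the change; the plan should show permissive_mode being updated in place
terraform plan

# Apply the updated backup plan configuration
terraform apply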
Troubleshoot backup failures
The following table provides explanations and recommended actions for various backup failure messages displayed in the backup's Status reason field.
| Message description and failure reason | Recommended action |
|---|---|
| Description: A Custom Resource Definition (CRD) in the cluster was originally applied as apiextensions.k8s.io/v1beta1 and lacks the structural schema required in apiextensions.k8s.io/v1. Reason: Backup for GKE cannot automatically define the structural schema. Restoring the CRD in Kubernetes v1.22+ clusters, where apiextensions.k8s.io/v1beta1 is no longer available, causes the restore to fail. This failure happens when restoring custom resources defined by the CRD. | We recommend one of the following options. When permissive mode is enabled, the CRD without a structural schema isn't backed up in a Kubernetes v1.22+ cluster. To successfully restore such a backup, either exclude the resources served by the CRD from the restore or create the CRD in the target cluster before starting the restore. |
| Description: In the source cluster, a PVC is bound to a PV that is not a Persistent Disk volume. Reason: Backup for GKE only supports backing up Persistent Disk volume data. Non-Persistent Disk PVCs restored using the "Provision new volumes and restore volume data from backup" policy won't have any volume data restored. However, the "Reuse existing volumes containing your data" policy allows PVCs to be reconnected to the original volume handle, which is useful for volume types that are backed by an external server, such as NFS. | Enable permissive mode with an understanding of the available restore options for the non-Persistent Disk volumes in the source cluster. For backing up Filestore volumes, see Handle Filestore volumes with Backup for GKE. When permissive mode is enabled, the PVC configuration is backed up, but the volume data is not. |
| Description: A PVC in the cluster is not bound to a PV. Reason: Backup for GKE can back up the PVC, but there is no volume data to back up. This situation might indicate a misconfiguration or a mismatch between requested and available storage. | Check whether the unbound PVC is in an acceptable condition. If it is, enable permissive mode, and be aware of the implications for backup behavior: when permissive mode is enabled, the PVC configuration is backed up, but there is no volume data to back up. |
| Description: An API service in the cluster is misconfigured, which causes requests to the API path to return "Failed to query API resources." The underlying service may not exist or may not be ready yet. Reason: Backup for GKE is unable to back up any resources served by the unavailable API. | Check the underlying service in the API service's spec.service to make sure it is ready. When permissive mode is enabled, resources from the API groups that failed to load won't be backed up. |
| Description: In Kubernetes v1.23 and earlier, service accounts automatically generate a token backed by a secret. In later versions, Kubernetes removed this auto-generated token feature. A Pod in the cluster might have mounted the secret volume into its containers' file systems. Reason: If Backup for GKE attempts to restore a service account along with its auto-generated secret and a Pod that mounts the secret volume, the restore appears to be successful. However, Kubernetes removes the secret, which causes the Pod to get stuck in container creation and fail to start. | Define the spec.serviceAccountName field in the Pod. This ensures that the token is automatically mounted at /var/run/secrets/kubernetes.io/serviceaccount in the containers. For more information, see the Configure Service Accounts for Pods documentation. When permissive mode is enabled, the secret is backed up but can't be mounted in Pods in Kubernetes v1.24+ clusters. |
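Before enabling permissive mode, you can check the source cluster for some of these conditions from the command line. The following commands are a sketch that assumes kubectl access to the source cluster:

# Aggregated API services; entries with AVAILABLE set to False indicate a missing or unready backing service
kubectl get apiservices

# PVCs that are not bound to a PersistentVolume (this also filters out the header line)
kubectl get pvc --all-namespaces | grep -v Bound

# CRDs flagged with the NonStructuralSchema condition; "True" means the CRD lacks a structural schema
kubectl get crds -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="NonStructuralSchema")].status}{"\n"}{end}'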
Common CRDs with issues and recommended actions
Here are some common CRDs that have backup issues and the actions we recommend to address the issues:
capacityrequests.internal.autoscaling.k8s.io: This CRD was used temporarily in v1.21 clusters. Run kubectl delete crd capacityrequests.internal.autoscaling.k8s.io to remove the CRD.

scalingpolicies.scalingpolicy.kope.io: This CRD was used to control fluentd resources, but GKE has migrated to using fluentbit. Run kubectl delete crd scalingpolicies.scalingpolicy.kope.io to remove the CRD.

memberships.hub.gke.io: Run kubectl delete crd memberships.hub.gke.io to remove the CRD if there are no membership resources. Enable permissive mode if there are membership resources (see the check after this list).

applications.app.k8s.io: Enable permissive mode with an understanding of restore behavior.
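For the memberships.hub.gke.io case, one quick check for existing membership resources before deleting the CRD (assuming kubectl access to the cluster) is:

# If this returns no resources, it is safe to delete the CRD; otherwise, enable permissive mode instead
kubectl get memberships.hub.gke.io --all-namespaces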