Google Distributed Cloud (GDC) air-gapped lets you back up and restore the data in the home directory of your JupyterLab instances.
This page describes creating and restoring backups of Vertex AI Workbench notebook data. If you are new to Vertex AI, learn more about Vertex AI Workbench.
Create a protected application
Define protected applications to create a backup of the home directory of an individual JupyterLab instance or the home directories of all JupyterLab instances in a project at once.
Create a ProtectedApplication custom resource in the cluster where you want to schedule backups. Backup and restore plans use protected applications to select resources. For information about creating protected applications, see Protected application strategies.
The ProtectedApplication custom resource contains the following fields:
Field | Description
---|---
resourceSelection | The way in which the ProtectedApplication object selects resources for backups or restores.
resourceSelection.type | The method used to select resources. The Selector type selects resources that have matching labels.
resourceSelection.selector | The selection rules. This field contains the matchLabels sub-field.
resourceSelection.selector.matchLabels | The labels that the ProtectedApplication object uses to match resources. For Vertex AI Workbench, this field contains the following labels.
app.kubernetes.io/part-of | The name of the higher-level application that the resource is part of. Set it to vtxwb to select Vertex AI Workbench as the application for JupyterLab instances.
app.kubernetes.io/component | The component within the application architecture. Set it to storage to select the Vertex AI Workbench resources that provide storage for JupyterLab instances.
app.kubernetes.io/instance | Optional. A unique name that identifies an instance of the application. Set it to narrow the selection to a single JupyterLab instance. The value is the name of the JupyterLab instance as shown in the GDC console.
Select the storage of a single JupyterLab instance:
The following example shows a ProtectedApplication custom resource that selects the storage for a JupyterLab instance named my-instance-name in the my-project namespace:
```yaml
apiVersion: gkebackup.gke.io/v1
kind: ProtectedApplication
metadata:
  name: my-protected-application
  namespace: my-project
spec:
  resourceSelection:
    type: Selector
    selector:
      matchLabels:
        app.kubernetes.io/part-of: vtxwb
        app.kubernetes.io/component: storage
        app.kubernetes.io/instance: my-instance-name
```
Select the storage of all JupyterLab instances:
The following example shows a ProtectedApplication custom resource that selects the storage for all JupyterLab instances in the my-project namespace:
```yaml
apiVersion: gkebackup.gke.io/v1
kind: ProtectedApplication
metadata:
  name: my-protected-application
  namespace: my-project
spec:
  resourceSelection:
    type: Selector
    selector:
      matchLabels:
        app.kubernetes.io/part-of: vtxwb
        app.kubernetes.io/component: storage
```
This example doesn't contain the app.kubernetes.io/instance label because it selects all JupyterLab instances.
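If you protect several projects or instances, you might prefer to generate these manifests instead of editing them by hand. The following Bash function is a minimal sketch, not a GDC tool: the function name protected_app_manifest and its argument order are hypothetical. Passing an instance name produces the single-instance manifest shown above; omitting it produces the all-instances manifest.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a GDC tool): prints a ProtectedApplication manifest
# that selects Vertex AI Workbench storage in NAMESPACE. If INSTANCE is given,
# the selection is narrowed to that single JupyterLab instance.
# Usage: protected_app_manifest NAME NAMESPACE [INSTANCE]
protected_app_manifest() {
  local name="$1" namespace="$2" instance="${3:-}"
  cat <<EOF
apiVersion: gkebackup.gke.io/v1
kind: ProtectedApplication
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  resourceSelection:
    type: Selector
    selector:
      matchLabels:
        app.kubernetes.io/part-of: vtxwb
        app.kubernetes.io/component: storage
EOF
  # Leave out the instance label to select every JupyterLab instance in the project.
  if [ -n "${instance}" ]; then
    printf '        app.kubernetes.io/instance: %s\n' "${instance}"
  fi
}
```

For example, protected_app_manifest my-protected-application my-project my-instance-name | kubectl apply -f - would apply the first example, assuming that your current kubeconfig context points at the cluster where you schedule backups.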
Create a backup and restore JupyterLab instance data
To back up and restore the data of a JupyterLab instance, plan a set of backups and plan a set of restores that use the ProtectedApplication custom resource you defined.
Copy restored data to a new JupyterLab instance
Follow these steps to copy restored data from the PersistentVolumeClaim resource of a JupyterLab instance to a new JupyterLab instance:
- To get the permissions that you need to copy restored data, ask your Organization IAM Admin to grant you the User Cluster Developer (user-cluster-developer) role.
- Create a JupyterLab notebook in the new JupyterLab instance that will receive the restored data.
Get the pod name of the JupyterLab instance where you created the notebook:
kubectl get pods -l notebook-name=INSTANCE_NAME -n PROJECT_NAMESPACE
Replace the following:
- INSTANCE_NAME: the name of the JupyterLab instance where you created the notebook.
- PROJECT_NAMESPACE: the project namespace where you created the JupyterLab instance.
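Later steps need this pod name again, so a script might capture it in a variable. The Bash function below is a hypothetical helper (get_notebook_pod is not part of GDC) that runs the same label query, returns the first matching pod, and fails if none matches. It assumes that the instance is served by a single pod.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a GDC tool): prints the name of the first pod whose
# notebook-name label equals INSTANCE in NAMESPACE, or fails if none matches.
# Usage: POD_NAME="$(get_notebook_pod INSTANCE_NAME PROJECT_NAMESPACE)"
get_notebook_pod() {
  local instance="$1" namespace="$2" pod
  # Declare first, then assign, so that a kubectl failure is not masked by local.
  pod="$(kubectl get pods -l "notebook-name=${instance}" -n "${namespace}" \
    -o jsonpath='{.items[0].metadata.name}')" || return 1
  if [ -z "${pod}" ]; then
    echo "no pod found for JupyterLab instance ${instance} in ${namespace}" >&2
    return 1
  fi
  printf '%s\n' "${pod}"
}
```

For example, POD_NAME="$(get_notebook_pod INSTANCE_NAME PROJECT_NAMESPACE)" stores the name for the image lookup and copy steps.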
Get the name of the image that the JupyterLab instance is running:
kubectl get pods POD_NAME -n PROJECT_NAMESPACE -o jsonpath="{.spec.containers[0].image}"
Replace the following:
- POD_NAME: the pod name of the JupyterLab instance that you got in the previous step.
- PROJECT_NAMESPACE: the project namespace where you created the JupyterLab instance.
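In the same way, a script can capture the image name for the pod manifest that you create later. get_notebook_image is a hypothetical helper that wraps the command above and fails when the output is empty; like that command, it assumes that the JupyterLab container is the first container in the pod.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a GDC tool): prints the image of the first container
# in POD, which the command above treats as the JupyterLab container.
# Usage: IMAGE_NAME="$(get_notebook_image POD_NAME PROJECT_NAMESPACE)"
get_notebook_image() {
  local pod="$1" namespace="$2" image
  image="$(kubectl get pods "${pod}" -n "${namespace}" \
    -o jsonpath='{.spec.containers[0].image}')" || return 1
  if [ -z "${image}" ]; then
    echo "pod ${pod} reported no container image" >&2
    return 1
  fi
  printf '%s\n' "${image}"
}
```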
Find the name of the PersistentVolumeClaim resource that was restored:
kubectl get pvc -l app.kubernetes.io/part-of=vtxwb,app.kubernetes.io/component=storage,app.kubernetes.io/instance=RESTORED_INSTANCE_NAME -n PROJECT_NAMESPACE
Replace the following:
- RESTORED_INSTANCE_NAME: the name of the JupyterLab instance that you restored.
- PROJECT_NAMESPACE: the project namespace where you created the JupyterLab instance.
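A label selector can match zero, one, or several claims, so a script that feeds this name into the next step should confirm that exactly one claim matched. The following Bash function is a hypothetical sketch (get_restored_pvc is not a GDC command) that runs the same query, prints one name per line with a kubectl JSONPath range expression, and fails unless the count is one.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a GDC tool): prints the restored PersistentVolumeClaim
# for INSTANCE in NAMESPACE and fails unless exactly one claim matches.
# Usage: RESTORED_PVC_NAME="$(get_restored_pvc RESTORED_INSTANCE_NAME PROJECT_NAMESPACE)"
get_restored_pvc() {
  local instance="$1" namespace="$2" names count
  local selector="app.kubernetes.io/part-of=vtxwb,app.kubernetes.io/component=storage,app.kubernetes.io/instance=${instance}"
  # Print one claim name per line so that the matches can be counted.
  names="$(kubectl get pvc -l "${selector}" -n "${namespace}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}')" || return 1
  count="$(printf '%s' "${names}" | grep -c .)"
  if [ "${count}" -ne 1 ]; then
    echo "expected 1 PersistentVolumeClaim for ${instance}, found ${count}" >&2
    return 1
  fi
  printf '%s\n' "${names}"
}
```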
Create a YAML file named vtxwb-data.yaml with the following content:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: vtxwb-data
  namespace: PROJECT_NAMESPACE
  labels:
    aiplatform.gdc.goog/service-type: workbench
spec:
  containers:
  - args:
    - sleep infinity
    command:
    - bash
    - -c
    image: IMAGE_NAME
    imagePullPolicy: IfNotPresent
    name: vtxwb-data
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: "1"
        memory: 1Gi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /home/jovyan
      name: restore-data
    workingDir: /home/jovyan
  volumes:
  - name: restore-data
    persistentVolumeClaim:
      claimName: RESTORED_PVC_NAME
```
Replace the following:
- PROJECT_NAMESPACE: the project namespace where you created the JupyterLab instance.
- IMAGE_NAME: the name of the container image that the JupyterLab instance is running.
- RESTORED_PVC_NAME: the name of the restored PersistentVolumeClaim resource.
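Instead of replacing the placeholders by hand, you can render the file from shell variables. render_vtxwb_pod is a hypothetical helper that writes the same pod specification with a heredoc. For brevity it omits terminationMessagePath and terminationMessagePolicy, because /dev/termination-log and File are the Kubernetes defaults for those fields, and it writes command, args, and resources in YAML flow style.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a GDC tool): prints the vtxwb-data pod manifest with
# the namespace, image, and restored claim name filled in.
# Usage: render_vtxwb_pod PROJECT_NAMESPACE IMAGE_NAME RESTORED_PVC_NAME > vtxwb-data.yaml
render_vtxwb_pod() {
  local namespace="$1" image="$2" pvc="$3"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: vtxwb-data
  namespace: ${namespace}
  labels:
    aiplatform.gdc.goog/service-type: workbench
spec:
  containers:
  - name: vtxwb-data
    image: ${image}
    imagePullPolicy: IfNotPresent
    # Keep the container alive so that kubectl cp can read the mounted volume.
    command: ["bash", "-c"]
    args: ["sleep infinity"]
    resources:
      limits: {cpu: "1", memory: 1Gi}
      requests: {cpu: "1", memory: 1Gi}
    workingDir: /home/jovyan
    volumeMounts:
    - name: restore-data
      mountPath: /home/jovyan
  volumes:
  - name: restore-data
    persistentVolumeClaim:
      claimName: ${pvc}
EOF
}
```

For example, render_vtxwb_pod "$PROJECT_NAMESPACE" "$IMAGE_NAME" "$RESTORED_PVC_NAME" > vtxwb-data.yaml produces a file that you can apply in the next step.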
Create a new pod for your restored PersistentVolumeClaim resource:
kubectl apply -f ./vtxwb-data.yaml --kubeconfig KUBECONFIG_PATH
Replace KUBECONFIG_PATH with the path of the kubeconfig file for the cluster.
Wait for the vtxwb-data pod to reach the Running state. To check its status, run the following command:
kubectl get pod vtxwb-data -n PROJECT_NAMESPACE --kubeconfig KUBECONFIG_PATH
Copy your restored data to a new JupyterLab instance:
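```shell
# Download the restored home directory from the vtxwb-data pod to ./restore.
kubectl cp PROJECT_NAMESPACE/vtxwb-data:/home/jovyan ./restore --kubeconfig KUBECONFIG_PATH
# Upload the local copy to the new JupyterLab instance.
kubectl cp ./restore PROJECT_NAMESPACE/POD_NAME:/home/jovyan/restore --kubeconfig KUBECONFIG_PATH
# Remove the local copy; -r is required because ./restore is a directory.
rm -r ./restore
```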
Replace the following:
- PROJECT_NAMESPACE: the project namespace where you created the JupyterLab instance.
- KUBECONFIG_PATH: the path of the kubeconfig file for the cluster.
- POD_NAME: the pod name of the new JupyterLab instance.
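When scripted, the copy commands above need a staging directory that is cleaned up even if a copy fails. The following Bash function is a hypothetical sketch (copy_restored_data is not part of GDC) that first waits for the vtxwb-data pod with kubectl wait --for=condition=Ready, then copies through a temporary directory from mktemp -d and removes it whether or not the copy succeeds. Waiting on the Ready condition instead of the Running phase is a choice made for this sketch; because the pod defines no readiness probe, it becomes Ready once its container is running.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a GDC tool): waits for the vtxwb-data pod to become
# Ready, then copies its /home/jovyan directory to /home/jovyan/restore in the
# target JupyterLab pod through a temporary local directory.
# Usage: copy_restored_data PROJECT_NAMESPACE POD_NAME KUBECONFIG_PATH
copy_restored_data() {
  local namespace="$1" pod="$2" kubeconfig="$3" tmp rc=0
  # Block for up to five minutes; raise --timeout for slow image pulls.
  kubectl wait pod/vtxwb-data -n "${namespace}" --kubeconfig "${kubeconfig}" \
    --for=condition=Ready --timeout=300s || return 1
  tmp="$(mktemp -d)" || return 1
  # Download the restored home directory, then upload it to the new instance.
  kubectl cp "${namespace}/vtxwb-data:/home/jovyan" "${tmp}/restore" \
      --kubeconfig "${kubeconfig}" &&
    kubectl cp "${tmp}/restore" "${namespace}/${pod}:/home/jovyan/restore" \
      --kubeconfig "${kubeconfig}" || rc=1
  # Remove the staging directory whether or not the copy succeeded.
  rm -rf "${tmp}"
  return "${rc}"
}
```

For example, copy_restored_data PROJECT_NAMESPACE POD_NAME KUBECONFIG_PATH replaces both the manual wait and the three copy commands.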
After copying the data, your restored data is available in the /home/jovyan/restore directory.
Delete the pod that you created to access your restored data:
kubectl delete pod vtxwb-data -n PROJECT_NAMESPACE --kubeconfig KUBECONFIG_PATH
Replace the following:
- PROJECT_NAMESPACE: the project namespace where you created the JupyterLab instance.
- KUBECONFIG_PATH: the path of the kubeconfig file for the cluster.