This page shows you how to improve workload startup latency by using secondary boot disks in Google Kubernetes Engine (GKE). With secondary boot disks, you can preload data or container images on new nodes, which lets workloads achieve fast cold starts and improves the overall utilization of provisioned resources.
Overview
Starting in GKE version 1.28.3-gke.1067000, you can configure node pools with secondary boot disks. GKE then provisions the nodes and preloads them with data, such as a machine learning model, or with a container image. Using preloaded data or a container image in a secondary disk has the following benefits for your workloads:
- Faster autoscaling
- Reduced latency when pulling large images
- Quicker recovery from disruptions like maintenance events and system errors
Before you begin
Before you start, make sure you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.
- Enable the Container File System API (an example command follows this list).
- Ensure that your cluster has access to the disk image that you want to load onto the nodes.
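If you haven't enabled the Container File System API yet, you can do so with the gcloud CLI. The following command is a minimal sketch; it assumes that containerfilesystem.googleapis.com is the service name for the Container File System API in your project:

# Enable the Container File System API (service name assumed; confirm it in your project's API library).
gcloud services enable containerfilesystem.googleapis.com \
    --project=PROJECT_ID

Replace PROJECT_ID with the ID of your Google Cloud project.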
Requirements
The following requirements apply to using secondary boot disks:
- The feature is available in GKE version 1.28.3-gke.1067000 and later (an optional command to check available versions follows this list).
- When you modify the disk image, you must create a new node pool. You can't update the disk image on existing nodes.
- You must configure Image streaming to use the secondary boot disk feature.
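Optionally, you can check which GKE versions are currently available in your cluster location before you create a cluster. This is a convenience check, not part of the required setup:

# List the GKE versions offered in this location; look for 1.28.3-gke.1067000 or later.
gcloud container get-server-config \
    --location=LOCATION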
Configure the secondary boot disk
The following sections describe how to configure the secondary boot disk:
Preload data
Before you create the GKE cluster and node pool with a secondary boot disk, we recommend that you prepare the disk image at build time, as soon as the data is ready, ideally as an automated step in a CI/CD pipeline.
Prepare the disk image that contains the data
Create a custom disk image as the data source. At a high level, you create a disk, write your data to it from a temporary VM, and then create a Compute Engine image from that disk.
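One possible sequence with the gcloud CLI is sketched below. It is not the only way to build the image: the disk name data-disk, the VM_NAME and ZONE placeholders, and the disk size are illustrative, and the step of formatting the disk and copying your data onto it over SSH is omitted.

# Create an empty disk to hold the data (size is an example value).
gcloud compute disks create data-disk \
    --size=10GB \
    --zone=ZONE

# Attach the disk to an existing VM, then connect over SSH to format it and copy the data.
gcloud compute instances attach-disk VM_NAME \
    --disk=data-disk \
    --zone=ZONE

# Detach the disk after the data is in place.
gcloud compute instances detach-disk VM_NAME \
    --disk=data-disk \
    --zone=ZONE

# Create the disk image that the node pool will reference as DISK_IMAGE_NAME.
gcloud compute images create DISK_IMAGE_NAME \
    --source-disk=data-disk \
    --source-disk-zone=ZONE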
Create the GKE cluster and node pool with a secondary boot disk
You can configure a secondary boot disk by using the gcloud CLI:
Create a GKE Standard cluster with image streaming enabled by using the --enable-image-streaming flag:

gcloud container clusters create CLUSTER_NAME \
    --location LOCATION \
    --cluster-version=CLUSTER_VERSION \
    --enable-image-streaming
Replace the following:
- CLUSTER_NAME: The name of your cluster.
- LOCATION: The cluster location.
- CLUSTER_VERSION: The GKE version to use. Must be 1.28.3-gke.1067000 or later.
Create a node pool with a secondary boot disk by using the --secondary-boot-disk=disk-image flag:

gcloud beta container node-pools create NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --location LOCATION \
    --enable-image-streaming \
    --secondary-boot-disk=disk-image=global/images/DISK_IMAGE_NAME
Replace NODE_POOL_NAME with the name of the new node pool and DISK_IMAGE_NAME with the name of your disk image.
GKE creates a node pool in which each node has a secondary disk with the preloaded data. GKE attaches and mounts the secondary boot disk on each node automatically.
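Optionally, to spot-check that the secondary boot disk is mounted on a node, you can connect to the node over SSH and list the secondary-disk mount directory used in the hostPath example that follows. NODE_NAME and NODE_ZONE are illustrative placeholders for one of the node pool's nodes and its zone:

# List the secondary boot disk mount points on a node (optional check, not part of the procedure).
gcloud compute ssh NODE_NAME \
    --zone=NODE_ZONE \
    --command="ls /mnt/disks/gke-secondary-disks/"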
Optionally, you can mount the secondary disk image in the Pod's containers by using a hostPath volume mount. Use the following manifest to define a Pod resource and use a hostPath volume mount to expose the preloaded data disk to its containers:
apiVersion: v1
kind: Pod
metadata:
  name: pod-name
spec:
  containers:
  ...
    volumeMounts:
    - mountPath: /usr/local/data_path_sbd
      name: data_path_sbd
  ...
  volumes:
  - name: data_path_sbd
    hostPath:
      path: /mnt/disks/gke-secondary-disks/gke-DISK_IMAGE_NAME-disk
Replace DISK_IMAGE_NAME with the name of your disk image.
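After the Pod is running, you can optionally confirm that the preloaded data is visible inside the container by listing the mount path from the manifest above:

# List the preloaded data inside the container (optional check).
kubectl exec pod-name -- ls /usr/local/data_path_sbd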
Preload the container image
In this guide, you use gke-disk-image-builder to create a VM instance and pull the container images onto a disk; gke-disk-image-builder then creates a disk image from that disk. We recommend that you prepare the disk image right after the container image build step, ideally as an automated step in a CI/CD pipeline.
- Create a Cloud Storage bucket to store the execution logs of gke-disk-image-builder (a bucket-creation example follows the replacement list below).
- Create a disk image with preloaded container images:
go run ./cli \
    --project-name=PROJECT_ID \
    --image-name=DISK_IMAGE_NAME \
    --zone=LOCATION \
    --gcs-path=gs://LOG_BUCKET_NAME \
    --disk-size-gb=10 \
    --container-image=docker.io/library/python:latest \
    --container-image=docker.io/library/nginx:latest
Replace the following:
- PROJECT_ID: The name of your Google Cloud project.
- DISK_IMAGE_NAME: The name of the disk image. For example, nginx-python-image.
- LOCATION: The cluster location.
- LOG_BUCKET_NAME: The name of the Cloud Storage bucket to store the execution logs. For example, gke-secondary-disk-image-logs/.
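If the log bucket doesn't exist yet, you can create it with the gcloud CLI. This is a minimal sketch; BUCKET_LOCATION is an illustrative placeholder for the bucket's region (bucket locations are regions or multi-regions, not zones):

# Create the Cloud Storage bucket for gke-disk-image-builder execution logs.
gcloud storage buckets create gs://LOG_BUCKET_NAME \
    --location=BUCKET_LOCATION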
Create a GKE Standard cluster with image streaming enabled:
gcloud container clusters create CLUSTER_NAME \
    --location=LOCATION \
    --cluster-version=CLUSTER_VERSION \
    --enable-image-streaming
Create a node pool with a secondary boot disk:
gcloud beta container node-pools create NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --location=LOCATION \
    --enable-image-streaming \
    --secondary-boot-disk=disk-image=global/images/DISK_IMAGE_NAME,mode=CONTAINER_IMAGE_CACHE
Add a nodeSelector to your Pod template:

nodeSelector:
  cloud.google.com/gke-nodepool: NODE_POOL_NAME
Confirm that the secondary boot disk cache is in use:
kubectl get events --all-namespaces
The output is similar to the following:
75s Normal SecondaryDiskCachin node/gke-pd-cache-demo-default-pool-75e78709-zjfm Image gcr.io/k8s-staging-jobsejt/pytorch-mnist:latest is backed by secondary disk cache
The expected image pull latency for the cached container image should be no more than a few seconds, regardless of image size. You can check the image pull latency by running the following command:
kubectl describe pod POD_NAME
Replace POD_NAME with the name of the Pod. The output is similar to the following:
… Normal Pulled 15m kubelet Successfully pulled image "docker.io/library/nginx:latest" in 0.879149587s …
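To see only the image pull events for a specific Pod instead of the full kubectl describe output, you can optionally filter events by object name and reason:

# Show only the Pulled events for the Pod (optional alternative to kubectl describe).
kubectl get events \
    --field-selector involvedObject.name=POD_NAME,reason=Pulled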
What's next
- Use Image streaming to pull container images by streaming the image data as your workloads need it.
- See Improve workload efficiency using NCCL Fast Socket to learn how to use the NVIDIA Collective Communication Library (NCCL) Fast Socket plugin.