This page shows you how to consume reserved Compute Engine zonal resources in specific GKE workloads. These capacity reservations give you a high level of assurance that specific hardware is available for your workloads.
Ensure that you're already familiar with the concepts of Compute Engine reservations, like consumption types, share types, and provisioning types. For details, see Reservations of Compute Engine zonal resources.
This page is intended for the following people:
- Application operators who deploy workloads that should run as soon as possible, usually with specialized hardware like GPUs.
- Platform administrators who want to obtain a high level of assurance that workloads run on optimized hardware that meets both application and organizational requirements.
About reservation consumption in GKE
Compute Engine capacity reservations let you provision specific hardware configurations in Google Cloud zones, either immediately or at a specified future time. You can then consume this reserved capacity in GKE.
Depending on your GKE mode of operation, you can consume the following reservation types:
- Autopilot mode: specific reservations only.
- Standard mode: specific reservations or any matching reservation.
To consume reservations when GKE creates your resources, you must specify a reservation affinity, such as any or specific.
Reservation consumption options in GKE
GKE lets you consume reservations directly in individual workloads by using Kubernetes nodeSelectors in your workload manifest or by creating Standard mode node pools that consume the reservation. This page describes the approach of directly selecting reservations in individual resources.
You can also configure GKE to consume reservations during scaling operations that create new nodes by using custom compute classes. Custom compute classes let platform administrators define a hierarchy of node configurations for GKE to prioritize during node scaling so that workloads run on your selected hardware.
You can specify reservations in your custom compute class configuration so that GKE consumes those reservations for any workload that uses that compute class.
To learn more, in the "About custom compute classes" page, see Consume Compute Engine reservations.
Before you begin
Before you start, make sure you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.
Consume capacity reservations in Autopilot clusters
Autopilot clusters support consuming resources from Compute Engine capacity reservations in the same project or in a shared project. You must set the consumption type property of the target reservation to specific, and you must explicitly select that reservation in your manifest. If you don't explicitly specify a reservation, Autopilot clusters won't consume reservations. To learn more about reservation consumption types, see How reservations work.
These reservations qualify for Compute flexible committed use discounts. You must use the Accelerator compute class or the Performance compute class to consume capacity reservations.
Before you begin, create an Autopilot cluster running the following versions:
- To consume reserved accelerators, such as GPUs: 1.28.6-gke.1095000 or later
- To run Pods on a specific machine series, with each Pod on its own node: 1.28.6-gke.1369000 or later, or 1.29.1-gke.1575000 or later.
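The version requirements above can be checked before you deploy. The following is a minimal sketch, not an official tool: version_at_least is a hypothetical helper name, and it assumes a sort implementation that supports the -V (version sort) option, such as GNU coreutils.

```shell
# Hypothetical helper: succeeds if version $1 is greater than or equal to version $2.
# Assumes a sort implementation with -V (version sort), such as GNU coreutils.
version_at_least() {
  lowest="$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)"
  [ "$lowest" = "$2" ]
}

# Example usage (CLUSTER_NAME and LOCATION are placeholders):
#   current="$(gcloud container clusters describe CLUSTER_NAME \
#       --location=LOCATION --format='value(currentMasterVersion)')"
#   version_at_least "$current" "1.28.6-gke.1095000" && echo "Reserved accelerators supported"
```

A single comparison like this covers the accelerator requirement. The machine series requirement lists a separate minimum for each minor version, so compare against the minimum that applies to your cluster's minor version.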
Create capacity reservations for Autopilot
Autopilot Pods can consume reservations that have the specific consumption type property in the same project as the cluster or in a shared reservation from a different project. You can consume the reserved hardware by explicitly referencing that reservation in your manifest. You can consume reservations in Autopilot for the following types of hardware:
- Any of the following types of GPUs:
  - nvidia-h200-141gb: NVIDIA H200 (141GB)
  - nvidia-h100-mega-80gb: NVIDIA H100 Mega (80GB)
  - nvidia-h100-80gb: NVIDIA H100 (80GB)
  - nvidia-a100-80gb: NVIDIA A100 (80GB)
  - nvidia-tesla-a100: NVIDIA A100 (40GB)
  - nvidia-l4: NVIDIA L4
  - nvidia-tesla-t4: NVIDIA T4
To create a capacity reservation, see the following resources. The reservation must meet the following requirements:
- The machine types, accelerator types, and accelerator quantities match what your workloads will consume.
- The reservation uses the specific consumption type. For example, in the gcloud CLI, you must specify the --require-specific-reservation flag when you create the reservation.
Consume a specific reservation in the same project in Autopilot
This section shows you how to consume a specific capacity reservation that's in the same project as your cluster. You can use kubectl or Terraform.
kubectl
Save the following manifest as specific-autopilot.yaml. This manifest has node selectors that consume a specific reservation. You can use VM instances or accelerators.
VM instances
apiVersion: v1
kind: Pod
metadata:
  name: specific-same-project-pod
spec:
  nodeSelector:
    cloud.google.com/compute-class: Performance
    cloud.google.com/machine-family: MACHINE_SERIES
    cloud.google.com/reservation-name: RESERVATION_NAME
    cloud.google.com/reservation-affinity: "specific"
  containers:
  - name: my-container
    image: "k8s.gcr.io/pause"
    resources:
      requests:
        cpu: 12
        memory: "50Gi"
        ephemeral-storage: "200Gi"
Replace the following:
- MACHINE_SERIES: a machine series that contains the machine type of the VMs in your specific capacity reservation. For example, if your reservation is for c3-standard-4 machine types, specify C3 in the MACHINE_SERIES field.
- RESERVATION_NAME: the name of the Compute Engine capacity reservation.
Accelerators
apiVersion: v1
kind: Pod
metadata:
  name: specific-same-project-pod
spec:
  nodeSelector:
    cloud.google.com/gke-accelerator: ACCELERATOR
    cloud.google.com/reservation-name: RESERVATION_NAME
    cloud.google.com/reservation-affinity: "specific"
  containers:
  - name: my-container
    image: "k8s.gcr.io/pause"
    resources:
      requests:
        cpu: 12
        memory: "50Gi"
        ephemeral-storage: "200Gi"
      limits:
        nvidia.com/gpu: QUANTITY
Replace the following:
- ACCELERATOR: the accelerator that you reserved in the Compute Engine capacity reservation. Must be one of the following values:
  - nvidia-h200-141gb: NVIDIA H200 (141GB)
  - nvidia-h100-mega-80gb: NVIDIA H100 Mega (80GB)
  - nvidia-h100-80gb: NVIDIA H100 (80GB)
  - nvidia-a100-80gb: NVIDIA A100 (80GB)
  - nvidia-tesla-a100: NVIDIA A100 (40GB)
  - nvidia-l4: NVIDIA L4
  - nvidia-tesla-t4: NVIDIA T4
- RESERVATION_NAME: the name of the Compute Engine capacity reservation.
- QUANTITY: the number of GPUs to attach to the container. Must be a supported quantity for the specified GPU, as described in Supported GPU quantities.
Deploy the Pod:
kubectl apply -f specific-autopilot.yaml
Autopilot uses the reserved capacity in the specified reservation to provision a new node to place the Pod.
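To confirm that the new node took capacity from the reservation, you can compare the reservation's in-use count before and after you deploy the Pod. The following is a minimal sketch under stated assumptions: reservation_in_use is a hypothetical helper name, ZONE is a placeholder for the zone of your reservation, and the helper reads the specificReservation.inUseCount field of the Compute Engine reservation resource.

```shell
# Hypothetical helper: prints how many VMs in a reservation are currently in use.
# $1 = reservation name, $2 = zone of the reservation.
reservation_in_use() {
  gcloud compute reservations describe "$1" \
      --zone="$2" \
      --format="value(specificReservation.inUseCount)"
}

# Example usage (ZONE is a placeholder for the reservation's zone):
#   kubectl get pod specific-same-project-pod -o wide   # shows the node that runs the Pod
#   reservation_in_use RESERVATION_NAME ZONE            # in-use count of the reservation
```

If the count doesn't increase after the node is created, review the items in the troubleshooting section of this page.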
Terraform
To consume a specific reservation in the same project with VM instances using Terraform, refer to the following example:
To consume a specific reservation in the same project with the Accelerator compute class using Terraform, refer to the following example:
To learn more about using Terraform, see Terraform support for GKE.
Consume a specific shared reservation in Autopilot
This section uses the following terms:
- Owner project: the project that owns the reservation and shares it with other projects.
- Consumer project: the project that runs the workloads that consume the shared reservation.
To consume a shared reservation, you must grant the GKE service agent access to the reservation in the project that owns the reservation. Do the following:
Create a custom IAM role that contains the compute.reservations.list permission in the owner project:
gcloud iam roles create ROLE_NAME \
    --project=OWNER_PROJECT_ID \
    --permissions='compute.reservations.list'
Replace the following:
- ROLE_NAME: a name for your new role.
- OWNER_PROJECT_ID: the project ID of the project that owns the capacity reservation.
Give the GKE service agent in the consumer project access to list shared reservations in the owner project:
gcloud projects add-iam-policy-binding OWNER_PROJECT_ID \
    --project=OWNER_PROJECT_ID \
    --member=serviceAccount:service-CONSUMER_PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com \
    --role='projects/OWNER_PROJECT_ID/roles/ROLE_NAME'
Replace CONSUMER_PROJECT_NUMBER with the numerical project number of your consumer project. To find this number, see Identifying projects in the Resource Manager documentation.
Save the following manifest as shared-autopilot.yaml. This manifest has nodeSelectors that tell GKE to consume a specific shared reservation.
VM instances
apiVersion: v1
kind: Pod
metadata:
  name: performance-pod
spec:
  nodeSelector:
    cloud.google.com/compute-class: Performance
    cloud.google.com/machine-family: MACHINE_SERIES
    cloud.google.com/reservation-name: RESERVATION_NAME
    cloud.google.com/reservation-project: OWNER_PROJECT_ID
    cloud.google.com/reservation-affinity: "specific"
  containers:
  - name: my-container
    image: "k8s.gcr.io/pause"
    resources:
      requests:
        cpu: 12
        memory: "50Gi"
        ephemeral-storage: "200Gi"
Replace the following:
- MACHINE_SERIES: a machine series that contains the machine type of the VMs in your specific capacity reservation. For example, if your reservation is for c3-standard-4 machine types, specify C3 in the MACHINE_SERIES field.
- RESERVATION_NAME: the name of the Compute Engine capacity reservation.
- OWNER_PROJECT_ID: the project ID of the project that owns the capacity reservation.
Accelerators
apiVersion: v1
kind: Pod
metadata:
  name: specific-same-project-pod
spec:
  nodeSelector:
    cloud.google.com/gke-accelerator: ACCELERATOR
    cloud.google.com/reservation-name: RESERVATION_NAME
    cloud.google.com/reservation-project: OWNER_PROJECT_ID
    cloud.google.com/reservation-affinity: "specific"
  containers:
  - name: my-container
    image: "k8s.gcr.io/pause"
    resources:
      requests:
        cpu: 12
        memory: "50Gi"
        ephemeral-storage: "200Gi"
      limits:
        nvidia.com/gpu: QUANTITY
Replace the following:
- ACCELERATOR: the accelerator that you reserved in the Compute Engine capacity reservation. Must be one of the following values:
  - nvidia-h200-141gb: NVIDIA H200 (141GB)
  - nvidia-h100-mega-80gb: NVIDIA H100 Mega (80GB)
  - nvidia-h100-80gb: NVIDIA H100 (80GB)
  - nvidia-a100-80gb: NVIDIA A100 (80GB)
  - nvidia-tesla-a100: NVIDIA A100 (40GB)
  - nvidia-l4: NVIDIA L4
  - nvidia-tesla-t4: NVIDIA T4
- RESERVATION_NAME: the name of the Compute Engine capacity reservation.
- OWNER_PROJECT_ID: the project ID of the project that owns the capacity reservation.
- QUANTITY: the number of GPUs to attach to the container. Must be a supported quantity for the specified GPU, as described in Supported GPU quantities.
Deploy the Pod:
kubectl apply -f shared-autopilot.yaml
Autopilot uses the reserved capacity in the specified reservation to provision a new node to place the Pod.
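The IAM binding step above needs the consumer project's number, not its ID. The following is a minimal sketch of one way to look the number up and build the service agent address; gke_service_agent and project_number are hypothetical helper names, and CONSUMER_PROJECT_ID is a placeholder that doesn't appear in the steps above.

```shell
# Hypothetical helper: prints the GKE service agent email for a project number.
gke_service_agent() {
  printf 'service-%s@container-engine-robot.iam.gserviceaccount.com\n' "$1"
}

# Hypothetical helper: prints the project number for a project ID.
project_number() {
  gcloud projects describe "$1" --format="value(projectNumber)"
}

# Example usage (CONSUMER_PROJECT_ID is a placeholder):
#   gke_service_agent "$(project_number CONSUMER_PROJECT_ID)"
```

You can pass the printed address, prefixed with serviceAccount:, to the --member flag of the binding command.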
Troubleshooting consuming reservations in Autopilot
- Ensure that the machine types, accelerator types, local SSD configurations, and accelerator quantities match what your workloads will consume. For a complete list of properties which must match, see Compute Engine capacity reservation properties.
- Ensure that the reservation is created with specific affinity.
- When using shared reservations, ensure that the GKE service agent in the consumer project has permission to list shared reservations in the owner project.
Consuming reserved instances in GKE Standard
When you create a cluster or node pool, you can indicate the reservation consumption mode by specifying the --reservation-affinity flag.
Consuming any matching reservations
You can create a reservation and instances to consume any reservation using the gcloud CLI or Terraform.
gcloud
To consume from any matching reservations automatically, set the reservation affinity flag to --reservation-affinity=any. Because any is the default value defined in Compute Engine, you can omit the reservation affinity flag entirely.
In the any reservation consumption mode, nodes first take capacity from all single-project reservations before any shared reservations, because the shared reservations are more available to other projects. For more information about how instances are automatically consumed, see Consumption order.
Create a reservation of three VM instances:
gcloud compute reservations create RESERVATION_NAME \
    --machine-type=MACHINE_TYPE --vm-count=3
Replace the following:
- RESERVATION_NAME: the name of the reservation to create.
- MACHINE_TYPE: the type of machine (name only) to use for the reservation. For example, n1-standard-2.
Verify the reservation was created successfully:
gcloud compute reservations describe RESERVATION_NAME
Create a cluster with one node that consumes any matching reservation:
gcloud container clusters create CLUSTER_NAME \
    --machine-type=MACHINE_TYPE --num-nodes=1 \
    --reservation-affinity=any
Replace CLUSTER_NAME with the name of the cluster to create.
Create a node pool with three nodes that consume any matching reservation:
gcloud container node-pools create NODEPOOL_NAME \
    --cluster CLUSTER_NAME --num-nodes=3 \
    --machine-type=MACHINE_TYPE --reservation-affinity=any
Replace NODEPOOL_NAME with the name of the node pool to create.
The total number of nodes is four, which exceeds the capacity of the reservation. Three of the nodes consume the reservation while the last node takes capacity from the general Compute Engine resource pool.
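The arithmetic behind that split can be sketched as follows. This is an illustration only: split_nodes is a hypothetical helper name, and it assumes that every node matches the reservation's properties and runs in the reservation's zone.

```shell
# Hypothetical helper: prints "<reserved> <on-demand>" for a requested node count.
# $1 = total nodes requested, $2 = unused VMs in matching reservations.
split_nodes() {
  if [ "$1" -le "$2" ]; then
    reserved="$1"
  else
    reserved="$2"
  fi
  echo "$reserved $(( $1 - reserved ))"
}

split_nodes 4 3   # prints "3 1": three nodes from the reservation, one from the general pool
```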
Terraform
To create a reservation of three VM instances using Terraform, refer to the following example:
To create a cluster having one node to consume any matching reservation using Terraform, refer to the following example:
To create a node pool with three nodes to consume any matching reservation using Terraform, refer to the following example:
To learn more about using Terraform, see Terraform support for GKE.
Consuming a specific single-project reservation
To consume a specific reservation, set the reservation affinity flag to --reservation-affinity=specific and provide the specific reservation name. In this mode, instances must take capacity from the specified reservation in the zone. The request fails if the reservation does not have sufficient capacity.
To create a reservation and instances to consume a specific reservation, perform the following steps. You can use the gcloud CLI or Terraform.
gcloud
Create a specific reservation for three VM instances:
gcloud compute reservations create RESERVATION_NAME \
    --machine-type=MACHINE_TYPE --vm-count=3 \
    --require-specific-reservation
Replace the following:
- RESERVATION_NAME: the name of the reservation to create.
- MACHINE_TYPE: the type of machine (name only) to use for the reservation. For example, n1-standard-2.
Create a node pool with a single node to consume a specific single-project reservation:
gcloud container node-pools create NODEPOOL_NAME \
    --cluster CLUSTER_NAME \
    --machine-type=MACHINE_TYPE --num-nodes=1 \
    --reservation-affinity=specific --reservation=RESERVATION_NAME
Replace the following:
- NODEPOOL_NAME: the name of the node pool to create.
- CLUSTER_NAME: the name of the cluster that you created.
Terraform
To create a specific reservation using Terraform, refer to the following example:
To create a node pool with a single node to consume a specific single-project reservation using Terraform, refer to the following example:
To learn more about using Terraform, see Terraform support for GKE.
Consuming a specific shared reservation
To create a specific shared reservation and consume the shared reservation, perform the following steps. You can use the gcloud CLI or Terraform.
- Follow the steps in Allowing and restricting projects from creating and modifying shared reservations.
gcloud
Create a specific shared reservation:
gcloud compute reservations create RESERVATION_NAME \
    --machine-type=MACHINE_TYPE --vm-count=3 \
    --zone=ZONE \
    --require-specific-reservation \
    --project=OWNER_PROJECT_ID \
    --share-setting=projects \
    --share-with=CONSUMER_PROJECT_IDS
Replace the following:
- RESERVATION_NAME: the name of the reservation to create.
- MACHINE_TYPE: the name of the type of machine to use for the reservation. For example, n1-standard-2.
- OWNER_PROJECT_ID: the project ID of the project in which you want to create this shared reservation. If you omit the --project flag, GKE uses the current project as the owner project by default.
- CONSUMER_PROJECT_IDS: a comma-separated list of the project IDs of projects that you want to share this reservation with. For example, project-1,project-2. You can include 1 to 100 consumer projects. These projects must be in the same organization as the owner project. Don't include the OWNER_PROJECT_ID, because it can consume this reservation by default.
Consume the shared reservation:
gcloud container node-pools create NODEPOOL_NAME \
    --cluster CLUSTER_NAME \
    --machine-type=MACHINE_TYPE --num-nodes=1 \
    --reservation-affinity=specific \
    --reservation=projects/OWNER_PROJECT_ID/reservations/RESERVATION_NAME
Replace the following:
- NODEPOOL_NAME: the name of the node pool to create.
- CLUSTER_NAME: the name of the cluster that you created.
Terraform
To create a specific shared reservation using Terraform, refer to the following example:
To consume the specific shared reservation using Terraform, refer to the following example:
To learn more about using Terraform, see Terraform support for GKE.
Additional considerations for consuming from a specific reservation
When a node pool is created with specific reservation affinity, including default node pools during cluster creation, its size is limited to the capacity of the specific reservation over the node pool's entire lifetime. This affects the following GKE features:
- Clusters with multiple zones: In regional or multi-zonal clusters, the nodes of a node pool can span multiple zones. Because reservations are single-zonal, multiple reservations are needed. To create a node pool that consumes a specific reservation in these clusters, you must create a specific reservation with exactly the same name and machine properties in each zone of the node pool.
- Cluster autoscaling and node pool upgrades: If you don't have extra capacity in the specific reservation, node pool upgrades or autoscaling of the node pool might fail because both operations require creating extra instances. To resolve this, you can change the size of the reservation, or free up some of its bounded resources.
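For the multi-zone case, the same-named reservations can be created in a loop. The following is a minimal sketch that reuses the flags shown earlier on this page; create_reservation_in_zones is a hypothetical helper name, and the zone names in the usage comment are illustrative.

```shell
# Hypothetical helper: creates a same-named specific reservation in each zone.
# $1 = reservation name, $2 = machine type, $3 = VM count per zone, remaining args = zones.
create_reservation_in_zones() {
  name="$1"; machine_type="$2"; vm_count="$3"
  shift 3
  for zone in "$@"; do
    gcloud compute reservations create "$name" \
        --machine-type="$machine_type" \
        --vm-count="$vm_count" \
        --zone="$zone" \
        --require-specific-reservation || return 1
  done
}

# Example usage (zone names are illustrative):
#   create_reservation_in_zones RESERVATION_NAME MACHINE_TYPE 3 us-central1-a us-central1-b
```

Size each zone's reservation with some headroom if you plan to upgrade or autoscale the node pool, because both operations create extra instances.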
Creating nodes without consuming reservations
To explicitly avoid consuming resources from any reservations, set the affinity to --reservation-affinity=none.
Create a cluster that won't consume any reservation:
gcloud container clusters create CLUSTER_NAME --reservation-affinity=none
Replace CLUSTER_NAME with the name of the cluster to create.
Create a node pool that won't consume any reservation:
gcloud container node-pools create NODEPOOL_NAME \
    --cluster CLUSTER_NAME \
    --reservation-affinity=none
Replace NODEPOOL_NAME with the name of the node pool to create.
Following available reservations between zones
When you use node pools that run in multiple zones and the reservations aren't equal between zones, you can use the --location-policy=ANY flag. This flag ensures that when new nodes are added to the cluster, they are created in the zone that still has unused reservations.
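As a sketch of how this might look in practice, the following helper creates an autoscaled, multi-zone node pool with the ANY location policy. It rests on several assumptions: create_any_policy_pool is a hypothetical helper name, the flag is spelled --location-policy in the gcloud CLI, the policy is applied together with node pool autoscaling, and the autoscaling limits of 0 and 3 are illustrative values only.

```shell
# Hypothetical helper: creates an autoscaled, multi-zone node pool that consumes a
# specific reservation and uses the ANY location policy.
# $1 = node pool, $2 = cluster, $3 = machine type, $4 = reservation, $5 = comma-separated zones.
create_any_policy_pool() {
  gcloud container node-pools create "$1" \
      --cluster="$2" \
      --machine-type="$3" \
      --node-locations="$5" \
      --enable-autoscaling --min-nodes=0 --max-nodes=3 \
      --location-policy=ANY \
      --reservation-affinity=specific --reservation="$4"
}

# Example usage (zone names are illustrative):
#   create_any_policy_pool NODEPOOL_NAME CLUSTER_NAME MACHINE_TYPE RESERVATION_NAME \
#       us-central1-a,us-central1-b
```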
TPU reservation
TPU reservations differ from other machine types. The following are TPU-specific aspects you should consider when creating TPU reservations:
- When using TPUs in GKE, SPECIFIC is the only supported value for the --reservation-affinity flag of gcloud container node-pools create.
For more information, see TPU reservations.
Cleaning up
To avoid incurring charges to your Cloud Billing account for the resources used in this page:
Delete the clusters you created by running the following command for each of the clusters:
gcloud container clusters delete CLUSTER_NAME
Delete the reservations you created by running the following command for each of the reservations:
gcloud compute reservations delete RESERVATION_NAME
What's next
- Learn more about reserving Compute Engine zonal resources.
- Learn more about node pools.
- Learn more about cluster autoscaler.
- Learn more about node upgrade strategies.