When you install a new version of bmctl, you can upgrade your existing clusters that were created with an earlier version. Upgrading a cluster to the latest Google Distributed Cloud version brings added features and fixes to your cluster. It also ensures that your cluster remains supported.

You can upgrade admin, hybrid, standalone, or user clusters with the bmctl upgrade cluster command, or you can use kubectl.
To learn more about the upgrade process, see Lifecycle and stages of cluster upgrades.
Plan your upgrade
This section contains information and links to information that you should consider before you upgrade a cluster.
Best practices
For information to help you prepare for a cluster upgrade, see Best practices for Anthos clusters on bare metal cluster upgrades.
Upgrade preflight checks
Preflight checks are run as part of the cluster upgrade to validate cluster status and node health. The cluster upgrade doesn't proceed if the preflight checks fail. For more information on preflight checks, see Understand preflight checks.
You can check if the clusters are ready for an upgrade by running the preflight check before running the upgrade. For more information, see Preflight checks for upgrades.
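For example, after you set anthosBareMetalVersion in the cluster configuration file to the target version, you can run the preflight check on its own before starting the upgrade. The following command is a sketch only: it mirrors the bmctl upgrade cluster form used later on this page, and the exact flags are described in Preflight checks for upgrades.

# Run the upgrade preflight check without starting the upgrade.
# CLUSTER_NAME and ADMIN_KUBECONFIG are the same placeholders used
# in the upgrade command later on this page.
bmctl check preflight -c CLUSTER_NAME --kubeconfig ADMIN_KUBECONFIG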
Known issues
For information about potential problems related to cluster upgrades, see Anthos clusters on bare metal known issues and select the Upgrades and updates problem category.
Configure upgrade options
Before you start a cluster upgrade, you can configure the following upgrade options that control how the upgrade process works:
Selective worker node pool upgrades: upgrade specific worker node pools separately from the rest of the cluster.
Parallel upgrades: configure the upgrade process to upgrade groups of nodes or node pools simultaneously.
These options can reduce the risk of disruptions to critical applications and services and significantly reduce overall upgrade time. These options are especially useful for large clusters with numerous nodes and node pools running important workloads. For more information about what these options do and how to use them, see the following sections.
Selective worker node pool upgrades
By default, the cluster upgrade operation upgrades every node and node pool in the cluster. A cluster upgrade can be disruptive and time consuming, because each node is drained and the pods that run on it are restarted and rescheduled. This section describes how you can include or exclude selected worker node pools in a cluster upgrade to minimize workload disruption. This feature applies to user, hybrid, and standalone clusters only, since admin clusters don't allow worker node pools.
You might use selective node pool upgrades in the following situations:
To pick up security fixes without disrupting workloads: You can upgrade just your control plane nodes (and load balancer nodes) to apply Kubernetes vulnerability fixes without disrupting your worker node pools.
To confirm proper operation of an upgraded subset of worker nodes before upgrading all worker nodes: You can upgrade your worker node pools selectively to ensure that workloads are running properly on an upgraded node pool before you upgrade another node pool.
To reduce the maintenance window: Upgrading a large cluster can be time consuming and it's difficult to accurately predict when an upgrade will complete. Cluster upgrade time is proportional to the number of nodes being upgraded. Reducing the number of nodes being upgraded by excluding node pools reduces the upgrade time. You upgrade multiple times, but the smaller, more predictable maintenance windows may help with scheduling.
For the versioning rules for selectively upgrading worker node pools, see Node pool versioning rules in Lifecycle and stages of cluster upgrades.
Upgrade your cluster control plane and selected node pools
To selectively upgrade worker node pools in the initial cluster upgrade:

For the worker node pools that you want to include in the cluster upgrade, make one of the following changes to the NodePool spec:

- Set anthosBareMetalVersion in the NodePool spec to the cluster target upgrade version.
- Omit the anthosBareMetalVersion field from the NodePool spec, or set it to the empty string. By default, worker node pools are included in cluster upgrades.

For the worker node pools that you want to exclude from the upgrade, set anthosBareMetalVersion to the current (pre-upgrade) version of the cluster.

Continue with your upgrade as described in Start the cluster upgrade.
The cluster upgrade operation upgrades the following nodes:
- Cluster control plane nodes.
- Load balancer node pool, if your cluster uses one (spec.loadBalancer.nodePoolSpec). By default, load balancer nodes can run regular workloads. You can't selectively upgrade a load balancer node pool; it's always included in the initial cluster upgrade.
- Worker node pools that you haven't excluded from the upgrade.
For example, suppose that your cluster is at version 1.15.0 and has two worker node pools: wpool01 and wpool02. Also, suppose that you want to upgrade the control plane and wpool01 to 1.16.8, but you want wpool02 to remain at version 1.15.0.
The following cluster configuration file excerpt shows how you can modify the cluster configuration to support this partial upgrade:
...
---
apiVersion: baremetal.cluster.gke.io/v1
kind: Cluster
metadata:
  name: user001
  namespace: cluster-user001
spec:
  type: user
  profile: default
  anthosBareMetalVersion: 1.16.8
---
apiVersion: baremetal.cluster.gke.io/v1
kind: NodePool
metadata:
  name: wpool01
  namespace: cluster-user001
spec:
  clusterName: user001
  anthosBareMetalVersion: 1.16.8
  nodes:
  - address: 10.200.0.1
  - address: 10.200.0.2
  - address: 10.200.0.3
  ...
  - address: 10.200.0.8
---
apiVersion: baremetal.cluster.gke.io/v1
kind: NodePool
metadata:
  name: wpool02
  namespace: cluster-user001
spec:
  clusterName: user001
  anthosBareMetalVersion: 1.15.0
  nodes:
  - address: 10.200.1.1
  - address: 10.200.1.2
  - address: 10.200.1.3
  ...
  - address: 10.200.1.12
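After you run the upgrade, you can verify which node pools picked up the target version by reading the NodePool resources from the admin cluster. This is a sketch: the nodepools.baremetal.cluster.gke.io resource name is inferred from the apiVersion and kind in the preceding manifest, and the grep filter is just one way to surface the version fields.

# Show the anthosBareMetalVersion set on each NodePool in the cluster namespace.
kubectl get nodepools.baremetal.cluster.gke.io -n cluster-user001 \
    --kubeconfig ADMIN_KUBECONFIG -o yaml | grep anthosBareMetalVersion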
Upgrade node pools to the current cluster version
If you've excluded node pools from a cluster upgrade, you can run a cluster
upgrade that brings them up to the target cluster version. Worker node pools
that have been excluded from a cluster upgrade have the anthosBareMetalVersion
field in their NodePool spec set to the previous (pre-upgrade) cluster version.
To bring worker node pools up to the current, upgraded cluster version:
Edit the NodePool specs in the cluster configuration file for the worker node pools that you want to bring up to the current cluster version. Set anthosBareMetalVersion to the current (post-upgrade) cluster version.

If multiple worker node pools are selected for upgrade, the value of spec.nodePoolUpgradeStrategy.concurrentNodePools in the cluster spec determines how many node pools are upgraded in parallel, if any. If you don't want to upgrade worker node pools concurrently, select one node pool at a time for upgrade.

Continue with your upgrade as described in Start the cluster upgrade.
The cluster upgrade operation upgrades only the previously excluded worker node pools for which you have set
anthosBareMetalVersion
to the current, upgraded cluster version.
For example, suppose that you upgraded your cluster to version 1.16.8, but node pool wpool02 is still at the old, pre-upgrade cluster version 1.15.0. Workloads are running properly on the upgraded node pool, wpool01, so now you want to bring wpool02 up to the current cluster version, too. To upgrade wpool02, you can remove the anthosBareMetalVersion field or set its value to the empty string.
The following cluster configuration file excerpt shows how you can modify the cluster configuration to support this partial upgrade:
...
---
apiVersion: baremetal.cluster.gke.io/v1
kind: Cluster
metadata:
  name: user001
  namespace: cluster-user001
spec:
  type: user
  profile: default
  anthosBareMetalVersion: 1.16.8
---
apiVersion: baremetal.cluster.gke.io/v1
kind: NodePool
metadata:
  name: wpool01
  namespace: cluster-user001
spec:
  clusterName: user001
  anthosBareMetalVersion: 1.16.8
  nodes:
  - address: 10.200.0.1
  - address: 10.200.0.2
  - address: 10.200.0.3
  ...
  - address: 10.200.0.8
---
apiVersion: baremetal.cluster.gke.io/v1
kind: NodePool
metadata:
  name: wpool02
  namespace: cluster-user001
spec:
  clusterName: user001
  anthosBareMetalVersion: ""
  nodes:
  - address: 10.200.1.1
  - address: 10.200.1.2
  - address: 10.200.1.3
  ...
  - address: 10.200.1.12
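Once this second upgrade finishes, you can confirm that the wpool02 nodes report the new version by listing the nodes in the user cluster. The USER_CLUSTER_KUBECONFIG placeholder is an assumption for the path to the user cluster's kubeconfig file; the VERSION column shows each node's kubelet version, which changes when the node is upgraded.

# List nodes and their kubelet versions in the user cluster.
kubectl get nodes --kubeconfig USER_CLUSTER_KUBECONFIG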
Parallel upgrades
In a typical, default cluster upgrade, each cluster node is upgraded sequentially, one after the other. This section shows you how to configure your cluster and worker node pools so that multiple nodes upgrade in parallel when you upgrade your cluster. Upgrading nodes in parallel speeds up cluster upgrades significantly, especially for clusters that have hundreds of nodes.
There are two parallel upgrade strategies that you can use to speed up your cluster upgrade:
- Concurrent node upgrade: you can configure your worker node pools so that multiple nodes upgrade in parallel. Parallel upgrades of nodes are configured in the NodePool spec (spec.upgradeStrategy.parallelUpgrade), and only nodes in a worker node pool can be upgraded in parallel. Nodes in control plane or load balancer node pools can only be upgraded one at a time. For more information, see Node upgrade strategy.
- Concurrent node pool upgrade: you can configure your cluster so that multiple node pools upgrade in parallel. Only worker node pools can be upgraded in parallel. Control plane and load balancer node pools can only be upgraded one at a time.
Node upgrade strategy
You can configure worker node pools so that multiple nodes upgrade concurrently (concurrentNodes). You can also set a minimum threshold for the number of nodes able to run workloads throughout the upgrade process (minimumAvailableNodes). This configuration is made in the NodePool spec. For more information about these fields, see the Cluster configuration field reference.

The node upgrade strategy applies to worker node pools only. You can't specify a node upgrade strategy for control plane or load balancer node pools. During a cluster upgrade, nodes in control plane and load balancer node pools upgrade sequentially, one at a time. Control plane node pools and load balancer node pools are specified in the Cluster spec (controlPlane.nodePoolSpec.nodes and loadBalancer.nodePoolSpec.nodes).
When you configure parallel upgrades for nodes, note the following restrictions:
- The value of concurrentNodes can't exceed either 50 percent of the number of nodes in the node pool or the fixed number 15, whichever is smaller. For example, if your node pool has 20 nodes, you can't specify a value greater than 10. If your node pool has 100 nodes, 15 is the maximum value you can specify.
- When you use concurrentNodes together with minimumAvailableNodes, the combined values can't exceed the total number of nodes in the node pool. For example, if your node pool has 20 nodes and minimumAvailableNodes is set to 18, concurrentNodes can't exceed 2. Likewise, if concurrentNodes is set to 10, minimumAvailableNodes can't exceed 10.
The following example shows a worker node pool np1
with 10 nodes. In an
upgrade, nodes upgrade 5 at a time and at least 4 nodes must remain
available for the upgrade to proceed:
apiVersion: baremetal.cluster.gke.io/v1
kind: NodePool
metadata:
  name: np1
  namespace: cluster-cluster1
spec:
  clusterName: cluster1
  nodes:
  - address: 10.200.0.1
  - address: 10.200.0.2
  - address: 10.200.0.3
  - address: 10.200.0.4
  - address: 10.200.0.5
  - address: 10.200.0.6
  - address: 10.200.0.7
  - address: 10.200.0.8
  - address: 10.200.0.9
  - address: 10.200.0.10
  upgradeStrategy:
    parallelUpgrade:
      concurrentNodes: 5
      minimumAvailableNodes: 4
Node pool upgrade strategy
You can configure a cluster so that multiple worker node pools upgrade in parallel. The nodePoolUpgradeStrategy.concurrentNodePools Boolean field in the cluster spec specifies whether or not to upgrade all worker node pools for a cluster concurrently. By default (1), node pools upgrade sequentially, one after the other. When you set concurrentNodePools to 0, every worker node pool in the cluster upgrades in parallel.

Control plane and load balancing node pools are not affected by this setting. These node pools always upgrade sequentially, one at a time. Control plane node pools and load balancer node pools are specified in the Cluster spec (controlPlane.nodePoolSpec.nodes and loadBalancer.nodePoolSpec.nodes).
apiVersion: baremetal.cluster.gke.io/v1
kind: Cluster
metadata:
  name: cluster1
  namespace: cluster-cluster1
spec:
  ...
  nodePoolUpgradeStrategy:
    concurrentNodePools: 0
  ...
How to perform a parallel upgrade
This section describes how to configure a cluster and a worker node pool for parallel upgrades.
To perform a parallel upgrade of worker node pools and nodes in a worker node pool, do the following:
Add an upgradeStrategy section to the NodePool spec. You can apply this manifest separately or as part of the cluster configuration file when you perform a cluster update. Here's an example:

---
apiVersion: baremetal.cluster.gke.io/v1
kind: NodePool
metadata:
  name: np1
  namespace: cluster-ci-bf8b9aa43c16c47
spec:
  clusterName: ci-bf8b9aa43c16c47
  nodes:
  - address: 10.200.0.1
  - address: 10.200.0.2
  - address: 10.200.0.3
  ...
  - address: 10.200.0.30
  upgradeStrategy:
    parallelUpgrade:
      concurrentNodes: 5
      minimumAvailableNodes: 10

In this example, the value of the field concurrentNodes is 5, which means that 5 nodes upgrade in parallel. The minimumAvailableNodes field is set to 10, which means that at least 10 nodes must remain available for workloads throughout the upgrade.

Add a nodePoolUpgradeStrategy section to the Cluster spec in the cluster configuration file.

---
apiVersion: v1
kind: Namespace
metadata:
  name: cluster-user001
---
apiVersion: baremetal.cluster.gke.io/v1
kind: Cluster
metadata:
  name: user001
  namespace: cluster-user001
spec:
  type: user
  profile: default
  anthosBareMetalVersion: 1.16.8
  ...
  nodePoolUpgradeStrategy:
    concurrentNodePools: 0
  ...

In this example, the concurrentNodePools field is set to 0, which means that all worker node pools upgrade concurrently during the cluster upgrade. The upgrade strategy for the nodes in the node pools is defined in the NodePool specs.

Upgrade the cluster as described in Start the cluster upgrade.
Parallel upgrade default values
Parallel upgrades are disabled by default and the fields related to parallel upgrades are mutable. At any time, you can either remove the fields or set them to their default values to disable the feature before a subsequent upgrade.
The following table lists the parallel upgrade fields and their default values:
Field | Default value | Meaning |
---|---|---|
nodePoolUpgradeStrategy.concurrentNodePools (Cluster spec) | 1 | Upgrade worker node pools sequentially, one after the other. |
upgradeStrategy.parallelUpgrade.concurrentNodes (NodePool spec) | 1 | Upgrade nodes sequentially, one after the other. |
upgradeStrategy.parallelUpgrade.minimumAvailableNodes (NodePool spec) | The default minimumAvailableNodes value depends on the value of concurrentNodes. | Upgrade stalls once minimumAvailableNodes is reached and only continues once the number of available nodes is greater than minimumAvailableNodes. |
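For example, to return a cluster and node pool to the default, sequential behavior before a later upgrade, you can either delete these fields or set them back to their defaults. The following fragments are a sketch that reuses the field paths from the table above:

# Cluster spec: upgrade worker node pools one after the other (default).
nodePoolUpgradeStrategy:
  concurrentNodePools: 1

# NodePool spec: upgrade nodes one at a time (default). Alternatively,
# remove the upgradeStrategy section entirely.
upgradeStrategy:
  parallelUpgrade:
    concurrentNodes: 1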
Start the cluster upgrade
This section contains instructions for upgrading clusters.
bmctl
When you download and install a new version of bmctl, you can upgrade your admin, hybrid, standalone, and user clusters created with an earlier version. For a given version of bmctl, a cluster can be upgraded only to that same version.
Download the latest bmctl as described in Google Distributed Cloud downloads.

Update anthosBareMetalVersion in the cluster configuration file to the upgrade target version. The upgrade target version must match the version of the downloaded bmctl file. The following cluster configuration file snippet shows the anthosBareMetalVersion field updated to the latest version:

---
apiVersion: baremetal.cluster.gke.io/v1
kind: Cluster
metadata:
  name: cluster1
  namespace: cluster-cluster1
spec:
  type: admin
  # Anthos cluster version.
  anthosBareMetalVersion: 1.16.8

Use the bmctl upgrade cluster command to complete the upgrade:

bmctl upgrade cluster -c CLUSTER_NAME --kubeconfig ADMIN_KUBECONFIG
Replace the following:
- CLUSTER_NAME: the name of the cluster to upgrade.
- ADMIN_KUBECONFIG: the path to the admin cluster kubeconfig file.
The cluster upgrade operation runs preflight checks to validate cluster status and node health. The cluster upgrade doesn't proceed if the preflight checks fail. For troubleshooting information, see Troubleshoot cluster install or upgrade issues.
When all of the cluster components have been successfully upgraded, the cluster upgrade operation performs cluster health checks. This last step verifies that the cluster is in good operating condition. If the cluster doesn't pass all health checks, they continue to run until they pass. When all health checks pass, the upgrade finishes successfully.
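If you want to rerun health checks yourself after the upgrade finishes, you can do so with bmctl. The command below is a sketch that reuses the CLUSTER_NAME and ADMIN_KUBECONFIG placeholders from the upgrade command; check the bmctl reference for the exact flags in your version.

# Run cluster health checks against the upgraded cluster.
bmctl check cluster -c CLUSTER_NAME --kubeconfig ADMIN_KUBECONFIG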
For more information about the sequence of events for cluster upgrades, see Lifecycle and stages of cluster upgrades.
kubectl
To upgrade a cluster with kubectl, perform the following steps:
Edit the cluster configuration file to set anthosBareMetalVersion to the upgrade target version.

To initiate the upgrade, run the following command:

kubectl apply -f CLUSTER_CONFIG_PATH

Replace CLUSTER_CONFIG_PATH with the path to the edited cluster configuration file.
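After you apply the change, you can follow the upgrade by watching the Cluster resource in the admin cluster. This is a sketch: the clusters.baremetal.cluster.gke.io resource name is inferred from the apiVersion and kind in the cluster configuration file, and the cluster-CLUSTER_NAME namespace follows the convention used in the examples on this page.

# Watch the Cluster resource while the upgrade is reconciled.
kubectl get clusters.baremetal.cluster.gke.io -n cluster-CLUSTER_NAME \
    --kubeconfig ADMIN_KUBECONFIG -w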
As with the upgrade process with bmctl, preflight checks are run as part of the cluster upgrade to validate cluster status and node health. If the preflight checks fail, the cluster upgrade is halted. To troubleshoot any failures, examine the cluster and related logs, since no bootstrap cluster is created. For more information, see Troubleshoot cluster install or upgrade issues.
Although you don't need the latest version of bmctl to upgrade clusters with kubectl, we recommend that you download the latest bmctl. You need bmctl to perform other tasks, such as health checks and backups, to ensure that your cluster stays in good working order.
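As one example of such a task, you can back up a cluster with bmctl. The command below is a sketch; check the bmctl reference for the exact backup flags in your version.

# Create a backup of the cluster.
bmctl backup cluster -c CLUSTER_NAME --kubeconfig ADMIN_KUBECONFIG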