Upgrading the underlying Kubernetes version tends to be one of the more difficult parts of managing a cluster. Fortunately, Rancher makes this really easy to manage with K3s. K3s provides a system-upgrade-controller which will handle the upgrades automagically!
First things first, we need to install the controller and its associated CRDs.
kubectl create ns system-upgrade
kubectl apply -f https://github.com/rancher/system-upgrade-controller/releases/latest/download/system-upgrade-controller.yaml
kubectl apply -f https://github.com/rancher/system-upgrade-controller/releases/latest/download/crd.yaml
Now the cluster controller is in place!
Next up is your plan. You should only ever jump one minor version at a time. Kubernetes components only guarantee compatibility across a single minor version of skew, so you shouldn't go from, say, 1.25 straight to 1.29 in a single jump. That said, implementing this is easy!
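As a quick sanity check before writing a plan, you can compare the minor versions yourself. Here's a sketch — the version strings are placeholders, substitute your cluster's current version and your target:

```shell
# Sketch: verify the upgrade only jumps one minor version.
# Placeholder versions - replace with your own.
current="v1.28.9+k3s1"
target="v1.29.4+k3s1"

# Extract the minor version (the second dot-separated field).
cur_minor=$(echo "$current" | cut -d. -f2)
tgt_minor=$(echo "$target" | cut -d. -f2)

if [ $((tgt_minor - cur_minor)) -le 1 ]; then
  echo "OK: ${current} -> ${target} is at most one minor version"
else
  echo "Too big a jump: upgrade one minor version at a time"
fi
```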
Here’s an example of my plan for upgrading from 1.28 to 1.29:
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: server-plan
  namespace: system-upgrade
spec:
  concurrency: 1
  cordon: true
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: In
        values:
          - "true"
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
  version: v1.29.4+k3s1
---
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: agent-plan
  namespace: system-upgrade
spec:
  concurrency: 1
  cordon: true
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: DoesNotExist
  prepare:
    args:
      - prepare
      - server-plan
    image: rancher/k3s-upgrade
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
  version: v1.29.4+k3s1
Looks simple enough, right? Just make sure to specify spec.version correctly, by going to the k3s-io/k3s Releases page on GitHub and checking the release tag!
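If you ever want to script that lookup, release tags sort cleanly with a version sort. A small sketch with a hypothetical tag list (in practice you'd feed this from the GitHub releases page; this assumes a sort that supports -V, like GNU sort):

```shell
# Pick the newest tag from a list (hypothetical tags for illustration).
latest=$(printf '%s\n' v1.28.9+k3s1 v1.29.3+k3s1 v1.29.4+k3s1 | sort -V | tail -n 1)
echo "$latest"
```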
If you have a large number of nodes (>20), you can raise spec.concurrency so the upgrade doesn't take all day. I've found that it takes approximately 5 minutes per node. Just be careful not to take down too many nodes at once!
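That ~5 minutes per node gives a rough wall-clock estimate, since nodes are upgraded in waves of spec.concurrency. A back-of-the-envelope sketch (the node count and concurrency here are illustrative):

```shell
# Rough upgrade-time estimate: nodes upgrade in waves of size $concurrency.
nodes=24            # illustrative node count
concurrency=3       # spec.concurrency
minutes_per_node=5  # observed ~5 minutes per node

waves=$(( (nodes + concurrency - 1) / concurrency ))  # ceiling division
echo "About $(( waves * minutes_per_node )) minutes total"
```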
Once you've applied the above manifest, you're good to go; the controller will handle the rest. There might be a short interruption in your ability to query the Kubernetes API during the server upgrade, but you shouldn't experience any interruption in services running on the cluster itself.