Managing K3S version upgrades

Upgrading the underlying Kubernetes version tends to be one of the more difficult parts of managing a cluster. Fortunately, this is really easy to manage with K3s: Rancher's system-upgrade-controller will handle the upgrades automagically!

First things first, we need to install the controller and its associated CRDs.

kubectl create ns system-upgrade
kubectl apply -f https://github.com/rancher/system-upgrade-controller/releases/latest/download/system-upgrade-controller.yaml

kubectl apply -f https://github.com/rancher/system-upgrade-controller/releases/latest/download/crd.yaml

Now the upgrade controller is in place!

Next up is your plan. You should only ever jump one minor version at a time: Kubernetes only guarantees backwards compatibility across a single minor version, so you shouldn't go from, say, 1.25 to 1.29 in one jump. That said, implementing this is easy!
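To make the one-minor-version-at-a-time rule concrete, here's a tiny sketch (the current/target numbers are made-up example values, not anything k3s ships) that prints the chain of upgrades you'd actually perform:

```shell
#!/usr/bin/env bash
# Going from 1.25 to 1.29 means four separate upgrades, not one big jump.
# "current" and "target" are hypothetical example minor versions.
current=25
target=29

path="1.$current"
for ((minor = current + 1; minor <= target; minor++)); do
  path="$path -> 1.$minor"
done

# prints: 1.25 -> 1.26 -> 1.27 -> 1.28 -> 1.29
echo "$path"
```

In practice this just means editing spec.version in the plan below and re-applying it once per minor version, waiting for each rollout to finish.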

Here’s an example of my plan for upgrading from 1.28 to 1.29:

apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: server-plan
  namespace: system-upgrade
spec:
  concurrency: 1
  cordon: true
  nodeSelector:
    matchExpressions:
    - key: node-role.kubernetes.io/control-plane
      operator: In
      values:
      - "true"
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
  version: v1.29.4+k3s1
---
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: agent-plan
  namespace: system-upgrade
spec:
  concurrency: 1
  cordon: true
  nodeSelector:
    matchExpressions:
    - key: node-role.kubernetes.io/control-plane
      operator: DoesNotExist
  prepare:
    args:
    - prepare
    - server-plan
    image: rancher/k3s-upgrade
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
  version: v1.29.4+k3s1

Looks simple enough, right? Just make sure to set spec.version to a real release tag by checking Releases · k3s-io/k3s · GitHub!

If you have a large number of nodes (>20), you can raise spec.concurrency so the upgrade doesn't take all day; I've found it takes approximately 5 minutes per node. Just be careful not to take down too many nodes at once!
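For back-of-the-envelope planning, the ~5 minutes per node figure above gives a rough total: nodes upgrade in waves of spec.concurrency. A quick sketch (node count and concurrency are made-up example values):

```shell
#!/usr/bin/env bash
# Rough upgrade-duration estimate based on the ~5 min/node observation.
# nodes and concurrency are hypothetical example values.
nodes=24
concurrency=4
minutes_per_node=5

# Nodes upgrade in waves of $concurrency; round the wave count up.
waves=$(( (nodes + concurrency - 1) / concurrency ))
total=$(( waves * minutes_per_node ))

echo "~$total minutes for $nodes nodes at concurrency $concurrency"
```

At 24 nodes and concurrency 4 that works out to six waves, so roughly half an hour instead of two hours at concurrency 1.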

Once you've applied the above manifest, you're good to go; the controller will handle the rest. There might be a short interruption in your ability to query the Kubernetes API during the server upgrade, but you shouldn't experience any interruption in the services running on the cluster itself.
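If you want to watch the rollout, the controller runs each node's upgrade as a Job in the system-upgrade namespace, so ordinary kubectl commands work for monitoring (shown as a sketch; adjust to your cluster):

```shell
# List the plans and the per-node upgrade jobs the controller creates.
kubectl -n system-upgrade get plans,jobs

# Watch node versions flip over as each node is upgraded and uncordoned.
kubectl get nodes -o wide -w
```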


Solid. So do we generally want to keep versions in lockstep?

You want your versions to match, yeah. The API requires compatibility. One minor version behind is supported so that upgrades aren't disruptive, but that's about it.

You can run a version or two behind latest if you’re worried about bugs, but I’ve had no issues going from 1.26 to 1.29 with k3s, and 1.23 to 1.29 with EKS.

Generally speaking, I’ll just go in there and do upgrades once every 6 months. I wouldn’t call running a slightly out of date control plane a massive security risk, but that’s your own choice to make. I suppose you could probably follow the RSS for the k3s release train.

I seek stability, so updating once every now and then will be my deal.

I wish I could construct a tool that would aid in that. As in, something to plan upgrades or do a dry run to see if they'll break stuff?