Kubernetes Homelab: Minecraft Server

I set out to run my entire homelab (outside mass file storage) in Kubernetes last summer. I’ve finally found some time to actually write about it. This article in particular comes by request of @PhaseLockedLoop, who wanted to replicate my Minecraft config. There are no highly-available servers or anything super special here, so if you’re hoping for that, you’ll have to look elsewhere. What we do have is fault-tolerant storage, a single MetalLB IP to connect to, and Prometheus metrics.

Minecraft servers are incredibly simple to run. Install Java, download the server jar, run java -jar server.jar nogui. But that’s not good enough. There are config options, tooling, monitoring and mods. All. The. Mods.

Architecture

Kubernetes (K8S) is a bit more complex than simple docker containers, given its internal networking, DNS and other tooling that make it wonderful to work with at scale. This does, however, provide a bit of a barrier to entry for the newbie. Let’s try to demystify this a bit, shall we?

What we’ll need

We’re going to need a few resources to get this up and running: storage, compute and networking.

Storage

Storage is simple: PVCs, or Persistent Volume Claims are K8S resources that provide a virtual block device to a container. This storage is persistent and can be allocated from a number of Storage Classes, which are just ways to delineate the different types of storage that are available to the cluster. I’m using Longhorn and NFS for my storage classes. For this server, we’ll choose Longhorn.
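
If you’re not sure which storage classes your cluster has available, you can ask it directly:

# list every StorageClass and the provisioner behind it
kubectl get storageclass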

We’ll need two PVCs. One for storing the actual server data, another to hold any modpack archive files you need to install.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: minecraft-data
  namespace: minecraft
spec:
  storageClassName: longhorn
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 250Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: modpacks
  namespace: minecraft
spec:
  storageClassName: longhorn
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 25Gi

A few important bits in here:

metadata.name This is the name of the resource and how you’ll reference it in your pod configuration templates.

metadata.namespace All your resources for an application should reside in the same namespace. Namespaces keep different projects segregated, and help prevent cross-contamination of resources.

spec.storageClassName I’ve named my storage class longhorn. Keep in mind that Minecraft likes to have a lot of IOPS, and networked storage like NFS is less capable than local storage. :wink:

spec.accessModes[] Here you choose how you want your cluster to allow access to the storage device. The common modes are ReadWriteOnce (RWO) for single-pod access and ReadWriteMany (RWX) to allow more than one pod to access the PVC at once.

spec.resources.requests.storage This is how you define your disk allocation. Most storage controllers in Kubernetes are thin provisioners, so disk space is only consumed as data is written.
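
With both claims in a file (I’m assuming pvcs.yaml here; name it whatever you like), creating the namespace and the storage is just:

# the namespace must exist before resources can be created in it
kubectl create namespace minecraft
kubectl apply -f pvcs.yaml

# confirm both PVCs reach the Bound state
kubectl get pvc -n minecraft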

Compute

Compute is a bit more complex. Kubernetes compute relies on a number of templates and abstractions to make managing large numbers of containers easier. You have a Container, which lives inside a Pod. Multiple containers can be in a single pod; those containers can (but do not have to) share disk resources and other configuration flags, and are tightly tied together. They also respond to the same DNS address and share a cluster-internal IP. Pods are the lowest-level compute resource that can be spun up manually. However, pods are almost always created by an abstraction layer above them, which lets you manage a number of configuration and labeling flags.

There are three main types of compute resources (Deployments, DaemonSets and StatefulSets), which all serve different use cases. For our Minecraft server, we’re only going to focus on the StatefulSet resource.

Our StatefulSet will be a bit involved, so let’s cover the broad strokes, conceptually here. A StatefulSet is designed for applications that hold stateful data which must persist across pod destructions or restarts. StatefulSets can autoscale and move from server to server seamlessly if configured correctly. We’re not going to worry about autoscaling in this situation though, because we will only ever need one pod for each Minecraft server.

Bask in the glory of the StatefulSet for a Minecraft server:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: minecraft-server
  namespace: minecraft
spec:
  serviceName: minecraft-server  # StatefulSets require the name of a governing Service
  selector:
    matchLabels:
      app: minecraft-server
  template:
    metadata:
      labels:
        app: minecraft-server
    spec:
      containers:
        - name: minecraft-server
          image: itzg/minecraft-server:latest  # Or specific version if needed
          env:
            - name: EULA
              value: "TRUE"
            - name: MODE
              value: "survival"
            - name: TYPE
              value: "CURSEFORGE"
            - name: INIT_MEMORY
              value: "8G"
            - name: MAX_MEMORY
              value: "12G"
            - name: CF_SERVER_MOD
              value: "/modpacks/your-modpack.zip"
            - name: ALLOW_FLIGHT
              value: "TRUE"
            - name: USE_AIKAR_FLAGS
              value: "TRUE"
            - name: RCON_PASSWORD
              value: "rcon-password"
          ports:
            - name: minecraft
              containerPort: 25565  # Expose port 25565
            - name: minecraft-rcon
              containerPort: 25575
            - name: metrics
              containerPort: 19565

          resources:
            requests:
              cpu: 4  # Adjust based on expected workload
              memory: "12Gi"  # Adjust based on expected workload
            limits:
              cpu: 8  # Adjust based on expected workload
              memory: "16Gi"  # Adjust based on expected workload
          readinessProbe:
            exec:
              command:
                - mcstatus
                - 127.0.0.1
                - ping
            initialDelaySeconds: 30
            periodSeconds: 30
          livenessProbe:
            exec:
              command:
                - mcstatus
                - 127.0.0.1
                - ping
            initialDelaySeconds: 30
            periodSeconds: 30
          volumeMounts:
            - name: minecraft-data
              mountPath: /data
            - name: modpacks
              mountPath: /modpacks
      volumes:
        - name: minecraft-data
          persistentVolumeClaim:
            claimName: minecraft-data
        - name: modpacks
          persistentVolumeClaim:
            claimName: modpacks

StatefulSets have two main sections: a selector and a template. The selector tells the StatefulSet which pods belong to it, and the template describes the pods it will create.

There are a lot of configuration flags here that are relevant. Let’s start with the basics.

spec.template.spec.containers[].image This is the container image. I really like the server container that itzg created. It’s robust and has great tooling around supporting all the different modpack variants. I highly recommend using it.

spec.template.spec.containers[].env These are your environment variables. Checking out the itzg/minecraft-server readme will help you choose environment variables that are right for you.

spec.template.spec.containers[].resources These are hints to the Kubernetes scheduler as to the resources your container will use. Requests and limits are different: requests are scheduling hints, limits are actual limits. If CPU exceeds the limit, the container will be throttled. If memory exceeds the limit, the pod will be killed. Limits are optional. Requests are also optional, but strongly recommended so the scheduler doesn’t put too many resources on the same node.

spec.template.spec.containers[].ports This is how you expose a port to the cluster namespace. A Kubernetes Service resource will reference these ports later. You’ll note that I have a “metrics” port here. This is not a default Minecraft port, but a Minecraft Forge mod that exposes a Prometheus metrics endpoint we can query to get server statistics. (I’ll write more about Prometheus in a later installment.)

spec.template.spec.containers[].probes Probes are healthchecks. These optional configurations tell the system how to check whether the server is operational. The itzg container provides a healthcheck tool, so we’ll use it.

spec.template.spec.containers[].volumeMounts[] This is how you attach your template’s volumes to the container, and at what path.

spec.template.spec.volumes[] This is how you tell a StatefulSet to use a PVC; it tells the template spec where to find the PVC.
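
Assuming the StatefulSet lives in a file called statefulset.yaml (my naming; adjust as needed), deploying it and watching the pod come up looks like:

kubectl apply -f statefulset.yaml

# StatefulSet pods get ordinal suffixes, so our pod will be minecraft-server-0
kubectl get pods -n minecraft -w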

Network

Networking can be a bit tricky to wrap your head around. Think of it this way: pods expose ports to the namespace. Those ports can be aggregated into a Service. The Service then either exposes the port on an external IP, or is accessed internally, for example by an ingress controller. Ingress controllers typically only handle HTTP services, so we don’t need an ingress for Minecraft.

apiVersion: v1
kind: Service
metadata:
  name: minecraft-server
  namespace: minecraft
  labels:
    app: minecraft-server
spec:
  type: LoadBalancer
  ports:
    - name: minecraft
      port: 25565
  selector:
    app: minecraft-server

Here, we see the return of the selector directive. Once again, we’re using labels to select which pods are attached to the service. The LoadBalancer type distinguishes this from the default ClusterIP service, which is only reachable inside the cluster. LoadBalancer services use a cluster-level load balancer (MetalLB, in my case) to expose a virtual IP that multiple physical nodes can respond to, and they handle arbitrary TCP/UDP traffic rather than just HTTP.
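
To see which IP MetalLB handed out, apply the service and read it back (service.yaml is my assumed filename here):

kubectl apply -f service.yaml

# the EXTERNAL-IP column is the address clients connect to on port 25565
kubectl get service minecraft-server -n minecraft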

Uploading Modpacks

Curseforge is a pain in the ass. If you’re using it, you need an API key to pull the server files automatically. Because of that, I choose to download modpacks manually and upload them to the container myself. You can use the kubectl cp command to do this (example below), but you need the container to be running, and if the server can’t find the modpack, it’ll crash before you can upload the file. To work around this, switch the TYPE environment variable to VANILLA and re-apply your manifest, copy the file up to /modpacks/, then swap TYPE back and re-apply. Once that’s done, the container will pick up the modpack and you’ll be ready to go.
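
For reference, the copy step looks something like this; your-modpack.zip stands in for whatever archive you downloaded:

# copy the archive into the modpacks PVC through the running pod
kubectl cp ./your-modpack.zip minecraft/minecraft-server-0:/modpacks/your-modpack.zip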

Finishing up

Once you have a server up and running, you can use kubectl logs -f $POD_NAME to watch the live logs from your service. When the server’s up, try connecting to it on its load balancer IP; you can find that with kubectl describe service $SERVICE_NAME. For others to connect, you’ll need to expose that IP and port through your firewall, which is outside the scope of this guide.
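
Concretely, with the names used in this guide:

# tail the server logs
kubectl logs -f minecraft-server-0 -n minecraft

# find the load balancer IP
kubectl describe service minecraft-server -n minecraft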


Disclaimer: I threw this together in about an hour; if you notice any problems with it or have questions, please do comment!


This is awesome. I’m writing most of this down for my own homelab setup as of now. I basically have the summer to cobble this together before school starts.

Thanks for this


Any time my dude. My plan is to write a few more of these as I find the time. I want to do one to cover monitoring, but that’ll probably turn into a goddamn series. For now, I think I’m going to cover simpler topics.


Nice writeup! One thing I would change is the PVC access mode. You don’t want to use RWX here, since Longhorn will spin up a non-HA NFS server using Ganesha, which will degrade availability and speed, especially when the NFS pod is scheduled on another node. If you need to access the PVC from another pod, you can tell the kube scheduler to put that pod on the same node as the Minecraft pod (see the sketch below). That lets the new pod access the PVC even though it’s RWO and not RWX.
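
Roughly, that means a podAffinity stanza like this in the second pod’s spec (just a sketch, keyed to the app: minecraft-server label from the StatefulSet above):

affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: minecraft-server  # must match the Minecraft pod's label
        topologyKey: kubernetes.io/hostname  # co-locate on the same node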


Ohhh, thanks for the note!

I’m still pretty new to Longhorn, so I’m still trying to wrap my head around how it all works. Do you have a link for docs about how to schedule the pod on the same node as the PVC replica?

I’m assuming this is a taints/tolerations config?

Do you have a link for docs about how to schedule the pod on the same node as the PVC replica?

It’s unfortunately a bit buried in the docs…

I’m still pretty new to Longhorn

I guess then it’s not too late to switch CSI drivers? In my personal experience (it has been some time though, about 2.5 years), Longhorn is really slow and replication can lag behind. Also, when draining nodes for upgrades/reboots, evicting Longhorn volumes takes ages.

I found the Piraeus Operator a lot better in that regard. It’s DRBD9-based and does the replication in the kernel. Reads can be done locally and perform much better. Since the drbd kmod does the replication, it doesn’t suffer from the same eviction problems. Just a thought tho.


Oh, those directives… I specifically wanted to avoid playing with that. :confused: We do a bit of that in prod at work, but I try to avoid it when possible.

Hmmm, I’ll have a look. Longhorn seems to be working fine for me, but now I’m gonna have to figure out how to migrate a few volumes; I’ve got some critical data on some of these.

I’ve been getting satisfactory performance lately, and as far as I can tell, the replication is acceptable.