The video below was my inspiration for this post. Well, it’s mostly a response to the video.
I am going to bully @geerlingguy a bit here, because my autism kicked in, and I can’t not talk about the lack of redundancy. I mean, does it count as bullying if you are criticizing ideas, not people?
I had an idea for an enterprise setup running on a Pi cluster, but my (proposed) implementation is completely different from Jeff’s. It will probably end up being way more expensive too, so take that into account as well. I will proceed with it once I finish my most urgent priorities.
The setup Jeff did works, as proven in the video, but it is far from reliable. First off, the Turing Pi 2 (TP2) is a single point of failure (SPOF). We sysadmin folks talk a lot about redundancy and keeping hardware (and software) running, and we pride ourselves on our uptimes. Well, as I get farther away from data centers and into home server territory, I now prefer frequent restarts and verifying that updates don’t break stuff, but that’s beside the point.
The TP2 (toilet paper 2?) build has a lot of SPOFs:
- single PSU
- single Pi that takes care of the routing
- single Pi that takes care of the storage (the ZFS pool NFS server)
- single UPS
- single network connection
Again, while not perfect, it gets the job done. Given the same budget, you could probably figure something out with normal SBCs and a switch, but you would run into most of the same limitations.
How do we get around those limitations, you may ask yourself, curious creature? Well, the answer is to throw more hardware at the problem. These are all hardware limitations.
So, we can just get another TP2 configured the same way, right? Well… maybe. With two TP2s, if one of them fails completely, you still have the second one running. That would basically be a standby node, and you would pay for two 4G plans and get redundant internet.
But there’s a catch. If the network connection between the two TP2s dies (say, a single Ethernet cable connecting them), you may end up in a split-brain situation where both believe they are the only one alive and try to become the active node. Even if we bond the second eth port in an LACP, balance-alb or active-backup configuration, the risk remains, although the chances get significantly lower (maybe the switch chip on one of the TP2 boards dies and triggers this event, in which case only the main Pi CM4 will still be able to talk to the reverse proxy, not the rest of the nodes).
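For illustration, here is a minimal sketch of that bonding with systemd-networkd. The port names and the choice of mode are my assumptions, not something from Jeff’s build, and whether the TP2’s onboard switch exposes the ports this cleanly is another assumption:

```conf
# /etc/systemd/network/10-bond0.netdev -- aggregate both ports
[NetDev]
Name=bond0
Kind=bond

[Bond]
Mode=active-backup   # or 802.3ad (LACP) / balance-alb if the switches support it
MIIMonitorSec=0.1    # check link state every 100 ms

# /etc/systemd/network/11-ports.network -- enslave the physical ports (names assumed)
[Match]
Name=eth0 eth1

[Network]
Bond=bond0

# /etc/systemd/network/12-bond0.network -- address the bond itself
[Match]
Name=bond0

[Network]
DHCP=yes
```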
So, a workaround with the TP2 would be to get two switches (stackable or not, doesn’t matter) and connect one TP2 eth port to each of them. If the switches are stackable, do LACP or balance-alb; if they are not, just active-backup. Connect the switches together, and connect a port from each switch to a router that has the 4G modem on it (or any internet connection). If you want redundancy there too, you can complicate the configuration with two routers, both connected to each switch, using either keepalived with synced routing and firewall rules on Linux, or CARP on BSD. Then you can connect one group of TP2, switch and router to one UPS, and the second group to another UPS.
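To make the keepalived option concrete, a minimal sketch of a VRRP config for the primary router; the interface name, password and addresses are all placeholders:

```conf
# /etc/keepalived/keepalived.conf on the primary router;
# the backup router uses state BACKUP and a lower priority
vrrp_instance GATEWAY {
    state MASTER
    interface eth0              # assumed LAN-facing interface
    virtual_router_id 51
    priority 150                # e.g. 100 on the backup router
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass changeme
    }
    virtual_ipaddress {
        192.168.1.1/24          # the gateway IP everything points at
    }
}
```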
So now we can have a k3s cluster split across the two physical builds. With 2 master nodes and 6 worker nodes, we can do quite a lot. We could even load balance across them, and if one TP2, switch, PSU or UPS dies, it’s no biggie; the other one will take over and launch additional containers. Because each TP2 build is self-contained, it doesn’t need resources from the other side.
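A rough sketch of the k3s bootstrap, with made-up hostnames. One caveat worth flagging: k3s’ embedded etcd needs a majority to stay up, so with exactly two masters, losing either one stalls the control plane; an external datastore or a third, small server node avoids that.

```sh
# First master starts the cluster (hostname is a placeholder)
curl -sfL https://get.k3s.io | sh -s - server --cluster-init

# Second master joins the first; $K3S_TOKEN comes from
# /var/lib/rancher/k3s/server/node-token on the first master
curl -sfL https://get.k3s.io | sh -s - server \
    --server https://master-1:6443 --token "$K3S_TOKEN"

# Each of the six workers joins as an agent
curl -sfL https://get.k3s.io | K3S_URL=https://master-1:6443 \
    K3S_TOKEN="$K3S_TOKEN" sh -
```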
But we’ve come pretty far without asking questions along the way. Is a TP2 even good for such a situation? Shouldn’t we aim to do better? A TP2 should be around $200, and each SO-DIMM adapter for the compute modules is around $10, bringing the cost to around $240 per board, unless you use Jetsons.
With eight $35 CM4s and two TP2s, you raise the build price to a whopping $760. Not bad for a redundant cluster. I’m not taking the PSUs, UPSes, switches and routers into account, because those components are shared between Jeff’s build with my workarounds applied and my own planned build.
Alternatively, we can get two Odroid HC4s for 2x $73 (so $146) and six Odroid N2+ boards for 6x $83 (so $498), bringing the total to $644. So around $100 cheaper, give or take, but instead of being solely dependent on the TP2, we have multiple boards working together. The two HC4s can serve the two NFS shares and act as master nodes, while the six N2+ boards can be the workers. The savings may be offset a bit by needing switches with a few more ports than you could get away with for the TP2.
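For the HC4s’ storage duty, a minimal sketch of a mirrored pool exported over NFS; the pool name, device paths and subnet are all assumptions:

```sh
# Mirror the HC4's two drive bays into a pool (device names assumed)
zpool create tank mirror /dev/sda /dev/sdb
zfs create tank/k8s    # mounts at /tank/k8s by default

# Export the dataset over NFS to the worker subnet (also assumed)
echo '/tank/k8s 192.168.1.0/24(rw,sync,no_root_squash)' >> /etc/exports
exportfs -ra
```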
There are tradeoffs to this approach, though. Jeff has a 2x 2U setup, whose cases are likely way more expensive than my jank idea, but I guess the TP2s can work outside a case too, just like I plan for the SBCs. Either way, neither build would be very portable anymore.
For the software side of things, I will give it a deeper think when I actually build the thing, because right now I am uncertain whether I should go with LXD or with a k8s stack. I prefer the classic way of managing services, so LXD makes a lot of sense to me. Even if I change my mind after the fact, I can run a k8s stack inside LXD, but not the other way around; I’d have to migrate the k8s containers into LXD and then figure everything out, which would be a bit of a headache. So it’s likely I’ll go with LXD.
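If I do flip-flop later, running k8s inside LXD boils down to giving a container nesting privileges, roughly like this (the container name and image are made up; real k8s-in-LXD setups usually need a few more tweaks, like /dev/kmsg and kernel module access):

```sh
# Launch a container and allow it to run nested containers (k8s/k3s inside)
lxc launch ubuntu:22.04 k8s-host
lxc config set k8s-host security.nesting true
lxc restart k8s-host
```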
But one piece of software I would definitely change, at least in my own infrastructure, is the SSH tunnel: I’d use WireGuard instead, do some keepalive checks, and restart the tunnel if it goes down. That way I can avoid doing TCP handshakes twice, once for the SSH tunnel and once for the communication between the reverse proxy and the web servers. And speaking of reverse proxies, I’d go with HAProxy. I may use nginx as the web server, but as a reverse proxy, I don’t like it that much. The wg tunnel would obviously be running on the router.
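As a sketch of that watchdog idea, assuming a wg0 tunnel already configured with PersistentKeepalive on the peer, and 10.10.0.2 being the reverse proxy’s address inside the tunnel (both placeholders):

```sh
#!/bin/sh
# Crude tunnel watchdog: run every minute from cron or a systemd timer.
# If the proxy stops answering over the tunnel, bounce the interface.
ping -c 3 -W 2 10.10.0.2 >/dev/null 2>&1 || {
    wg-quick down wg0
    wg-quick up wg0
}
```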