Can do.
> When talking cluster and hyperconverged infrastructure, the approach is to scale out/horizontally… meaning more (small and cheap) nodes. You are paying the (huge) network latency tax anyway, so 3 or 10 nodes doesn’t make a difference.
True, and you also gain redundancy. One machine on its own may be more power efficient, but then you have no redundancy (people are talking about a single hypervisor).
> I think 64G on really limited nodes is plenty. If you need more, you need to scale up/vertically with better hardware (larger form factor) or scale out with more nodes.
Makes sense!
> Oh lawdy, if only you saw the guide I’ve got cooking.
Homelab?
I have to go to work. Thanks all for the great input!
If you are planning on that much memory, you may want to look into the announced but not yet shipping AMD EPYC 8004 series (Siena); it is an EPYC lite, offering about 2/3 of the EPYC for 1/3 of the price.
AnandTech has the price breakdown on the CPUs: the 8-core starts at $409.
ServeTheHome also has a writeup.
If you are using a picoPSU, most of them take 8 V to 16 V as input.
You can use a car battery as your UPS:
13.8 V power supply >> car battery >> wires directly to the picoPSU.
The average car battery is about 70 Ah, so you have roughly 12 V × 70 Ah = 840 Wh of battery before things turn off.
If you are using several mini desktops, they are probably drawing 60 W each at peak and 6 W at idle.
You will probably get more than a day out of a car battery. If that is insufficient, get a few extra car batteries. If that is still insufficient, get a gas generator that you can remote-start when the batteries get low.
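To put rough numbers on that, here is a minimal Python sketch of the runtime math. The 840 Wh figure is the 12 V × 70 Ah example from above; the three-node count, the per-node wattages, and the 50% usable-capacity margin (to avoid running a starter battery flat) are assumptions you would swap for your own measurements.

```python
# Rough runtime estimate for a car-battery "UPS" feeding picoPSUs.
# Assumed: 3 mini PCs and only ~50% of capacity used, to spare the battery.

BATTERY_WH = 12 * 70        # 12 V x 70 Ah = 840 Wh (example from above)
USABLE_FRACTION = 0.5       # assumption: don't deep-discharge a starter battery
NUM_NODES = 3               # assumption: cluster size

for label, watts_per_node in [("peak", 60), ("idle", 6)]:
    load_w = NUM_NODES * watts_per_node
    hours = BATTERY_WH * USABLE_FRACTION / load_w
    print(f"{label:>4}: {load_w:3d} W total -> ~{hours:.1f} h runtime")
```

That gives roughly a day at idle-ish loads (more if you are willing to drain the battery further) and only a couple of hours at sustained peak.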
Are you sure you need 64 GB per box? You may want to try less with a faster SSD, and let it swap what it doesn’t need to keep in RAM.
Unless you are running physics simulations or AI workloads, most tasks don’t need much RAM. And tasks that do need that much RAM often also need a faster CPU than you will economically find in a mini PC.
I’m sure nothing can go wrong there. I recommend a 3-2-1 backup strategy, with emphasis on the “1”.
It’s basically EPYC Rome with updated Zen 4 power efficiency and PCIe Gen 5. The CPUs are already available, but boards are still missing.
I’m really considering getting Siena; SKUs up to 24 cores are reasonably priced. Very good TDP as far as servers are concerned (the 8-core is at 70 W cTDP, which is unheard of), with all the server goodies you could ever want. Not as much perf/watt as e.g. a 7900/7950, but way more I/O and memory. The I/O die is certainly taking its toll to make all those lanes fly.
I see Siena as a more premium approach for a home server when a desktop platform just doesn’t have the required lanes. But it’s certainly not ultra-low power, and it isn’t competing against mini PCs.
Well, a LiFePO4 battery might work there, but I was thinking more along these lines, only it is a UPS too. No idea if it exists on the market currently, or if that one is sufficient to power a ton of cluster machines, but…
I’m not sure a UPS is needed. Power is pretty stable here (I’ve had one outage in the last five years).
You said you’re going to run web apps. Can’t you dockerize them? Kubernetes is quite a bit more efficient than Proxmox, so you have much less overhead and thus lower power use. Proxmox clusters are more for compute workloads that really need whole virtual machines, and they add quite a bit of overhead. Also, with a Proxmox cluster you need at least 2 of 3 computers running for it to keep working (quorum), whereas with a Kubernetes cluster you can have everything running on one or multiple machines, and you could even rent something like a Linode node to take over some of the work.
(this is all with the caveat that your web3 stuff runs easily in containers)
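On the “2 out of 3” point: a Proxmox cluster keeps working only while a majority of votes is present. A quick sketch of that quorum rule, assuming one vote per node and no external QDevice:

```python
# Majority quorum: a cluster of n single-vote nodes needs floor(n/2) + 1 of
# them online; below that, the cluster stack refuses to make changes.

def quorum(n_nodes: int) -> int:
    return n_nodes // 2 + 1

for n in range(1, 6):
    print(f"{n} node(s): need {quorum(n)} online, tolerates {n - quorum(n)} failure(s)")
```

So a 3-node Proxmox cluster tolerates exactly one node being down, while a single-machine Kubernetes (or k3s) setup has no such constraint.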
Your applications will also probably have a lot less overhead if you run them in LXC containers; this could save a good amount of power.
I started with a 3-node HCI cluster some years ago, based on J5005 Mini-ITX Atom boards from ASRock, using oVirt, the upstream variant of RHV (Red Hat’s take on vSphere).
The aim was to functionally test a zero-software-cost replacement for vSphere for lab use, using left-over production servers (HP G8 at the time) and some workstations.
oVirt likes to use 16GB just for the management engine, so I upgraded those J5005 Atoms to 32GB of DDR4 each and added a 1TB SATA SSD and Realtek 2.5Gbit USB NICs, because I have a cheap 10Gbase-T switch (actually NBase-T) as backbone and the onboard 1Gbit NICs would really slow things down; they also had some issues (they’d cease to come up on soft reboots and needed a power cycle).
It turns out the management engine actually needs much less RAM for a small setup like this, so I can run a fair number of VMs using 2+1 (2 replica + 1 arbiter) or 3-replica setups. The latter is easier to manage but costs write speed, as data is written thrice, whilst reads could in theory profit. But GlusterFS is no speed demon no matter what; at least I’ve never lost data, though I’ve done plenty of heals.
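To illustrate the “written thrice” point, here is a small sketch of write amplification for replica 3 versus replica 2 + arbiter, assuming three nodes with one 1 TB brick each (the brick size is just an assumption mirroring the SSDs above):

```python
# GlusterFS layouts over three nodes with one 1 TB brick each (assumed).
# replica 3          : full copy on all three bricks  -> data written 3x
# replica 2 + arbiter: full copy on two bricks, the arbiter stores only
#                      metadata                       -> data written ~2x
# Usable space is one brick's worth either way; 2+1 wins on write traffic
# and lets the third (arbiter) brick be tiny.

BRICK_TB = 1.0
for name, copies in [("replica 3", 3), ("replica 2 + arbiter", 2)]:
    print(f"{name:22s}: ~{copies}x write amplification, ~{BRICK_TB:.1f} TB usable")
```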
oVirt 4.3 and later 4.4 are full of bugs; just getting it to work took me ages, but once you get to know it, it’s much more autonomous than Proxmox.
Too bad the product is essentially dead, because Red Hat has stopped the upstream project. I am currently running the commercial (yet zero-license-cost) 4.4 release from Oracle, which has far fewer bugs and is really quite stable. Unfortunately there is no way that Oracle will actually put real money into the product, so after some exploration of XCP-ng (where it took a long time to get all the NUCs to work), I settled on Proxmox, which just runs out of the box not only on the NUCs but also on my big CUDA workstations, which I dual-boot as compute nodes with GPU pass-through under RHV/oVirt or Proxmox control. Again, getting the Zen machines to work with XCP-ng took a lot of effort from their fantastic support team, but it also highlighted the risk of their Xen niche versus the broad support of KVM.
For me the USB3 2.5Gbit NICs from Realtek have always worked well: no crippling driver bugs I couldn’t fix (on EL7 I had to compile drivers, but that went away with EL8), no overheating, actually very little power use, and years of 24x7 operation without issue. I’ve tried 5Gbit adapters using Aquantia chips from QNAP, but due to USB3 protocol overheads they hardly perform faster, are way more expensive, and really like to suck power. In short, those USB3 2.5Gbit NICs are so cheap you can just go and try for yourself: there are far too many old tales out there and things have gotten far better, while my experience with Intel drivers has gone the other way.
I then started experimenting with NUCs when I saw a NUC8 sell cheaply: boy, what a speed devil compared to these Atoms! I saw another NUC10 with six cores sell at nearly the same price and grabbed that. Months later a NUC11 with the faster Tiger Lake SoC completed a full 2nd cluster.
I did experiment with Thunderbolt networking, but because the NUC8 and NUC10 only had a single TB3 port, I had to use the dual port NUC11 as a router: not exactly a fault-tolerant setup, but good enough to study the concept.
IP over TB mostly required larger packets to achieve good speeds when routing; using 64K as the packet size with TB was no problem, and then both bandwidth and latencies were fantastic, more like InfiniBand than Ethernet. Bandwidth never exceeded ~1GByte/s, as the IP link only seems to use a 10Gbit/s allocation from Thunderbolt. I don’t think it’s bad drivers but a limit of what the protocol supports; only PCIe devices mapped through TB seem capable of getting a 40Gbit slice. IP over TB seems a lot more common with the Rotten Fruit Cult, where I rarely venture, but that 10Gbit/s limit on IP over TB is feedback I gathered there.
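That ~1 GByte/s ceiling is consistent with a 10 Gbit/s allocation once you subtract some protocol overhead; a quick back-of-the-envelope check (the ~10% overhead figure is just an assumption):

```python
# Convert link rates to rough payload throughput, to sanity-check the
# ~1 GB/s ceiling observed on IP over Thunderbolt.

def payload_gbytes_per_s(gbit: float, overhead: float = 0.10) -> float:
    """Line rate in Gbit/s -> approximate payload in GByte/s."""
    return gbit / 8 * (1 - overhead)

for rate in (10, 40):
    print(f"{rate:2d} Gbit/s link -> ~{payload_gbytes_per_s(rate):.2f} GB/s payload")
```

So ~1.1 GB/s for a 10 Gbit slice versus ~4.5 GB/s if the full 40 Gbit were available, which is in line with the ~1 GByte/s I saw.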
The biggest problem turned out to be that TB ports do not have a 48-bit MAC address, so the driver randomly invents one to impersonate something Ethernet-like, and port assignments would just as randomly change on cable changes and reboots, ruining my carefully crafted /etc/hosts files and routing entries.
So I went with Aquantia-based TB3 10Gbase-T NICs for the NUCs, which have worked fairly trouble-free (zero driver issues on any Linux I’ve thrown at them), but they aren’t cheap. With all those external adapters and cables it’s certainly not an OCD-compliant setup, but they all live peacefully in some open metal shoe cupboard I found on Amazon underneath my desk, where I only ever venture for a bit of vacuuming. If I need console access, I have them connected via a set of cheap cascaded desktop KVMs, but I have those mostly because I already need slightly more expensive ones to operate my workstations at dual 4K.
Yeah, I also still dream the NUC/TB dream, but it would require some type of plug-and-play protocol in the background to make NUC-to-NUC networking seamless and practical.
I’ve transitioned from RHV/oVirt + Gluster to Proxmox + Ceph on the NUCs, and actually both clusters “talk” to each other in a way, as I’ve made the Gluster storage from the Atom-based RHV cluster available to Proxmox. I’m pretty sure it would also work the other way around; RHV/oVirt also supports just about any type of storage known to Linux.
Proxmox has tons of historical quirks, but compared to RHV/oVirt it is extremely light, which can be good, when you’re just running a basic three-node cluster. But once you’ve gotten used to the smarts and comfort of a full management engine, and your setup involves dozens of servers and scores of VMs, you may be yearning for a mix of both, like I do.
Really cool to read your story, and helpful too! Thank you.
So more people use mini PCs or NUCs.
I think I’ll skip the Thunderbolt part, but 2.5Gbit or 10Gbit networking is on the list.
Personally, I’m very satisfied with Proxmox. I must say I’m not a power user yet; I’ve only been using it for a couple of months.
I need a hypervisor because I run nodes from several projects, and these can’t be run in a container. I don’t know if you know the Flux project? Have a look at runonflux.io.
I also run a Docker VM with some apps for personal use. Kubernetes is on the list to try and learn.
> Are you sure you need 64 GB per box? You may want to try less with a faster SSD, and let it swap what it doesn’t need to keep in RAM.
Yes, the specifications for one node are 2 CPU cores, 4 threads, 8 GB of memory, and 220 GB of SSD storage.
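For context, 64 GB fills up quickly once you stack several of those nodes on one box; a tiny sketch (the headroom reserved for the hypervisor itself is an assumption):

```python
# How many 8 GB nodes fit in a 64 GB box, leaving headroom for Proxmox.
BOX_RAM_GB   = 64
NODE_RAM_GB  = 8    # per-node spec quoted above
HOST_RESERVE = 4    # assumption: RAM kept free for the hypervisor

print((BOX_RAM_GB - HOST_RESERVE) // NODE_RAM_GB, "nodes per box")
```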
> If you are planning on that much memory, you may want to look into the announced but not yet shipping AMD EPYC 8004 series (Siena); it is an EPYC lite, offering about 2/3 of the EPYC for 1/3 of the price.
I’ll have a look! Thank you for sharing.
> You can use a car battery as your UPS.
I already have an APC UPS in my rack. I only need to replace its batteries sometime soon.
Yes, I saw several videos and forum posts about the little machines. The only problem is that they are a bit hard to get here in the Netherlands. Patrick does really cool things too in his videos and on the website.
Thank you for sharing your experiences I appreciate it!
Not sure if this fits in with the more power-efficient solutions. What do you guys think of this?
I already have a Unifi Aggregation switch (10Gb).
With the insane collateral, this is more a Ponzi scheme than a business opportunity or anything else it claims to be. Even without the collateral, the ROI and the maths do not really favor anyone deploying nodes, especially not with high energy costs.
Very bad and risky deal as far as I’m concerned.
Proof of work and GPU mining + “climate friendly” statement on the main site…
Well, ASICs are the most power-efficient things you can build btw.
Something to that effect, yeah.
Yeah, I was afraid of such reactions… I’ve been active in the crypto space for a long time, and you get used to them… A lot of people don’t have a clue what they’re talking about when it comes to crypto.
It isn’t about crypto, it’s about economics. Paying collateral without dividends or interest is a bad deal: you are lending someone else money for free. I get a better return buying federal bonds and not running any servers that use power, generate write-offs, etc. And that is especially true now, when interest rates are on the rise and inflation is a thing.
And that is without taking any aspect of “crypto” into account.
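To make the opportunity-cost point concrete, here is a tiny sketch with purely hypothetical numbers (the collateral value, bond yield, node reward, power draw, and electricity price are all placeholders, not figures from any particular project):

```python
# Opportunity cost of locking collateral vs. simply holding a government bond.
# Every number below is a hypothetical placeholder; the sign of the result
# depends entirely on what you plug in.

collateral_eur  = 10_000   # value locked as node collateral (assumed)
bond_yield      = 0.04     # risk-free-ish yield you give up (assumed)
node_reward_eur = 400      # yearly node rewards (assumed)
node_power_w    = 30       # average node power draw in watts (assumed)
power_price_kwh = 0.40     # electricity price in EUR/kWh (assumed)

energy_cost = node_power_w / 1000 * 24 * 365 * power_price_kwh
forgone_interest = collateral_eur * bond_yield
net = node_reward_eur - energy_cost - forgone_interest

print(f"energy cost        : {energy_cost:8.2f} EUR/yr")
print(f"forgone interest   : {forgone_interest:8.2f} EUR/yr")
print(f"net vs. just bonds : {net:8.2f} EUR/yr")
```

With high energy prices and a non-trivial bond yield, the net easily goes negative before you even touch the risks specific to crypto.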
Most people don’t know about crypto. But some people do know about similar things.