Yup, I missed that (see my answer to MazeFrame.)
Well, thank you @SgtAwesomesauce and @MazeFrame, you made me re-think the whole thing, and I now have to fire up the calculator to get a precise idea of the final costs and see whether the price delta is worth the risk.
At first sight, I'd say it's a yes (I can easily recycle this gear), but I'll come back with the calculation, which may take time, as it looks quite hard to find an end-user price for the EPYC 7371.
EDIT: https://en.wikichip.org/wiki/amd/epyc/7371 shows that memory BW is way higher than TR4, so the issue is settled. Thanks to all of you.
Somehow, I'm thinking that hardware is a much smaller worry than the software setup. It feels like you're jumping into something more complicated than you understand or have experience with (just based on the wording in your posts, but I've been wrong before).
How are you going to qualify new versions and roll them out gradually, so that not all users are hit at the same time and there's time for the new software to soak?
How will you be handling off-site backups?
I think you'll need more machines, just to have some redundancy and to run all those things you're planning in VMs and containers.
To echo @SgtAwesomesauce:
If this is for file servers, I'd be going for the EPYC 7351P (cheap) with 128 PCIe lanes. Or even a 7251 8-core.
Any of the Threadripper or EPYC line will be plenty of CPU power for file serving (seriously, even an 8- or 4-core box will likely be 99.99% idle most of the time) - BUT EPYC will get you far more PCIe bandwidth that you can use in future for high-speed M.2 SSDs (and high-speed network adapters to pair with them, for the user-facing network or clustering).
I'm not sure on motherboard availability, but I think EPYC is far better suited to file serving than Threadripper, purely due to lane count. The 7351P really is the ultimate SAN read-head CPU. 128 lanes!
I think you'll run into PCIe limits with Threadripper in this application much sooner than you think - especially as SSD prices drop and the appetite for throughput increases. It might be OK TODAY, but think a few years ahead.
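To put the lane-count argument in concrete terms, here is a rough back-of-the-envelope sketch. The platform totals (128 lanes for single-socket EPYC, 64 for X399 Threadripper) are the published figures; the device list is entirely hypothetical, just to show how quickly an all-flash box eats lanes:

```python
# Rough PCIe lane budget for a hypothetical all-flash file server.
# Platform lane totals are published figures; the device list is invented.
EPYC_LANES = 128          # single-socket EPYC (SP3)
THREADRIPPER_LANES = 64   # X399 Threadripper

devices = {
    "NVMe SSDs (x4 each)": 12 * 4,      # 12 U.2/M.2 drives
    "dual 100GbE NIC (x16)": 16,        # user-facing / cluster network
    "HBA for spinning rust (x8)": 8,    # bulk storage shelf
}

needed = sum(devices.values())
print(f"lanes needed: {needed}")
print(f"fits on EPYC:         {needed <= EPYC_LANES}")
print(f"fits on Threadripper: {needed <= THREADRIPPER_LANES}")
```

With even this modest build-out the budget (72 lanes) already exceeds what Threadripper exposes, before counting chipset overhead.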
edit:
ALSO!
(Advice from a long term admin)
In this application I would strongly suggest buying off-the-shelf EPYC boxes from a vendor. DO NOT build your own if work is putting up the cash (and given the requirements, it sounds like this is for something serious - and the data is probably important, yes?). Because YOU will be copping the blame every time it breaks, and warranty/support will be more hassle (it will be YOUR problem).
Seriously, look for a 7351P-based system from HP/Dell/etc. DO NOT build your own. It's not worth the risk/pain.
Yes, it is (slightly) more expensive up front, and maybe you're trying to "do the right thing" by scrimping to build your own - but you will have a SINGLE SOURCE of support with a business-grade SLA. You won't be trying to source random bits and pieces over the internet or playing hardware-diagnostics games when it breaks. You'll ring the OEM, say "it's fucked", and then it is their job to fix it. They will have the diagnostic tools and spare parts available for prompt replacement. You'll also likely get hot-swap bays, OOB management, redundant PSUs and fans, environment/hardware monitoring, etc. All the stuff you simply won't get with a self-built X399 box of parts.
The support on a storage system like this is paramount. DON'T be the mug left holding the can if it goes pear-shaped.
Home lab? Test environment? Sure… build from parts. Production? Not worth it.
Also… tweaking RAM/CPU speeds… just don't. In this application the CPU/RAM speed will be largely irrelevant. You're chasing PCIe lane count and connectivity, mostly. Base clocks for stability will be plenty fast enough! EPYC has massive cache, so RAM speed for this stuff simply won't matter.
also: re: @risk
Agreed, software is going to be a massive failure point as well (even if you lessen the hardware burden by buying off-the-shelf EPYC boxes). As is support: what if you get hit by a bus (or, say, want to go on holiday)? Consider looking at what actual storage-array vendors have to offer - because whilst you will pay, you will also get enterprise storage features and support from people other than YOU.
Unless you are, or plan to be, a 100% full-time storage admin (and even then), you may well be biting off a lot more than you expect. People don't generally get fired for buying a reputable storage array. People left holding the can when something goes wrong with a custom one-off poor-man's SAN… sometimes do.
^this. Supermicro has some nice storage offerings.
Could also do that in a DIY box, but then you will have bought the server in pieces instead of ready to run
Ehh, with these more advanced SAN features, CPU starts becoming a requirement. I'm not sure about Gluster, though.
Don't underestimate Supermicro! We used them for our prod clusters at my previous company, and their support and warranty were excellent! Across approx. 200 units of 2U dual-2670v2 systems, we had two PSU failures in two years. As soon as one failed, we sent them an email and they advance-RMA'd us a replacement, even though we had spares to keep the servers running on two.
As far as other hardware failures go, we had something wrong with a motherboard on one of our storage servers, bringing it down for the count. They shipped us an entire new chassis with CPUs and RAM instead of asking us to swap it. We just swapped the hard drives and were good to go.
Doesn't half the server world (like, big-guy servers) run on Supermicro boxes?
I don't know. It would be interesting to see sales numbers.
Supermicro is pretty good when it comes to value proposition, so I wouldn't be surprised.
Supermicro is kinda unknown/not locally available here in AU where I am, so that's why I mentioned Dell, etc. But YMMV. Point being: use an OEM with local support.
Yeah, I know you CAN do that DIY, but it's less likely with Threadripper, and if you buy the bits and pieces to do OOB management you're likely going to:
- lose a PCIe slot
- not get the same level of support from the OEM anyway
Server-class hardware isn't particularly cheap up front, but it WILL save your ass, and by the time you try to replicate it from bits and pieces the price gets pretty damn close anyhow.
@thro
You missed my last edit: this "issue" (coming from an old cluster-builder reflex) is closed, as memory BW is awfully slow on TR4 compared to EPYC - 25 or 50 GB/s on one side, 150+ GB/s on the other - which is mandatory for FS/SW using very big caches.
About (new) machines: I could build them (even with X399), because despite what you think, you can find all the right parts to DIY the same thing as a pre-built; but it's not worth it, considering that in this case you must also spend money on spare parts, just in case. So I was considering what Supermicro has on its shelves from the beginning, as it is very well distributed and supported here.
No! RAM speed is essential when you're dealing with large caches (ZFS is a real hog) plus multiple fast network adapters plus many disks reading/writing at once, and (once again for ZFS) CPU speed is more important than multiplying cores - this is why I'm ogling the 7371 (SP3), which is faster than the others.
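As a rough sanity check on those bandwidth figures: theoretical peak DRAM bandwidth is channels × transfer rate × 8 bytes per transfer. A minimal sketch, using the published channel counts and common DDR4 speeds for each platform (real-world STREAM-style numbers will come in well below these peaks):

```python
# Theoretical peak DRAM bandwidth: channels * transfer rate (MT/s) * 8 bytes.
def peak_gb_s(channels: int, mt_per_s: int) -> float:
    """Return theoretical peak memory bandwidth in GB/s (decimal)."""
    return channels * mt_per_s * 8 / 1000

epyc = peak_gb_s(8, 2666)   # EPYC 7371: 8 channels of DDR4-2666
tr4  = peak_gb_s(4, 2933)   # X399 Threadripper: 4 channels of DDR4-2933
print(f"EPYC 7371 peak: {epyc:.1f} GB/s")
print(f"TR4 peak:       {tr4:.1f} GB/s")
```

That works out to roughly 170 GB/s vs 94 GB/s on paper, so the ordering matches the measured gap mentioned above even before NUMA and access-pattern effects.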
AND this will be of vital importance in the very few years (maybe even months, IF marketing stays out of the way) to come, as rust & SSDs are already dead and buried - see: https://www.servethehome.com/carbon-nanotube-nram-exudes-excellence-in-persistent-memory/ and https://www.servethehome.com/fujitsu-nram-production-to-start-in-2019/ - and notice that the well-known wafer-engraving technology will allow any reasonably skilled foundry to dive in (pay real attention to the access and data-retention times.)
About risk(s ;-p): do you seriously think that the people selling support did not take those risks before you, in order to stay up to date? (Well, some don't, but they won't last very long.)
There are always risks, and sometimes you have to take them, provided you have correctly weighed the benefits you can get.
Note that I've also seen so-called pro support stuck with canned answers and unable to untangle serious issues (that were finally solved by seasoned admins.)
In this configuration, the requirements do need CPU, as Gluster has to keep a large DHT up to date, plus the state of the data across the bricks.
I agree with you on SM.
i.e.: here, as support phone lines are premium-rate, there is a law saying the line must be hung up after 20 minutes, which leads to awfully painful service from Dell: you do not have a designated engineer, and you're almost always obliged to repeat your story 2 or 3 times; but the 3rd time the line automatically hangs up, and when you call again, you're connected to another guy who is not aware of the problem.
Moreover, distributors often have the necessary knowledge to avoid a direct call to SM.
@risk
All software upgrades are qualified by the lab before going public, and even in the worst case, where something really important was missed, rolling the system back doesn't take very long, thanks to ZFS's intrinsic qualities.
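For reference, that rollback workflow is just a snapshot taken before the upgrade. A minimal sketch, assuming a hypothetical dataset named `tank/app` (the dataset and snapshot names are invented):

```shell
# Checkpoint the dataset before the qualified upgrade goes live.
zfs snapshot tank/app@pre-upgrade

# ... deploy the new software version, run smoke tests ...

# If the lab missed something important, revert in seconds:
zfs rollback tank/app@pre-upgrade

# Once the release has soaked without issues, drop the checkpoint:
zfs destroy tank/app@pre-upgrade
```

Snapshots are copy-on-write, so the checkpoint is effectively free to take; only the rollback discards the post-snapshot writes.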
Anyway, I won't take more chances than I would for my own company.
Cool. Also, you'll lose some data sooner or later; as long as you're happy with that and can afford to deal with it, all is well.
That is arguably the dumbest thing I've heard in my life.
Hmmm… I can't say if the CPU is enough, then. You'll have to just try it.
@risk
Do you really think there will be no backups?!
Criticizing only for the love of controversy has never built anything on its own…
Well, there are arguments for and against, as usual. This came after several odd affairs, such as a dumped girlfriend who had kept the keys and came back while her ex-boyfriend was on vacation to dial the Japanese talking clock, which doesn't hang up on its own (a €70k invoice); so this law came in, plus another to cap a phone call's rate at a maximum. There were also rogue people redirecting calls to premium-rate lines abroad at more than $40/minute. Plus many events like that I read about here and there but do not recall exactly.
My personal contention is that this happens almost only because there are way too many laws and rules (too many state workers with too little to do…)
On the other side, it gives what I said; but here, this is a Dell-specific problem, as all the others I know of work normally: you have your own engineer who knows you and all your gear, and who is smart enough to redirect you to a co-worker when the problem is out of his league (and the phone line is a regular one.)
For the CPU, I'll ask a supplier for 2 demo units (mono & dual CPU); the one I have in mind will like this, as it gives him ammo to sell hardware to other customers.
As usual, our Aussie cousins are making things… down under ;-p)
BTW, with the high temperatures you have, I'm curious to know how your home computers behave if you don't have A/C?
Build them with proper cooling and no issue?
Very few people here do not have some form of air conditioning.
No, but that's never stopped anyone from losing data, either because it's recent and not backed up, or because backups are misconfigured, or because restores were never tested by the end user, or because the data has to be kept in sync with some other place that's not under your control. In all of these cases it's likely you'll be blamed for causing someone to miss a deadline (corporate machismo makes people look stupid if they accept responsibility for anything bad, ever).
@thro
Never mind; I don't know why, but I thought A/C wasn't very much used in Australia (in my defense, Skippy wasn't equipped!)
This is a risk in each and every installation, but a risk that is mitigated here by the use of GlusterFS (redundancy) and ZFS (regular automated snapshots.)
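The GlusterFS side of that mitigation is a replicated volume, so every file lives on more than one machine. A minimal sketch, with invented hostnames and brick paths (three replicas, one per site):

```shell
# Create a replica-3 Gluster volume across three hypothetical hosts.
# Each brick would sit on its own ZFS dataset, so the automated
# snapshots run underneath Gluster on every copy independently.
gluster volume create gv0 replica 3 \
    site1:/bricks/gv0 site2:/bricks/gv0 site3:/bricks/gv0
gluster volume start gv0
```

With replica 3, any single node (or site) can drop out and the volume stays readable and writable, while the ZFS snapshots below each brick cover the "oops, deleted/corrupted a file" case that replication alone propagates.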
No, as it must be as fully tested as the rest.
Since E. Snowden's revelations, the Dotcom takedown and other such kindnesses, my trust in any cloud (which wasn't very high anyway, as I have a brain, a lot of imagination, and have seen weird things) has definitely fallen into the negative part of the thermometer. For this reason, there will be 3 different local sites where servers and backups will be scattered, and no external booby-traps.
This I doubt, as I made things crystal clear from the beginning, and I always record what is being said (on which day, at what time, and by whom) in a nice black notebook with numbered pages and indelible ink, just in case somebody is tempted to say: "I never said that". Unnerving for others, but fully efficient; there's usually no more than one attempt, two for the dumbest.