Threadripper : many HW & SW questions

Hi gentle people that live in the intertubes and heat chips all day long,

Ze plot :

Applying some HPC guidelines to a clustered file server for ~450 users (85% of files are in the [257 B - 128 kB] size range):

  • Use 3 SRVs:
    • FS: ZFS,
    • DFS: GlusterFS,
    • HBAs w/ broadcom chipset,
    • RDMA 40 Gb from Mellanox (adapters & head switches)
  • No fancy stuff (such as overclocking).
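
For anyone wanting to check a figure like the "85% of files" one on their own data, here is a minimal sketch. It assumes the range means 257 bytes to 128 KiB inclusive (my reading, not stated above), and the function name `share_in_range` is made up for illustration.

```python
import os
import stat


def share_in_range(root, low=257, high=128 * 1024):
    """Return the fraction of regular files under `root` sized within [low, high] bytes."""
    total = 0
    in_range = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                info = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable: skip it
            if not stat.S_ISREG(info.st_mode):
                continue  # ignore symlinks, sockets, device nodes
            total += 1
            if low <= info.st_size <= high:
                in_range += 1
    # Assumption: an empty or missing tree counts as a share of 0.0
    return in_range / total if total else 0.0
```

Calling `share_in_range("/srv/data")` (a hypothetical path) returns a value between 0 and 1. A high share of small files is usually what drives choices such as the ZFS record size, so it is worth measuring rather than guessing.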

Ze hardware :

H1- Threadripper 1950X ?

H2- As the 2950X gains only 100 MHz over the 1950X and the performance difference is far from a breakthrough, putting €200 more on the table for it seems a bad idea?

H3- The 2990WX is of course tempting, but as 2 of its 4 dies access RAM through another die, as it is an energy hog, and as it only runs @3.0 GHz (ZFS loves high clocks), it doesn’t seem to be a good contender, especially @€1,650 for the beast?

H3b- As the 3 SRVs will also run some daemons (apache studio, nginx intranet sites, etc), will the 1950X be sufficient to cope with all that?

H4- As AMD tables show that RAM speed goes down when the machine’s stuffed (8 sticks), and as I have left HW alone for the past 15 years: is it possible to have the CPU running at its regular speed and the ECC RAM running at its nominal speed (2,666 MHz) WITHOUT any risk of instability?

H5- Motherboards aren’t legion, but I have a particularly bad memory of ASRock, which wasn’t even consumer grade 15 years ago. From what I read here and there, it seems to be the same today, especially regarding BIOS updates and responsiveness?

H5b- Which motherboard would you recommend? (criterion #1: stability; #2 (if possible): ability to run the RAM at its nominal speed; #3: must have at least 3 × M.2 slots)

Ze software :

SW1- Does running the whole shebang under Linux avoid the need for AMD’s special software to modify the CPU cores’ behavior?

SW1b- If not, does it run under Wine?

BTW: How can I correctly tag this post (I did not find any setting)? Is it automatic/moderator only?

Thanks in advance & regards.

Jiff


Your post seems to build up for something big, a huge list of issues and questions, then it just dies. I am very confused and a little worried.

I don’t see any reason why it would become unstable. For 24/7 operation, you might want to look into Epyc (probably worth the effort to consider it).

Personally, I had a very pleasant experience with ASRock stuff. Asus and HP on the other hand were nightmares…


What tags do you want?
Not that I am a mod or whatever, but as a “regular” I can tag to my heart’s content.

What are these 450 users going to be doing, are we talking about home directories or some other use case?

What’s your current setup for those 450 people?

Arfff, no, I was talking about tagging my post with: Linux, Threadripper, ECC RAM, Motherboard and maybe Obi-Wan Kenobi :smiley:

Hi Jiff,

At the top of the thread, you can click the pencil icon next to the thread name and choose your tags from there.

There is a field called “optional tags”

@risk

Oops, I forgot to talk about that :confused:

2 main uses:

  • PXE Boot OS (DRBL + NFS + CTDB)
  • $HOME (GlusterFS clients for transparent failover)

My 3 servers will be replicated, using mirror vdevs.

The current installation is EOL (mostly w$7, and I dunno which w$ servers as it is not my problem, except for the users’ data), so it will gradually give way to the Linux one. There are 5 servers, a bonded 10 Gb network, and… that is all, as I do not have the specs.

I’ve got one, it’s great.

There are more benefits to the 2950X, but I can’t remember them off the top of my head.

Personally, I don’t think it’s worth it, but you should buy current-gen hardware for enterprise use.

I can’t imagine you’d need this much power for a fileserver. I’ve used ZFS and Ceph in the past and you should be fine with a 16-core, depending on your total storage size.

It really depends on what you’re running and your expected traffic. Are these 450 users all concurrent and constantly requesting data on the intranet sites?

I don’t remember off the top of my head. I would not be using Threadripper for this application, personally. I’d go EPYC or Xeon.

I’ve got the X399 Taichi and it’s perfectly fine.

Depends 100% on what you do with it and which tier the board is.

Again, I recommend EPYC. Get an SP3 board.

If you really want to go TR4, pay the ASUS tax and go with them.

What, specifically, are you talking about here? Are you talking about power-saving throttling? If so, that’s kernel. If you’re talking about overclocking, don’t. Overclocking is the antithesis of stability. If you’re talking about NUMA, that’s a combination of kernel and BIOS.
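
To see what the kernel side of the NUMA story looks like on a given box, here is a minimal sketch that reads the node layout Linux exposes under `/sys/devices/system/node`. The function name `numa_layout` and its `node_root` parameter are just illustrative; how many nodes show up on a Threadripper depends on the memory mode set in the BIOS, so treat the output as a check of your setup, not a fixed property of the CPU.

```python
import glob
import os


def numa_layout(node_root="/sys/devices/system/node"):
    """Return {node name: CPU list string} as exposed by the Linux kernel in sysfs."""
    layout = {}
    for node_dir in glob.glob(os.path.join(node_root, "node[0-9]*")):
        try:
            with open(os.path.join(node_dir, "cpulist")) as fh:
                layout[os.path.basename(node_dir)] = fh.read().strip()
        except OSError:
            continue  # node directory without a readable cpulist
    return layout
```

On a Linux machine, `numa_layout()` returns one entry per node, each mapped to the kernel's CPU range string for that node. An empty dict just means the directory is not there (non-Linux system or NUMA support not built in).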


@MazeFrame (strange forum, that doesn’t order posts by thread)

H4- About EPYC, that was my first intention, but after several days of reading I switched to Threadripper, as it will also be a first step toward moving on (but alone this time).
The only thing that could bring me back to EPYC would be 128 GB being too low for these uses or the machine being too slow (I am not taking a great risk, as R&D will be more than happy to get their hands on the Threadripper machines if I have to revert to EPYC).

H5- Good to know as it is generally cyclic, just like HDDz.

What is your total storage capacity requirement?


Epyc also has a LOT more PCIe lanes.

  • 450 persons (no evolution)
  • ~6,800 GB data current
  • estimate is ~×3.5 over the next 5 years, but docs will add a lot (I don’t know the volume at this time), and as more and more operating procedures come as video and people want to have them at hand, the final estimate is “above 50 TB”: let’s say 75 TB in 5 years, maybe 100, as one of the goals is to eliminate archiving and keep everything online (long-term operations).
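
To turn those usable-capacity figures into raw disk counts, here is a rough sketch. The assumptions are mine, not from the posts: 2-way mirror vdevs, each of the 3 replicated servers holding a full copy, and pools kept below about 80% full (a common ZFS rule of thumb). The function name `raw_capacity_tb` is made up for illustration.

```python
def raw_capacity_tb(usable_tb, mirror_width=2, max_fill=0.8, replicas=3):
    """Return (raw TB per server, raw TB across the cluster) for a usable-capacity target.

    Assumptions (not from the thread): every server stores a full replica,
    every vdev is a `mirror_width`-way mirror, and the pool should stay
    below `max_fill` of its capacity.
    """
    per_server = usable_tb * mirror_width / max_fill
    return per_server, per_server * replicas


if __name__ == "__main__":
    for target in (23.8, 75, 100):  # 6.8 TB x 3.5, then the 75 and 100 TB guesses
        per_server, cluster = raw_capacity_tb(target)
        print(f"{target:>6} TB usable -> {per_server:.1f} TB raw per server, {cluster:.1f} TB raw in total")
```

Under those assumptions the 75 TB target works out to roughly 187.5 TB of raw disk per server, which matters for how many HBA ports and drive bays each chassis needs.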

I don’t need more lanes: servers are headless, there are 2x network adapters + 2x HBAs, some SSDz for the system, the whole thing in an SM case with a big redundant PSU, and that is all.

Two X722s and that is 16 lanes for networking; two big RAID controllers and another 16 lanes are populated.
Granted, 4 GPUs/compute cards would use twice that. Just something to consider.


H1- Swell !

H2- Well, as both models can be installed on the same MOBO, expenses will stay low.

H3- OK, that is what I was more or less thinking but a confirmation is good.

H3b- Not constantly, except maybe for the accounting department as all docs are scanned. Other than that, regular file use (mostly word processing & spreadsheets), plus a bit of document retrieval (datasheets in PDF format). So, not really heavy usage per se, but as usual some peaks when people do this kind of thing all at once.
The database part of the intranet where most of the heavy traffic dwells is remote.

H4- here it is :
DRAM Channels   DRAM Ranks   DIMMs per Channel   Total DIMMs   Speed in MHz
Quad            Single       1                   4 of 8        2,667
Quad            Single       2                   8 of 8        2,133
Quad            Dual         1                   4 of 8        2,400
Quad            Dual         2                   8 of 8        1,866
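
To get a feel for what those speed steps cost, here is a small sketch of the theoretical peak memory bandwidth for each row. It assumes the "MHz" figures are DDR4 transfer rates in MT/s and that each channel is 64 bits (8 bytes) wide; real-world throughput will be lower, and the names used are illustrative only.

```python
# (ranks, DIMMs per channel, transfer rate in MT/s) for the four quad-channel rows
CONFIGS = [
    ("single", 1, 2667),
    ("single", 2, 2133),
    ("dual", 1, 2400),
    ("dual", 2, 1866),
]


def peak_bandwidth_mb_s(mt_per_s, channels=4, bus_bytes=8):
    """Theoretical peak in MB/s: transfers per second x bytes per transfer x channels."""
    return mt_per_s * bus_bytes * channels


if __name__ == "__main__":
    for ranks, dimms_per_channel, rate in CONFIGS:
        gb_s = peak_bandwidth_mb_s(rate) / 1000
        print(f"{ranks}-rank, {dimms_per_channel} DIMM(s)/channel @ {rate}: {gb_s:.1f} GB/s peak")
```

With these assumptions the fully populated dual-rank case (1,866) gives about 59.7 GB/s against about 85.3 GB/s for the 2,667 case, so filling all 8 slots trades roughly 30% of peak bandwidth for capacity.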

H5- OK.

H5b- Well, I really want to go with it as long as it can take the load and last 5 years, but if you really tell me: DON’T, I won’t, and will go back to something like a single-socket 7371. The primary goal is to have an easy-to-run system; the secondary is, once that is achieved and settled, to leave the company and offer smaller companies a top-shelf but cheap system.
BTW, what do you mean by “pay the ASUS tax” exactly?

SW1- Yup I was of course speaking about power-saving throttling. So no w$ mandatory software is a good thing.
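
For the record, the kernel exposes the active frequency-scaling governor per core in sysfs, so no vendor tool is needed just to look at it. A minimal sketch, assuming the standard cpufreq layout under `/sys/devices/system/cpu` (the function name `cpu_governors` is made up here):

```python
import glob
import os


def cpu_governors(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu name: governor} read from the kernel's cpufreq sysfs files."""
    governors = {}
    pattern = os.path.join(cpu_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    for path in glob.glob(pattern):
        cpu = path.split(os.sep)[-3]  # .../cpuN/cpufreq/scaling_governor -> cpuN
        try:
            with open(path) as fh:
                governors[cpu] = fh.read().strip()
        except OSError:
            continue  # core went offline or file not readable
    return governors
```

On a Linux box with cpufreq active, `cpu_governors()` maps each core to its current governor name (which names are available depends on the driver in use). An empty result usually means no cpufreq driver is loaded.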

Two Mellanox ConnectX-4 dual-port 40 Gb network adapters: 2 × x8 lanes = 16 lanes
Two Broadcom HBAs: 2 × x8 lanes = 16 lanes
Total: 32 lanes
Most of the latest MOBOs offer 4 × PCIe x16 slots.

Asus is more expensive. People buy them due to their better support and slightly higher QC. That’s the “ASUS tax”

The idea is that if you were to buy an ASUS board, there’s a better chance of you having good support experiences than, say, Gigabyte or MSI. That’s not so much the case anymore, but ASUS and EVGA have been the two who have always had first-class support IMO.


If you want 2667 MHz with official support, you’ll need to go EPYC.

I’ve got quad channel with 4 DIMMs on my 1950X and it’s running at 3200 MHz, but it’s not ECC.

Yeah, I don’t know of any hardware that needs proprietary software (with the exception of Nvidia) to handle downclocking properly.

That’s not true.

X370 (1800X) and X470 (2700X) provide 24 lanes.

The 9700K provides 16 lanes.

Only HEDT provides more than 24 lanes.
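
To put those lane counts next to the 32-lane requirement listed earlier in the thread, here is a quick sketch. The 24 and 16 figures are the ones quoted above; the 64 (TR4) and 128 (SP3) totals are platform figures I am adding, and some of those lanes go to the chipset and M.2 slots, so treat the result as an upper bound. All names in the code are illustrative.

```python
# Cards listed in the thread, each in an x8 slot
REQUIRED_CARDS = {
    "ConnectX-4 adapter #1": 8,
    "ConnectX-4 adapter #2": 8,
    "Broadcom HBA #1": 8,
    "Broadcom HBA #2": 8,
}

# CPU PCIe lanes per platform (TR4 and SP3 totals are not from the thread)
PLATFORM_LANES = {
    "X370/X470 (AM4)": 24,
    "9700K": 16,
    "TR4 (Threadripper)": 64,
    "SP3 (EPYC)": 128,
}


def lanes_needed(cards):
    """Sum the lane width of every add-in card."""
    return sum(cards.values())


def platforms_that_fit(cards, platforms):
    """Return the platforms whose total lane count covers the cards."""
    need = lanes_needed(cards)
    return [name for name, lanes in platforms.items() if lanes >= need]


if __name__ == "__main__":
    print("Lanes needed:", lanes_needed(REQUIRED_CARDS))
    print("Fits on:", ", ".join(platforms_that_fit(REQUIRED_CARDS, PLATFORM_LANES)))
```

With these numbers only the HEDT and server platforms cover the four x8 cards, before counting any M.2 drives.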


OK, so things have not changed too much in the past 15 years and this is a good thing :slight_smile: (I was selling HW then.)

I don’t really need official support, though, as long as I can run 128 GB of ECC RAM @2666 MHz on TR4 without any failure (and ECC is mandatory for ZFS).

I know one (although it is not a question of clocking): my MSI CX61 2PC laptop. Since I replaced the fan, it starts and stops every 2 s at low temperatures. I strongly suspect that, as the replacement has a smaller motor, its inductance is lower, and that MSI built an RLC circuit around it specifically to sell USD 50 fans (I bought 3 replacements for USD 32, shipping included…)
To avoid this annoying noise, the only way is to re-program the temperature that triggers the fan start to above 52°C, and at this time only the w$ programs can do that :frowning:

How good that support really is remains open to debate.

On what platform?

TR4

Whoops, my mistake: I did not check the manufacturers’ sites first, and most French retailers’ sites are lying :frowning: They advertise 4x PCIe x16 for almost any TR4 MOBO, when the reality (with all 4 slots filled) is 2x PCIe x16 + 2x PCIe x8 most of the time.

Anyway, this doesn’t matter, as my needs are: 4x PCIe x8 (I’d prefer to have 4x PCIe x8 and 4 more M.2 connectors.)

https://us.msi.com/Motherboard/MEG-X399-CREATION/Specification has 4x PCIe x16
