EPYC Naples 32-core or Rome 12-16 core

Hey there, I've decided to stop lurking in my Lerkim in hopes you guys can point out any gotchas on a CPU purchase.

I'm building a data server that will be moving a lot of data quickly, so I need PCIe lanes. I could possibly use Threadripper, but I may run into memory limitations (officially) in the future. I'll admit that I'm new to server hardware, so I could be missing some obvious things.

At this point this server is a trial. It looks like a 12- or 16-core Rome would do the job; the 16-core (7302P) may be better for memory bandwidth than the 12-core Rome (7272), but the 12-core is better for the budget :slight_smile:

My supplier doesn't seem to like AMD servers/parts: they stock lots of Intel and in recent times have gotten rid of most of their AMD server parts. When I asked them about bringing stuff in, they very politely avoided the question. On the plus side, they have offered me a very attractive price on an EPYC 7551 32-core that they have on the shelf, which is cheaper than any 7272 I have found online.

I have already been doing a bit of research, and it seems the main differences are memory speed, core clocks, and L3 cache. It seems like the core count on a 32-core Naples would overcome the clock-speed deficit on multi-threaded workloads versus a 16- or 12-core Rome. Obviously, single-threaded workloads may suffer a bit.

Is there anything else I should watch out for?
Can Naples RAM run any faster than 2666?

Appreciate any help you can provide.
dfi

The processor isn't the only thing to consider, especially if you plan on moving a lot of data. So, what else is in your system? What's the path the data flows: internal or external? If external, via what link (LAN network speeds, etc.)? Your data will only move as fast as the slowest device in the chain allows.

Totally agree that the processor is only one part of the puzzle.

This is a proof-of-concept build. The project requires SHA extensions, so a Zen-based processor is the base of the build. The basic operation of the system will be in 3 steps; initially this will all be done on one machine. To scale up later, steps would be moved onto other systems, and in that scenario data would be transferred between machines over 10GbE connections:

  • Step 1: load a compressed block from outside the local LAN (1GbE), decompress the block to approximately 500GB, preprocess the block (CPU & GPU)
  • Step 2: process the block (CPU only)
  • Step 3: close the block (primarily GPU and some CPU), compress, and transfer to storage for later retrieval

Steps must be done in sequence.
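
To make that concrete, here's a minimal sketch of the sequencing; every function below is a hypothetical stand-in, not actual project code:

```python
# Sketch of the three-step flow; every helper is a hypothetical
# stand-in for real project code, shown only to pin down the ordering.

def step1_load(block_id):
    # Fetch compressed block over 1GbE, decompress to ~500GB, preprocess (CPU & GPU).
    return f"preprocessed-{block_id}"

def step2_process(block):
    # Main processing pass (CPU only).
    return f"processed-{block}"

def step3_close(block):
    # Close the block (mostly GPU), recompress, ship to archive storage.
    print(f"archived {block}")

for block_id in range(1, 4):                          # blocks arrive in order
    step3_close(step2_process(step1_load(block_id)))  # strictly sequential
```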

Basically, the high-speed storage is going to get worked hard with the uncompressed data. I'm debating between larger-capacity SSDs, relying on wear leveling to slow the eventual destruction of the drives, or getting enterprise drives with a high DWPD (drive writes per day) rating for more predictable wear. The latter will probably be the best choice, assuming we can get them in my part of the world for a decent price.
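
As a rough endurance sanity check, the back-of-envelope math looks like this; the block rate and drive capacity below are assumed figures for illustration only:

```python
# Back-of-envelope endurance check. "assumed" figures are illustrative
# guesses, not project numbers.
block_write_gb    = 500     # each block lands uncompressed (from above)
blocks_per_day    = 4       # assumed ingest rate
drive_capacity_tb = 3.84    # assumed enterprise NVMe capacity

daily_writes_tb = block_write_gb * blocks_per_day / 1000
required_dwpd   = daily_writes_tb / drive_capacity_tb
print(f"{daily_writes_tb:.1f} TB/day -> needs >= {required_dwpd:.2f} DWPD per drive")
# 2.0 TB/day -> needs >= 0.52 DWPD per drive (less if spread over a RAID set)
```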

The basic hardware framework at this point is:

  • 256GB RAM
  • 2x blocks of high-speed NVMe/SSD storage
  • 1x GPU (2080 Ti, 3080, 6800 XT)
  • array of HDDs for storage

Hi

First off, would you not benefit from upgrading your NIC to 10Gb, or better yet, using two of them? :wink:

Storage: only use NVMe Gen 4, either as single drives or in RAID 0.

GPU: do you need a high-end card here? Is there any processing going on other than displaying content? If you do need a GPU, maybe consider multiple units.

Henrik

One more difference that hasn't been pointed out is PCIe support between those two generations of chips: Naples is PCIe Gen 3 capable, whereas Rome is PCIe Gen 4 capable. For a build that's all about moving data, that roughly doubles the bandwidth of every slot, as sketched below.
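
For a sense of the difference per x16 slot:

```python
# Rough effective bandwidth per x16 slot (8 GT/s vs 16 GT/s,
# 128b/130b encoding already factored in):
GEN3_PER_LANE = 0.985   # GB/s per lane
GEN4_PER_LANE = 1.969   # GB/s per lane
print(f"Gen3 x16: ~{16 * GEN3_PER_LANE:.1f} GB/s")   # ~15.8 GB/s
print(f"Gen4 x16: ~{16 * GEN4_PER_LANE:.1f} GB/s")   # ~31.5 GB/s
```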

And of course, you have the Zen 3-based Milan EPYCs just taking their first baby steps of launching right now. They are shipping to large-scale customers and will make it to retail shortly.

How big are the blocks of data you have to process?

Would it be possible to do your processing as a stream?

I would store incoming data on a RAM drive, process from there to local SSD/NVMe, then move from there to storage.
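
If the blocks turn out to be in a streamable format (gzip below is purely an assumed example; the actual codec isn't stated in this thread), the decompress never needs to hold the full 500GB in memory:

```python
# Streaming decompress: read the compressed block in chunks and write the
# expanded data out as it arrives, so peak RAM stays near the chunk size.
# gzip is an assumed codec; swap in whatever the blocks actually use.
import gzip
import hashlib

CHUNK = 64 * 1024 * 1024   # 64MB at a time

def stream_decompress(src_path, dst_path):
    digest = hashlib.sha256()   # hashlib typically rides on OpenSSL, which
                                # can use the Zen SHA extensions
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while chunk := src.read(CHUNK):
            digest.update(chunk)
            dst.write(chunk)
    return digest.hexdigest()
```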

Yes, the high-end card is required for OpenCL calculations; 10GB+ of VRAM seems to be the most efficient. I'm trying to find another 2080 Ti but not having much luck. The work can be done by the CPU with no GPU, but the CPU overhead increases dramatically; there's a better speed bump with the GPU/CPU combo.

The next step in the roadmap is to add another machine to do step 2, then add another GPU to the server so that we can process 3 blocks in parallel (each step working on a different block).

It would go something like this:
Blocks come in sequence, so initially only one block can be worked on.
Once “block 1” has been moved to step 2, we can start step 1 on “block 2”.
Then “block 1” moves to step 3, and step 1 can begin on “block 3” while step 2 is conducted on “block 2”.

I think we may be asking too much of the server to do steps 1 & 3 at the same time, but it will most likely still be faster than waiting to do everything in sequence.

Blocks are approximately 500GB uncompressed, 30-60GB compressed. Doing the work in RAM is something that was talked about, but we haven't seriously considered it, partially because none of our current machines have enough RAM.

That's a good idea, something that will have to be looked at :+1:
