Super Fast data processing machine

Hi Guys,
I’ve been assigned a challenge, and I know from @OptimumTechYT that you are the best. I’ve done my homework, and I need your expertise/experience to tell me if this will go right before I pay a big amount of cash. In short, we need to build a super, super fast computer for data calculation. So the fastest storage on the planet with the biggest size possible… We’ll use M.2 drives over PCIe, the top CPU, the max RAM, the best GPU, and there are a couple of challenges.

First one: this machine must support the latest Thunderbolt. That is the easy part.

The second one: all storage must be in RAID 5 or 10, or whatever the better option is; I’m open to suggestions… :grin:… If you are interested in advising, please let me know.

So, Thunderbolt is currently supported on Intel only. So no AMD CPUs.

The hard part is that there is no longer one fastest CPU, GPU or best RAM or storage for every workload. It all depends on what you are doing.

This has some very useful discussion:
https://www.youtube.com/watch?v=D3ZNHM5y3vU


Here are some questions to ask yourself to figure out what type of high performance you need. This is by no means an exhaustive list, just some that I put together in a couple minutes.

-How multi-threaded is your workload? Does it use a single core? A couple with one main thread? Does it scale to 20 threads? 64 or even hundreds of threads?

-What about cache? Does it like lots of L2 or L3?

-How much memory are we talking? Does 64 or 128 GB work, or do you need 1 TB+? Do the amount and latency of storage affect the RAM needed? What about RAM latency and bandwidth?

-How much storage do you need? Are we talking 1-2 TB of usable storage? Or do you need more than that? Does throughput matter most at high queue depth or low queue depth, or is it latency, or some combo thereof?

-Can your workload take advantage of GPU acceleration? If it can, then it affects what you should buy quite a bit. Would it use CUDA? Does it work better with a Tesla or a Quadro?
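To put a rough number on the first question, here is a minimal Python sketch of Amdahl's law. It only gives an idealized upper bound: the parallel fractions in the loop are made-up example values, not measurements of anyone's workload, and it ignores memory, cache, and storage limits entirely.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` cores when `parallel_fraction` of the runtime parallelizes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical parallel fractions, purely for illustration.
    for p in (0.50, 0.90, 0.99):
        row = ", ".join(
            f"{n} cores: {amdahl_speedup(p, n):.1f}x" for n in (4, 16, 32, 64)
        )
        print(f"{p:.0%} parallel -> {row}")
```

If only half of the runtime parallelizes, the speedup can never reach 2x no matter how many cores you buy, which is why the answer to this question changes the CPU recommendation so much.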

3 Likes

Well, from the general vibe of your post, I am going to assume that you have an unlimited budget for this machine. In that case, I would recommend waiting for Intel’s recently announced 28 core Xeon W-3175X CPU. This will be the world’s fastest CPU for a good while, in pretty much any workload. Couple this CPU with 512 GB of RAM, and some Optane-based DC P4800X NVMe SSDs, and you will have no issues processing “massive amounts of data” (whatever that means) at insane speeds.
As for GPU(s), you probably want Volta V100 based card(s).
Between the SSDs and GPUs, you might have to juggle the number of available PCI-E lanes (44), depending on the number of GPUs you want.
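The lane juggling can be sketched with a bit of arithmetic. This is only a rough Python illustration under stated assumptions: every GPU gets a full x16 link, every NVMe SSD gets x4, all devices hang off the CPU's 44 lanes, and nothing else (Thunderbolt card, network card, chipset link) needs lanes. Real boards share and split lanes in ways this ignores.

```python
LANES_PER_GPU = 16   # assumption: each GPU runs at a full x16 link
LANES_PER_NVME = 4   # assumption: each NVMe SSD uses an x4 link


def max_nvme_drives(cpu_lanes: int, gpus: int) -> int:
    """How many x4 NVMe drives fit in the lanes left over after the GPUs."""
    remaining = cpu_lanes - gpus * LANES_PER_GPU
    if remaining < 0:
        raise ValueError("the GPUs alone need more lanes than the CPU provides")
    return remaining // LANES_PER_NVME


if __name__ == "__main__":
    for gpus in (0, 1, 2):
        print(f"{gpus} GPU(s) on 44 lanes -> up to {max_nvme_drives(44, gpus)} NVMe drives")
```

Under these assumptions, a second x16 GPU costs four NVMe slots, and a third GPU does not fit at x16 at all.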

1 Like

it depends on your workload?
are you zipping a bunch of files,
or are you compiling Linux kernels?
do you need single-threaded or multithreaded performance?
is it small threaded workloads like databases?
are you serving webpages?
is it for gaming?
is it for watching 144fps porn?

point is, serving webpages using 32 cores is a waste, gaming on 32 cores is a waste, compiling Linux kernels on 32 cores is yummy.
Serving webpages to a billion users on a mechanical hard drive suuuuucks, serving webpages to a billion users on an M.2 drive is nice.
and so on.
saying a PC is fast is like saying a Formula 1 car can drive fast, but do you really wanna drive it to work?

3 Likes

Agree with the other replies. We need to know more about the software and workload to help; otherwise you might buy a sports car to do the work of a truck.

4 Likes

Agreed.

If you can’t talk about the specifics of it, at least tell us if your workload can take advantage of AVX-512, CUDA, etc…
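As a starting point, here is a small Python sketch that checks whether the Linux box you tested on even exposes AVX-512. It assumes a Linux system with `/proc/cpuinfo` and looks for the `avx512f` (AVX-512 Foundation) flag. It says nothing about whether your software actually uses those instructions; that still has to come from the developers or a profiler.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx512(cpuinfo_text: str) -> bool:
    """True if the AVX-512 Foundation flag is listed."""
    return "avx512f" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print("AVX-512 Foundation supported:", has_avx512(path.read_text()))
    else:
        print("No /proc/cpuinfo here (probably not Linux)")
```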

2 Likes

There absolutely is more information needed.
Something else to consider: Does it have to be a single machine, or does it make sense to build two (or more) separate machines with fast interconnects? You could have Thunderbolt on one “small” Intel system and then connect it to an Epyc or Xeon server (or several) with something like InfiniBand.

1 Like

Probably a cluster is best? A cluster of cheap machines is cheaper and can outrun one good machine. Don’t underestimate Threadripper. Thunderbolt on Threadripper is a thing as of the Titan Ridge controllers.

I have some ingest stations set up for hotplug NVMe drives … it’s for video processing, but the act of ingest produces proxies and the whole nine yards.

25/2x25 or 100Gb Ethernet is also effective for data on/offload.
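To get a feel for what those link speeds mean, here is a back-of-the-envelope Python sketch. The 1 TB dataset is a hypothetical example, the speeds are nominal line rates (Thunderbolt 3 is nominally 40 Gb/s), and the `efficiency` parameter is a made-up knob for protocol overhead, so real transfers will be slower.

```python
def transfer_seconds(data_gigabytes: float, link_gigabits_per_s: float,
                     efficiency: float = 1.0) -> float:
    """Time to move `data_gigabytes` (decimal GB) over a link at nominal speed."""
    if link_gigabits_per_s <= 0 or not 0.0 < efficiency <= 1.0:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    gigabits = data_gigabytes * 8  # 8 bits per byte
    return gigabits / (link_gigabits_per_s * efficiency)


if __name__ == "__main__":
    dataset_gb = 1000  # hypothetical 1 TB dataset
    links = {
        "25Gb Ethernet": 25,
        "Thunderbolt 3 (nominal)": 40,
        "2x25Gb Ethernet": 50,
        "100Gb Ethernet": 100,
    }
    for name, speed in links.items():
        print(f"{name}: {transfer_seconds(dataset_gb, speed):.0f} s")
```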

5 Likes

I would aim for Threadripper … since you want Thunderbolt you’ll need the new Titan Ridge controllers, which you can get on an add-in card. Probably toss in the fastest Ethernet you can do unless fiber is an option. I’m sure that 128 GB of memory is probably more than enough. The M.2 idea is great, especially if you are on Threadripper, because it provides a lot of PCIe lanes and leaves you room for other devices… Keep your options open: it’s easy to go Intel for easy access to Thunderbolt, but there are quite a few workloads that Threadripper is good at, and better than Intel at…

Best GPU… depends on what you’re doing tbh… for data crunching, prolly a Tesla or high-end Quadro, just saying

1 Like

Power usage would suck

1 Like

Would a Xeon Phi be in the right direction?

1 Like

you are correct, SMP processing seriously outstrips single machines in processing speed every time (even more so if you are using fibre optic networking), depending on your data transfer and what you plan on doing with it, be it renting out processing time to SETI or any of the oceanographic institutes.
My choice would be clusters!

@mutation666:
that’s a trade-off to consider: the cost of specialty top-of-the-line hardware, slower performance than a cluster, and colossal failure when the single unit dies on you.
if a node in a cluster fails, the processing speed may drop slightly, but it will not adversely affect the rest of the machines.

here is a pdf on cluster computing.

look at page 25: performance for a 64-node Sun cluster is 1.3 teraflops!
and this is an old document.

2 Likes

Thank you guys for your participation. I haven’t used forums in ages, so I was kinda waiting to see an email alert of your responses and didn’t check here. Please accept my apology.

OK, so I see a lot of good ideas and questions. I’ll explain further and try my best to answer your questions:

The need:

So, basically, we are looking to have a desktop machine (which means it shouldn’t be a blade server) that can process big data in seconds: data like reports from various monitoring tools. The software is built in-house to run some complex formulas and take actions based on the results. The number of calculations is huge.

Current findings:

So we’ve done a test on a workstation where we made a partition out of RAM (a RAM disk) and tested on that. It reached an acceptable calculation speed; however, we were limited to 30 GB of storage.
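For anyone wanting to compare the RAM partition against an NVMe drive with the same yardstick, here is a minimal Python sketch that times a sequential read of one file. It is not the test we ran, just an illustration. Big caveat: the operating system caches recently used files in RAM, so a file that was just written (like the temporary one in the demo) will read back far faster than the underlying drive; use a file larger than RAM or drop caches for real numbers. Queue depth and random access are not covered at all.

```python
import os
import tempfile
import time


def read_throughput_mb_s(path: str, block_size: int = 1 << 20) -> float:
    """Sequentially read `path` in `block_size` chunks and return MB/s (decimal MB)."""
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    return total_bytes / 1e6 / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Demo on a 64 MiB temporary file; point `path` at the storage under test instead.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(64 << 20))
    try:
        print(f"{read_throughput_mb_s(tmp.name):.0f} MB/s (likely served from page cache)")
    finally:
        os.remove(tmp.name)
```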

Based on that, we decided to build our machine with as many PCIe M.2 drives as possible, as these are the fastest we could get, and a Thunderbolt connection to eliminate the data transfer bottleneck (this is negotiable).

The second thing we looked at, in order to have that many M.2 drives, is a motherboard and CPU combination that offers enough PCIe slots, with a CPU that will calculate the best.

@Methylzero
The GPU is not important; a single 2080 is enough that we never have to think about upgrading it anytime soon.
Also yes, I don’t have a limit for my first unit, but in case I’m requested to build more, I’ll then need to think about budget efficiency :smiley: .
512 GB of RAM is a good idea.

And yes, we need as much RAM as possible, because we’ll also use it to increase the amount of storage.

@Lauritzen
Thanks. As I mentioned above, a huge amount of reports/data will be collected every minute from several sources, processed, and actions/orders sent accordingly. Hope that answers your concern.

@ anotherriddle
I’d say Xeon is the best, but I didn’t see a Xeon that will fit the other specs, and the performance is not huge in this case. What do you think?

@ wendell
Great setup for the ingest. I definitely wanna know more about the video processing performance out of that cluster, but in my case, it has to be a single desktop machine.

Based on your input, I can see the options are either to wait for the next Xeon to accommodate 512 GB of RAM, then plug in as many M.2 drives as possible, with a Thunderbolt connection to clear the storage as quickly as possible, probably dumping into a cluster/SAN!

Or the other option, which part of you are highly recommending:

AMD Ryzen Threadripper 2950X Processor

Corsair Vengeance RGB 128GB (8x16GB) DDR4 3600MHz C18

Samsung 960 PRO Series - 3x 2TB PCIe NVMe - M.2 Internal SSD (MZ-V6P2T0BW)

ASUS X399 ROG ZENITH EXTREME - 4x PCIe x16, 2x USB 3.1 Gen2, 8x USB 3.1 Gen1, 3x M.2,
or
GIGABYTE X399 AORUS XTREME – 5x PCIe x16, 1x USB 3.1 Gen 2, 8x USB 3.1 Gen 1, 3x M.2, Fusion RGB, WiFi

MSI - GeForce RTX 2080 Ti 11GB GAMING X TRIO Video Card

Corsair - 1200W 80+ Platinum Certified Fully-Modular ATX Power Supply

A quiet case! maybe

NZXT - H440 (Black/Red) ATX Mid Tower Case

I still don’t know how RAID 5/10 will be set up on that build; I need to figure that out.
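As a first step, the capacity side can be worked out with simple arithmetic. Here is a rough Python sketch using the textbook rules: RAID 5 needs at least three drives and gives up one drive's worth of space to parity, while classic RAID 10 (striped mirrors) needs an even number of at least four drives and keeps half. It says nothing about performance or about what a given controller or software RAID actually supports.

```python
def raid_usable_tb(level: int, drives: int, drive_tb: float) -> float:
    """Usable capacity in TB for identical drives, using textbook RAID rules."""
    if level == 5:
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * drive_tb  # one drive's worth of parity
    if level == 10:
        # Classic striped mirrors; some software RAID variants relax this rule.
        if drives < 4 or drives % 2 != 0:
            raise ValueError("classic RAID 10 needs an even number of drives, at least 4")
        return drives / 2 * drive_tb  # every drive is mirrored once
    raise ValueError("only RAID 5 and RAID 10 are handled in this sketch")


if __name__ == "__main__":
    print("3x 2TB, RAID 5 :", raid_usable_tb(5, 3, 2.0), "TB usable")
    print("4x 2TB, RAID 5 :", raid_usable_tb(5, 4, 2.0), "TB usable")
    print("4x 2TB, RAID 10:", raid_usable_tb(10, 4, 2.0), "TB usable")
```

So with the three 2 TB drives listed above, RAID 5 would leave 4 TB usable, and classic RAID 10 would need a fourth drive.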

Let me know if you think this setup is the best combination and if it would do what we’re looking for.

Thank you guys for the help

You won’t be processing big data on a desktop machine of any size in seconds.

How much data are you talking about?

just a heads up:
You have to remove the space in “@ anotherriddle” to notify the corresponding user, like so: @anotherriddle, and leave a space afterwards.

Also, we are still kind of guessing about your workload. We’d like to give good answers and recommendations, but the guys who have the in-depth knowledge (I don’t count myself here) need as many details about your workload as you can give us.

How big is the data, and how many entries (order of magnitude)? What kind of processing? Database lookups? Semantic search? Is your workload memory bandwidth starved, or do you just need low latency? I think Epyc currently supports the highest amount of memory. Depending on the details, 3D XPoint SSDs might make more sense than conventional NVMe SSDs. I am also a bit confused about your GPU answer. Can you leverage GPU acceleration? It sounds like no, so I am not sure what you need the GPU for?

Also, you can take a look at


They test a lot of different workloads and publish their results. Maybe you can find a similar workload and look at the types of systems they recommend.

Would you mind telling us what was in this system? What CPU(s)? GPU(s)? This information would be useful to figure out systems that would perform well.

If you really do not need a GPU at all, then why spend money on the RTX 2080? Just get an RX 550: much cheaper, and you get good open source drivers, no NVIDIA driver headaches.
If you really want to get the largest amount of RAM possible in a “desktop” computer, you could try to get an E-ATX sized, dual socket AMD EPYC board from Supermicro, which could give you 2*32 cores and 2 TB of total RAM. (Total system cost will be astronomical though, tens of thousands of US dollars.)
https://www.supermicro.com/Aplus/motherboard/EPYC7000/H11DSi.cfm
But be aware that current AMD CPUs have much lower AVX performance, so if your software can effectively make use of AVX instructions, then Intel is probably a better choice for you.

1 Like