PC build for data processing

So I've got a project where I'll be processing data on the order of billions of data points. I'll be crunching the data using Python (probably Pandas, unless I find something easier to use), and if this project goes well, the PC will be working long hours.

What I want to know is what type of CPU would be good for this work. I have no intention of using the PC for much else; it's mainly a work machine. Can I get by with a normal i7 and some good hard drives? Or is this where it might be good to go up into Xeon territory? Or are there other processors out there that I don't know about?

This line of work is very new to me (if you can't tell), so any information you could give would be nice just so I can get familiar with the territory.

Edit:

Budget for the build is <$2000.

I would go dual Xeons for this. What's your budget? And can you wait for Zen?

Maybe look at something like THIS.

Those are older chips but they still are monstrous. And they are cheap.

Is this something that can be GPU accelerated, or is it pure CPU lifting? I would suggest Xeons right out of the gate if you can afford them. ECC would probably be very nice to have when crunching this many data points, just to prevent bit flips that would fuck with your data.

Okay, can we please not go around recommending older used server parts to people who don't even know what their workload requires? If the dude doesn't even know how to pick his own parts, I think we can safely assume it's probably not a great idea to throw a complicated old server board riddled with quirks and oddities at him.

This ASRock board has no quirks or oddities as far as I can tell. What do you mean?

Zen's not coming out til next year, right? Gotta have it before then. Let's just say that the budget is $2000 tops, might go up. I'll edit that into the main post.

I'll be reading a ton of Excel and CSV files and making simple comparisons between them. The tasks probably won't be complicated; there's just SO much to go through. I'm not sure if GPU acceleration would be good for this; I'm unfamiliar with it.
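For a workload like this (many CSV files, simple comparisons), how the files are read matters for the hardware choice. Streaming the large file keeps memory use roughly constant, so RAM needs depend more on the approach than on the total data size. Below is a minimal sketch using only Python's standard `csv` module. The column names `id` and `value` and the function names are hypothetical stand-ins, not anything from the actual data. Pandas offers a similar pattern through the `chunksize` argument of `read_csv`.

```python
import csv
import io


def load_reference(ref_file, key="id"):
    """Load the smaller file fully into a dict keyed by the `key` column."""
    return {row[key]: row for row in csv.DictReader(ref_file)}


def find_mismatches(ref_file, big_file, key="id", field="value"):
    """Stream big_file one row at a time and yield (key, reference value,
    new value) for rows whose `field` differs from the reference."""
    reference = load_reference(ref_file, key)
    for row in csv.DictReader(big_file):
        ref_row = reference.get(row[key])
        if ref_row is not None and ref_row[field] != row[field]:
            yield row[key], ref_row[field], row[field]


# Tiny in-memory stand-ins for real files opened with open(path, newline="")
ref_csv = io.StringIO("id,value\n1,10\n2,20\n3,30\n")
big_csv = io.StringIO("id,value\n1,10\n2,25\n3,30\n4,40\n")

mismatches = list(find_mismatches(ref_csv, big_csv))
print(mismatches)
```

Only the reference file is held in memory here; the large file is read row by row, so the same code should work whether it has a thousand rows or a billion, just more slowly.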

Some of those server boards do weird things: POST times of 30 seconds to a minute, spotty RAM compatibility, and no official support for Windows past 7 on some of them, among other things that I have seen and heard about. I've been looking into building one of those old server-board-based workstations, and a lot of the Supermicro boards I was looking at didn't officially support anything past Windows 7 SP1, although I might have been looking at boards a tiny bit older than your ASRock.

Excel isn't going to use GPU acceleration. You're probably best off with a single E5 Xeon with a higher clock speed over tons of cores for that kind of workload, with a good amount of RAM too. I'll whip something up real quick.

It takes a bit to boot, that is true. But how often are you doing that on a machine like this?
All the other stuff is simply not true for this board. I ran some benchmarks on Windows 10 just for lols.

There are turd boards out there and those are the cheap ones. But that is true for every socket and every generation of mainboards. So next time before you go

maybe just ask.


GPU won't help you here. Cores vs. clock speed? Depends... Do you have a test workload? Can you open more than one instance of that process?

Maybe they're not true for your board, but for some of the other dual-socket 2011 boards those points are definitely applicable. I was looking on eBay at old server boards and pored over manufacturer pages to find which boards actually supported which OSes, and some do in fact not support 10 in any official standing. Some were weird with 8 as well. These are server boards, so there are some boards which don't play nice with some memory. That's not debatable; ECC memory isn't always as plug and play as the unbuffered non-ECC stuff most people put into their PCs.


http://pcpartpicker.com/list/htTLPs

This is what I came up with. Some Excel workloads love cores, and some love clock speed. I chose the Xeon E3-1245 V5 as it has high base and boost clocks of 3.5 GHz and 3.9 GHz respectively. It also supports ECC, which is nice to have in a machine that is going to be running for potentially hours crunching numbers. That would not be a nice time to have a bit in memory flip. Besides that, the build is fairly plain: a nice workstation board with a PCIe M.2 SSD to load programs and worksheets incredibly fast. I'm not sure you really need as much storage as I have included, probably not. You can customize this as you see fit. I also have a Fractal Design Define R5 because it's quiet and nice, plus an 80+ Platinum rated power supply to ensure clean power going to your components, and something that can be trusted to run for hours on end without worrying.

You mean the ones no one mentioned besides you?

Do you mean opening and analyzing multiple data frames at the same time?

A VERY VERY small workload might be this.

This is large for the data on the website, but small compared to the stuff the computer will be working on. This is just what I'll be going through as a preliminary.

Thx a ton! I definitely like the looks of this build, and will run it by some others and the powers that be to see what they think.

Cool! I hope they enjoy it. I find it a good balance between the number of cores you get and the speeds at which those cores will run. Good luck!

Nobody said he'd be going out and buying the exact same board you have, so I do find it important to put the note out there that SOME of those boards do have the quirks I listed. That's why, as soon as I started listing quirks, I put the word "some" all over the place, because some boards are better than others for a desktop use case. Not to mention that a lot of what's online suggests Excel uses only a single core for most operations and may not scale well past about 8 threads in others. Admittedly, the data online for Excel is inconsistent, because the optimizations have changed from one version to the next.

Yup, can you run a lot of those at once? That determines if it makes sense to go core crazy or clock crazy.
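One way to answer that question before buying anything is to time the same batch of jobs run one after another and then spread across several processes. The sketch below is only an illustration: `compare_chunk` is a hypothetical stand-in for one file's worth of comparison work and should be replaced with the real per-file job, and the worker count of 4 is an arbitrary assumption. It uses `ProcessPoolExecutor` from Python's standard library.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def compare_chunk(seed, size=200_000):
    """Hypothetical stand-in for one file's worth of work: count the
    positions where two generated sequences agree."""
    a = ((seed + i) * 3 % 7 for i in range(size))
    b = ((seed + i) * 4 % 7 for i in range(size))
    return sum(1 for x, y in zip(a, b) if x == y)


def run_serial(seeds):
    """Process every job one after another on a single core."""
    return [compare_chunk(s) for s in seeds]


def run_parallel(seeds, workers):
    """Process the same jobs spread across `workers` processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compare_chunk, seeds))


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call to fn."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    seeds = list(range(8))
    serial, t_serial = timed(run_serial, seeds)
    parallel, t_parallel = timed(run_parallel, seeds, 4)
    assert serial == parallel  # same answers either way
    print(f"serial:   {t_serial:.2f}s")
    print(f"parallel: {t_parallel:.2f}s (4 workers)")
```

If the parallel run is close to four times faster on the real job, the workload splits cleanly and more cores should pay off. If it barely improves, the job is probably limited by disk, memory, or a single-threaded step, and clock speed or storage matters more.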

So, since nobody said he'd go out and buy the exact same board that you suggested, I find it important to put the note out there that SOME Gigabyte boards are just horrible. They suuuuuck so hard. Not all of course but some do. That is why as soon as I wrote this post, I put the word "Some" into it, because it is always legit to generalize when the word "Some" is used doing so.

You see how that makes no sense?

Well, actually I did recommend he go out and buy that exact Gigabyte board, so no, it doesn't make sense to say that some older Gigabyte boards are bad. My point in putting that caveat there is that a large percentage of those old boards have some kind of weird quirk. I mean, look at Wendel's dual 2011 build: his Asus motherboard didn't like to use his RX 480 unless he used a legacy mode for the video device in the BIOS. There are weird things and hardware configurations that can be finicky with those old boards. Just because you didn't experience them doesn't mean that his hardware needs would come without some oddities. It's not nearly as guaranteed that everything will play nice when you are talking about a platform that is as old as 2011 now is.

Since nothing seems to help...

You recommended a parts list. So did I. Specific parts. And this was your answer:

You just painted over my recommendation with the broadest brush you could find. Later on you added the word "some" into your comments. And despite all my attempts to show you how generalization is dumb, you are still doing it.

I already built two of those E5-2670 systems. They work flawlessly. Because I know what I am recommending.
And what you call a quirk or an oddity is simply a matter of researching compatibility. Lists for that exist for basically every mainboard in existence. Are they all odd?

The OP's hardware needs are still not really clear. But good thing you already told him what to buy.
Especially the $500 in storage will surely help with processing data, right?

Yeah, whatever. I'm out of this thread.

@sidenote, when you have more information on the workload, send me a message.

Not like I put the caveat in my original recommendation that he should change the storage to what he needs, as maybe he would benefit from having a very large SSD to store all his projects on, or maybe all he needs is a small SSD, or maybe just an SSD boot drive with a ton of RAID-protected hard drives. Maybe he actually will have his projects stored on a NAS somewhere. Who knows, that's why I put the note in:

He needs a system built to play with Excel. Most of Excel is single threaded, which is why I suggest a chip that has very high single-core performance as well as a decent number of threads to deal with the smaller variety of workloads that will use many threads. It's a pretty good cover-all solution.

No, they are not all odd, which is why, again, I put the "some" caveat on every explanation of why some of them are odd. I admittedly forgot to put that in my first response and should have. That doesn't defeat my argument though: not all hardware configurations are going to play nice on all server hardware. You just said "look at what I did", not a parts list. His requirements may need him to build something different from your setup, which is why I think he should know that, hey, not all of these systems run flawlessly. Congrats, you had two that did; that doesn't guarantee anything for him. Hell, he could buy the same motherboard, get a different BIOS revision from an eBay seller, and not be able to POST without doing all kinds of troubleshooting.

It's also not a great look to go to his new employer and say, hey, look at all these old server parts off of eBay, this is what I want you to spend your money on to get me up and rolling.