Daily Driver + SQL OLAP Dev Build With Threadripper 3960X

EricM · August 18, 2020, 7:35pm

I’d like to get some thoughts from more knowledgable folks (I’m a developer). This is an expensive build. My goal is to have decent single thread speeds for daily stuff + raw horsepower for a 50GB OLAP db I’m developing.

I’ve settled on a 3960X and Gigabyte Designare. 64GB of RAM should be enough, but I’m not sure on 4x16 or 8x8 config and posted a separate question here:

Threadripper 3960X Memory Question - 8x8 vs 4x16 Overclocking

I’m watching Wendell’s Threadripper mem vids, but I don’t understand if it’s better to go with an 8x8 or 4x16 configuration for the best RAM performance with 64GB. [image] I see you can get better clock speeds populating 4 dimms, but won’t 8 dimms give you better throughput? This will be a daily driver machine (gaming) + OLAP SQL Server dev box. Disk IO speed is really important to me, so I want the best infinity fabric speed possible. My thought is to find kits with good timings at 1.35v, so I can bump the voltage if needed. This is the 4x16GB kit I’m looking at: It’s really hard to find 8x8 kits rn, so it’s either two decent 4x8 kits or four superb 2x8 kits: The board is the Gigabyte Designare. Full parts list here: https://pcpartpicker.com/list/k7kXjp

The two mobo NVMe slots to the CPU will host Windows on a 1TB 970 Evo Plus and the DB log files on a 512MB 970 Pro. The AORUS Gen4 AIC Adaptor will host the DB data files on a RAID 0 with the 4x 2TB ADATA SX8200 drives.

Fast IO with sustained write speeds is really important. I’m constantly refactoring the DB, adding a column to save some computed value to a 10GB table.

I got a used Titan Xp off ebay for hardware acceleration. I’m going to play with some machine learning stuff.

The Lian Li 011D XL case - PC Part Picker is warned me that the case does not fit the XL-ATX board, but 2 out of the 3 Designare builds they have use that case. I know the XL world is non-standard, so I’m hoping its a nothing burger.

wendell · August 19, 2020, 5:10pm

4x memory sticks can generally faster than 8x in terms of infinity fabric so stick with 4 if you can.

You know if you really want to hedge your bets and your data isn’t super sensitive why don’t I set you up with remote access to a threadripper and something less threadrippery.

For your use case you might be surprised by how much mileage you can get out of a 3900x at 25% of the cost potentially. We could actually rest it.

Plus I wouldn’t be surprised if 3rd gen tr goes on sale soon. With 4th gen desktop ryzen looming. I bet they launch 4th gen tr around March next year +/-

EricM · August 19, 2020, 6:23pm

Thanks Wendell.

I can’t wait because I have 100% downtime right now. Trying to turn lemons into lemonade by throwing myself into high risk/reward project while waiting for something stable.

Core count is not going to be a bottleneck so 3900x is fine. This is “compute once and store”. I want to get a 10 hour refactor (Raid 0 with 4x HDD and 6 core Xeon at 10% util) down to a 30min coffee break.

That’s why I’m interested in the extra CPU lanes of the TRX40:
OS - M.2 x4
Log Files - M.2 x4
Data Files - AORUS Gen4 AIC Adaptor x16 (Raid 0 with 4x SX8200 NVMe’s)
CUDA - Titan Xp x16

With X570, I’d be limited to this, right?
OS - M.2 x4
Log Files - M.2 x4 via Chipset
Data Files - AORUS Gen4 AIC Adaptor x8 via Chipset (Raid 0 with 2x SX8200 NVMe’s)
CUDA - Titan Xp x16

That would save $1,500 (https://pcpartpicker.com/list/HpxbBZ + $120 AORUS Gen4 AIC Adaptor).

BGL · August 19, 2020, 9:27pm

Interesting project

For your described SQL Server workload make sure you take the time to get your instance and database/data-file layouts correct else you will be wasting that hardware. You’ll need multiple data-files per file group with pre-sizing, instead of relying on auto-growth at default growth rates (which are terrible).

For max SQL Server performance low-latency storage and plenty of RAM are the priority. CPU performance less so - all things being equal fewer cores at higher clock-speeds with more cache usually wins the day unless your database is serving hundreds of concurrent user queries. I’d imagine a 3900X will be all the grunt you’ll need.

EricM · August 19, 2020, 9:57pm

Yeah, it’s 50GB total with ~20GB being cache worthy, so 64GB of RAM should simultaneously handle Chrome and SQL.

I’ll pre-allocate 1TB of space. I am curious to see if SQL’s striping (four drives with four files in one filegroup) is faster than AMD’s storage controller (four drives in Raid 0 with one file). I saw no measurable difference when I tried it with spinning rust back in the day.

Thinking more about a 3900x…what if I put the GPU in the chipset’s x8 slot and the NVMe addin on the CPU’s x16? The chipset is only x4, but I think I rather have 4 NVMe’s than 2.

BGL · August 19, 2020, 10:21pm

So here’s the thing, when you are inserting, updating or deleting records in a table remember that you are not working directly against the disk. You are working on pages stored in RAM that will get flushed to the disk when either a checkpoint occurs (approx once a minute) or the lazy writer gets invoked due to memory pressure and flushes the dirty pages. If there is no memory pressure the only activity on disk that really affects performance is what’s happening with the TLogs - which is because that’s what is able to help you recover after a system failure to roll committed transactions forward or incomplete ones back.

For that reason you could probably place your data files on spinning rust and it wouldn’t harm overall performance too much provided your TLogs (and possibly TempDB) are on low latency NVMe disks and you have more RAM than SQL really needs.

Where having data files on low latency disks really helps is if you’re constantly under mem pressure and swapping pages, or when you are running index rebuilds, backups or dbcc checkdb commands.

This maybe way you haven’t seen much difference in file layout perfmance in the past and it’s why my DBAs are asking me why they are not seeing 6GBps throughput on our new DWHouse server

BGL · August 19, 2020, 10:31pm

I’ll also add that your better off giving SQL 4 or 8 data files for each dB because then it can read or write to each with a different thread. Again this makes the most difference when you are doing someyhing like a full table scan, backup or consistency check.

wendell · August 20, 2020, 1:05am

I would probably do x8 GPU CPU lanes, 4x nvme from CPU, and 2x nvme on chipset lanes.

I don’t think you really need 4nvme because with sql you can do a 4 wide stripe on a single nvme and still get perfect scaling. To get that 5gbps throughput out of the fastest nvme, you actually need a lot of threads going at once, in parallel. And the cheaper ssds slow down as they get full because they switch from using the entire drive as SLC buffer to tlc or qlc depending on model.

That x4 4.0 link maxes out at 8 gigabytes per second. Dual channel memory maxes out around 100 gigabytes per second. For compute tasks the x8 GPU link is fine.

You could do x8 GPU, x8 nvme, 1x nvme to CPU and 1 nvme to chipset and have zero nvme bottleneck in a 4 way setup.

Amds nvme raid doesn’t scale super well so I’d probably not use that.

But really I think just a total of two nvme for now, good nvme, is all you’d need. That’s enough to saturate 8 threads easy.

EricM · August 20, 2020, 9:25pm

The bottleneck is the sustained disk write speed.

It’s a single-user, OLAP dev box. I get a monthly import from the OLTP db. Everything is immutable, so insert only. Then I turn those skinny tables -> minimally logged fat tables -> simple recovery full backup. That’s not the pain I’m trying to solve.

The dev part is in turning those fat instance records into normalized measures that describe behavior over time. I’m trying to find a needle in white noise. They’d pay me a fat stack of cash if I can drop false positives from 95% -> 50%, but that may not be possible. Most iterations yield a half percent bump.

On spinning disks with 32GB of RAM, I’d get to work ~1hr per day and hope it finished the new iteration by morning. That’s suboptimal for working a complex problem. There was a ton of TempDB allocs/reads/writes, logged inserts into tables with multiple low fill factor indexes, and a final defrag.

I’m eager to revisit the project with NVME’s, GPU computing, and newer SQL TempDB features. I don’t see this ending in another draw. Win or lose, I will finish a project that’s been rolling around my head for a decade.

Speaking of TempDB:

TempDB benefits from more data files, but user DBs just care about the number of drives:

https://docs.microsoft.com/en-us/archive/blogs/psssql/sql-server-urban-legends-discussed

In fact, new versions of SQL automatically create up to 8 data files. Plus there’s a ton of other perf boosts, like being able to preallocate the data files with instant file init, they won’t shrink during restarts, reusable “cached” temp table pages, optimized memory and log usage, etc.

EricM · August 20, 2020, 9:39pm

That’s the answer then. I don’t think I’ve seen a query execution plan with more than 8 parallel tasks. If I remember correctly, it maxed out at 4.

I should prob go with a x570, 3700x, a 970 Evo Plus, and 3x 970 Pro’s. Put Windows on the Evo Plus, the two Pro’s in the expansion card for data, and use the 3rd Pro on the mobo for the log.

Just out of curiosity…

I didn’t think MB vendors could split the x16 lane on X570 boards? Only the first slot went to the CPU and the rest were via chipset. Plus, you can only put two NVMe’s in the adaptor when it’s in an x8 slot. It doesn’t do any switching.

wendell · August 20, 2020, 11:21pm

No depends on the board. There are several that are x8/x8 to CPU via x16 physical slots then the bottom slot is x4 via chipset. No board I’m aware of maps x8 slot lanes through the x4 chipset.

BGL · August 21, 2020, 9:55am

That’s a very good point; I have spent so much time dealing with large server systems with complex storage sub-systems that splitting larger databases up across multiple data-files has always resulted in noticeable performance benefits for certain workloads. That will teach me to type off the cuff without thinking it through properly

system · May 22, 2021, 3:55am

This topic was automatically closed 273 days after the last reply. New replies are no longer allowed.