My current local AI system is an older platform (Intel Core i7 6700K, Asus Z170-E motherboard, 64 GB DDR4-2400 RAM, 2x Gigabyte 3090s).
At idle, it is sucking back around 70 W.
I was wondering if there might be a more performant CPU/platform out there whose idle power consumption, while it is waiting for AI requests, would be < 70 W, but which would still give decent performance when it is answering said AI requests?
i.e. I know that slower processors can be lower power, but that comes at the sacrifice of performance.
I am trying to optimise between performance and power such that it would be the lowest idle power consumption with the minimum loss of performance.
I thought about using mini PCs, but the problem is that a lot of them don't have multiple PCIe 3.0 x16 slots for my nearly triple-width GPUs, which limits motherboard options.
Any recommendations will be greatly appreciated.
Thank you.
(For reference, I am using open-webui and running the Codestral:22b model via ollama, so it's not a super intensive AI workload in terms of responding to AI queries.)
You don't have a lot of room for manoeuvre here - those 3090s will be idling in the 21-25 W range each. That leaves you with a budget of 20-30 W for the rest of the system.
I was about to suggest the RTX 7000, but then I read about the 3090s. Wendell ran a Ryzen 7900 on his AI inference rig.
But if you are doing most of your inferencing on the GPUs, you can maybe look into Wolfgang's channel.
He did a low-power NAS build.
Unless you were thinking of using the CPU as another potential AI inference device? If so, you might want to wait for the desktop equivalent of Lunar Lake with AI cores.
Or for a Chinese manufacturer to make a Lunar Lake desktop board like this.
I suspect even low-end modern CPUs will at least get you something closer to the 6700K. And since you are inferencing with GPUs anyway, the CPU doesn't really matter as long as it has the required PCI-E lanes for the 2 GPUs.
Overall Rank:
495th fastest vs 840th fastest in multithreading out of 4754 CPUs
410th fastest vs 816th fastest in single threading out of 4754 CPUs
165th fastest vs 270th fastest out of 1368 Desktop CPUs
Other names: Intel(R) Core™ i7-6700K CPU @ 4.00GHz, Intel Core i7-6700K CPU @ 4.00GHz
CPU First Seen on Charts: Q3 2015
Overall Rank:
1211th fastest in multithreading out of 4754 CPUs
788th fastest in single threading out of 4754 CPUs
380th fastest out of 1368 Desktop CPUs
I'm using 3090s because it's existing hardware that I already have (trying to spend as little money as possible if I were to buy new hardware to change the system/platform out).
So, according to HWiNFO, the 3090s are actually idling at somewhere between 6-8 W. But that's based on the software reporting.
I don't have the tools necessary to measure the idle power draw of the individual 3090s, so I can't really tell how much of that 70 W idle power draw is because of the 3090s vs. because of the rest of the platform.
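One way to cross-check the HWiNFO number without extra hardware is to ask the driver directly via `nvidia-smi` (assuming the NVIDIA driver is installed; note this is still a software reading from the same source, not a wall measurement). The helper names here are my own illustration:

```python
import subprocess

def parse_power_draws(csv_text: str) -> list[float]:
    """Parse per-GPU power draw (watts) from nvidia-smi CSV output."""
    return [float(line.strip()) for line in csv_text.splitlines() if line.strip()]

def query_gpu_power() -> list[float]:
    # power.draw comes from the driver, so it is still a software
    # reading, not a measurement at the wall.
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=power.draw",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_power_draws(out)

# Parsing demo on a captured reading (hypothetical values for two idle 3090s):
sample = "7.42\n6.88\n"
print(parse_power_draws(sample))               # [7.42, 6.88]
print(round(sum(parse_power_draws(sample)), 2))  # 14.3 W total
```

On a live system you would call `query_gpu_power()` while the box sits idle for a few minutes, since the cards take a while to drop into their lowest power state.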
So, in terms of the impact of CPU performance on the inferencing speed: it does make a difference, because "typing out" the response(s) uses the CPU, while getting the answer comes from the GPU.
So, the 3090 can be really fast at the inference, but when it comes to open-webui "typing out" the response, that's entirely a CPU-bound task.
Thus, really fast GPUs paired with a really slow CPU still results in a modest/relatively low tokens/second being processed with the CPU being the bottleneck.
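If you want to put a number on that token rate rather than eyeballing it, ollama's REST API returns `eval_count` and `eval_duration` (in nanoseconds) in its non-streaming `/api/generate` response. A minimal sketch, assuming a local ollama server on the default port (the `measure` helper and prompt are my own illustration):

```python
import json
import urllib.request

def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Ollama reports eval_duration in nanoseconds."""
    return eval_count / (eval_duration_ns / 1e9)

def measure(prompt: str, model: str = "codestral:22b") -> float:
    # Assumes ollama is serving on its default port 11434.
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt,
                         "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return tokens_per_second(body["eval_count"], body["eval_duration"])

# e.g. 512 tokens generated in 12.8 s of eval time:
print(tokens_per_second(512, 12_800_000_000))  # 40.0
```

Comparing that number across CPUs (same model, same prompt) would tell you directly how much the CPU is actually holding the 3090s back.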
Would an AMD CPU be better for this task/use case, or would Intel still fare better here?
(I haven't done much testing running the AI workload on an AMD-based system.)
(I forget if I mentioned this above, but the reason why I am running two 3090s is because I need the combined total of 48 GB of VRAM to fit the TheBloke/Mixtral:34b model (IIRC), as it takes something like 34-38 GB of VRAM, so a single 3090 with only 24 GB of VRAM isn't enough (I was getting out-of-memory errors). And again, rather than buying a different GPU with more VRAM, I was just re-using what I already had, and I was able to split the model up between the two 3090s to leverage the distributed VRAM. The cards drop down to PCIe 3.0 x8 rather than being able to run at PCIe 4.0 x16 each, but I don't have a Threadripper system, which would give the cards the PCIe lanes that they're due.)
I might be wrong, but I don't think you can even find a modern low-cost general-purpose CPU that can drive two 3090s where running the web frontend would be the bottleneck. I wouldn't overthink the CPU beyond the required PCI-e lanes.
(I mean, it could've been possible that HWiNFO is not reading the power consumption correctly.)
So… my testing on this is a little bit limited, because I don't really have very many systems that can take multiple PCIe x16 devices and still have enough power for everything else.
(i.e. with my HP Z420 workstations, I forget if they can even physically fit two 3090s, and even if they could, there aren't enough PCIe 8-pin power connectors to power said two 3090s. My only other system that could do that has a 6500T in it, and it also doesn't have enough RAM to run some of the larger AI models (I think that it tops out at 16 GB).)
But in testing some of the smaller models, I noticed that the GPUs barely break a sweat, while the CPU is frantically trying to "type out" the responses, which results in a lower token rate even though the GPUs barely even budge.
This is why I think that my CPU might be the bottleneck as a LOT of other people (tech YouTubers, Wendell, etc.) are using VASTLY more powerful CPUs than I am; hence the question.
Like, it would be one thing if it were fielding questions from multiple users all day, every day; then sure, idle power consumption really isn't much of an issue. But if it's just me that's really using it, and I'm at work for 10 hours a day and then on parenting duty after that, it means that the system would spend most of its time idling.
So, HWiNFO reports that they idle at somewhere between 6-8 W each, which, if it is anywhere remotely close to accurate, would peg the GPUs at 12-16 W total out of the 70 W that the system is consuming at idle.
Therefore, that would put the rest of the system/platform at somewhere closer to 54-58 W at idle, which means that if I can get the rest of the platform down to the sub-10 W range, I'd be saving between 44-48 W, which doesn't sound like a lot, but it's also not nothing either.
If there is a low enough cost system/platform, it could be worth it, as that would mean I'd be able to run the system 24/7, vs. right now, where I only power it on when I am actively looking to use the locally hosted AI system.
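The arithmetic above, spelled out (using the software-reported 6-8 W per GPU, which may be optimistic):

```python
def rest_of_platform_w(total_idle_w: float, per_gpu_w: float, n_gpus: int = 2) -> float:
    """Idle draw left over for CPU/board/RAM/PSU after subtracting the GPUs."""
    return total_idle_w - per_gpu_w * n_gpus

print(rest_of_platform_w(70, 8))  # 54 W (GPUs at the high end of 6-8 W)
print(rest_of_platform_w(70, 6))  # 58 W (GPUs at the low end)

def annual_kwh(watts: float, hours_per_day: float = 24, days: int = 365) -> float:
    """Energy used per year by a constant draw, in kWh."""
    return watts * hours_per_day * days / 1000

# Shaving ~45 W of always-on idle draw:
print(annual_kwh(45))  # 394.2 kWh/year
```

At a typical residential rate that ~394 kWh/year is what the always-on convenience costs, which gives a rough ceiling on how much a replacement platform is worth paying for.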
One important thing for you to note is the difference between the power consumption reported by your software and what your PSU is actually pulling from the wall. It'd be interesting if you could get one of those wall watt-meters to see how much your system is actually drawing at idle.