Building a budget AI / machine learning system

I am building a budget server to run AI and I have no experience running AI software. I’m thinking of starting with a Llama LLM, but I would like to get into making AI pictures and videos as well, plus who knows what else once I learn more about this. I am just getting into this and have not received the hardware yet, but it is ordered. I’m gathering information now so I know how to get started when it arrives.

System specs:

Dual E5-2686 v4 (36 cores, 72 threads total)

128GB ECC RAM

2TB Gen 4 NVMe SSD

(4) 1TB SATA SSDs in RAID 0

(4) Tesla P40 24GB cards (they use the GP102 chip, same as the Titan Xp and 1080 Ti)

I’m planning to run this headless and remote into it. This is just for tinkering at home and I’m not worried if it isn’t the fastest system in the world.

What would be the best OS?

What drivers are the best to use with the Tesla P40 cards?

Any other thoughts on this setup, or suggestions?

Do I need to use NVLink on the cards in order to use all the VRAM?

I am thinking of using bifurcation and running each card on 8 PCIe Gen 3 lanes. Do you think that would cause a bottleneck?

What drivers are the best to use with the Tesla P40 cards?

To get NVIDIA’s official GRID drivers you have to sign up for their enterprise program, and they don’t allow personal email addresses. I tried with two separate emails on domains I host myself and was still denied. But if you go to the Discord server found in this post and check the pins, you can find where to get the drivers.

To elaborate a bit more on drivers (assuming Linux): you’re either using the open source nouveau drivers or the official NVIDIA ones. If you want to use vGPU you have to use the NVIDIA drivers, which technically requires a license, but you can follow the process in this link to get drivers that work (at least for Proxmox/Linux).
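Once you have a driver installed (whichever route you take), a quick sanity check is to query NVML from Python and confirm all four P40s actually show up. A minimal sketch, assuming the nvidia-ml-py package (`pip install nvidia-ml-py`); older pynvml releases may return bytes instead of strings from some of these calls:

```python
# Sketch: confirm the installed driver sees all four P40s via NVML.
import pynvml

pynvml.nvmlInit()
print("Driver version:", pynvml.nvmlSystemGetDriverVersion())
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    name = pynvml.nvmlDeviceGetName(handle)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)  # sizes are in bytes
    print(f"GPU {i}: {name}, {mem.total / 1024**3:.0f} GiB VRAM")
pynvml.nvmlShutdown()
```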


I hope you are aware that your Tesla cards are each about as powerful as a 3060; the only redeeming feature is the 24 GB of VRAM, as opposed to 12 GB on the 3060. :slightly_smiling_face:

Your CPUs are almost as outdated, with about the same aggregate benchmark score as an Intel Core i5-14400.

Not that it matters much if you have already bought the hardware, but I would invest in a 14600K + RTX 4070 combo instead. That should be slightly more powerful for $1,200-$1,500, on brand new, supported hardware.


40 PCIe lanes on each of the E5s though, vs 20 on the i5 :wink:

Mind you, NVIDIA aggressively limits FP16 and FP64 on their home-gamer products.

Depends. Proxmox or XCP-ng if you want to make this box do “all the things”; Ubuntu or Debian if you want it to serve purely as an “AI workstation”.

Consider Btrfs or ZFS instead of RAID.

The P40 has terrible FP16; a lot of people choose the P100 over it, even with the lower VRAM, just for the better FP16. The other issue is the much older CUDA compute capability, and thus no support for nice things like FlashAttention. It’s still a capable card, but it is definitely showing its age.
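If you want to check that yourself once the box is up: FlashAttention-2 generally wants Ampere (compute capability 8.0) or newer, while the P40 is Pascal (compute capability 6.1). A quick PyTorch sketch, assuming a working CUDA install:

```python
# Sketch: list each GPU's CUDA compute capability.
# FlashAttention-2 generally requires compute capability >= 8.0 (Ampere);
# the P40 reports 6.1 (Pascal), so it falls short.
import torch

for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    name = torch.cuda.get_device_name(i)
    supported = (major, minor) >= (8, 0)
    print(f"{name}: compute capability {major}.{minor}, "
          f"FlashAttention-2 capable: {supported}")
```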


FP64 is basically non-existent in any non-x100 chip, be it Tesla, GeForce, or whatever. The last generation that had meaningful FP64-capable hardware in all chips was Kepler.
Not that it matters: FP64 is pretty much irrelevant for ML, which is the OP’s point of interest.

As for FP16, even if they limit it, it’s still many times faster than Pascal, which has essentially no support for it (or rather, supports it at 1/64th of the FP32 rate). So a 3060 is roughly 70x faster than a P40 in FP16, while also being able to use the smaller data size to reduce VRAM usage.
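If you want to see that gap on your own cards, a crude matmul timing in PyTorch makes it pretty obvious. The matrix size and iteration count below are arbitrary, and real model throughput depends on far more than raw matmul rate, so treat this as a sketch:

```python
# Sketch: crude FP32 vs FP16 matmul throughput comparison on the first GPU.
# On Pascal (P40), expect the FP16 number to be far *worse* than FP32;
# on Ampere and newer, FP16 should come out several times faster.
import torch

def bench(dtype, n=4096, iters=20):
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    _ = a @ b  # warm-up
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        _ = a @ b
    end.record()
    torch.cuda.synchronize()
    ms = start.elapsed_time(end) / iters
    tflops = 2 * n**3 / (ms / 1e3) / 1e12  # 2*n^3 FLOPs per n x n matmul
    print(f"{dtype}: {ms:.2f} ms per matmul, ~{tflops:.1f} TFLOPS")

bench(torch.float32)
bench(torch.float16)
```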


Wow, okay there N…

Can anyone confirm this guy’s success? https://www.reddit.com/r/LocalLLaMA/comments/13n8bqh/my_results_using_a_tesla_p40/

If you are concerned about FP64, buy AMD: not only do they hold the performance crown there, and have for years, they also support it on all cards. However, that does not matter much, since FP64 is pretty much irrelevant for the workloads discussed here.

As always, it is a matter of cost, though. This system would probably be around 25-30% stronger than the system outlined above, and do it at a more comfortable power level:

If you remove the 2TB OS drive and the mirrored 4TB SSDs, you could realistically go below $1.5k. Does the OP’s system beat this? No idea, but it gives a reference point for what an equivalent modern system would cost you.

And that is important, establishing references, right? Otherwise, of course a potato system is sufficient for every use case under the sun, yessiree! :stuck_out_tongue:

I don’t think looking at raw compute is a reasonable reference. The Xeon platform makes it a lot more reasonable to run 4 GPUs like the OP is doing (I say reasonable because I know some people have considered using risers traditionally meant for crypto mining on ML/AI workloads). The other thing is that a single 4070 has only 12 GB of VRAM; people still buy the P40 because it is a cheap way to get 24 GB of VRAM (I have personally considered both the P40 and the P100, but chose to hold off for now). You might also be surprised how cheap you can get some of the v3/v4 Xeon systems; I know I was, which is why I bought two of them.

Edit: I wanted to add some references to people attempting to use those risers.

Time has passed, I learned a lot and the gods that are creating llama.cpp and other such programs have made it all possible. I’m running Mixtral 8x7b Q8 at 5-6 token/sec on a 12 gpu rig (1060 6gb each). Its wonderful (for me).

from https://www.reddit.com/r/LocalLLaMA/comments/17ixil8/psa_about_mining_rigs/

Also, a lot of people shared their experiences in this more recent thread: https://www.reddit.com/r/LocalLLaMA/comments/1bhfjd3/quick_experiment_how_is_inference_affected_with/
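As a rough sanity check on the VRAM math behind setups like these (the Mixtral 8x7B Q8 example above, or 4x P40 = 96 GB): a back-of-the-envelope rule, an approximation rather than how GGUF files are actually sized, is parameters × bits per weight / 8, plus headroom for KV cache and buffers.

```python
# Sketch: back-of-the-envelope weight-size estimate for a quantized model.
# Rough approximation only: real GGUF quants carry extra per-block metadata,
# and you still need room for KV cache, activations, and buffers.
def weight_gib(params_billions: float, bits_per_weight: float) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / 1024**3

# Mixtral 8x7B is roughly 47B parameters in total.
for bits in (8, 5, 4):
    print(f"~{weight_gib(47, bits):.0f} GiB of weights at ~{bits} bits/weight")
```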


Especially when you are going to have plenty of peripherals, desktop platforms have become VERY limiting.