Building a budget AI / machine learning system

I am building a budget server to run AI and I have no experience running AI software. I’m thinking of starting with a Llama LLM, but I would like to get into making AI pictures and videos as well, plus who knows what else once I learn more about this. I am just getting into this and have not received the hardware yet, but it is ordered. I’m gathering information now so I know how to get started when it arrives.

System specs:

Dual E5-2686 v4 (36 cores, 72 threads total)

128GB ECC RAM

2TB Gen 4 NVMe SSD

(4) 1TB SATA SSDs in RAID 0

(4) Tesla P40 24GB cards (they use the GP102 chip, same as the Titan Xp and 1080 Ti)

I’m planning to run this headless and remote into it. This is just for tinkering at home and I’m not worried if it isn’t the fastest system in the world.

What would be the best OS?

What drivers are the best to use with the Tesla P40 cards?

Any other thoughts on this setup, or suggestions?

Do I need to use NVLink on the cards in order to use all the VRAM?

I am thinking of using bifurcation and running each card on 8 PCIe Gen 3 lanes. Do you think that would cause a bottleneck?

What drivers are the best to use with the Tesla P40 cards?

To get NVIDIA’s official GRID drivers you have to sign up for their enterprise program, and they don’t allow personal email addresses. I tried with two separate emails on domains I host myself and was still denied. But if you go to the Discord server found in this post and check the pins, you can find where to get the drivers.

To elaborate a bit more on drivers (assuming Linux): you’re either using the open source nouveau drivers or the official NVIDIA ones. If you want to use vGPU you have to use the NVIDIA drivers, which technically requires a license, but you can follow the process in this link to get drivers that work (at least for Proxmox/Linux).
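Once you have a driver installed (whichever route you take), a quick sanity check is to query NVML from Python and confirm all four P40s actually show up. A minimal sketch, assuming the nvidia-ml-py package (`pip install nvidia-ml-py`); older pynvml releases may return bytes instead of strings from some of these calls:

```python
# Sketch: confirm the installed driver sees all four P40s via NVML.
import pynvml

pynvml.nvmlInit()
print("Driver version:", pynvml.nvmlSystemGetDriverVersion())
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    name = pynvml.nvmlDeviceGetName(handle)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)  # sizes are in bytes
    print(f"GPU {i}: {name}, {mem.total / 1024**3:.0f} GiB VRAM")
pynvml.nvmlShutdown()
```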


I hope you are aware that your Tesla cards are each about as powerful as a 3060; the only redeeming feature is the 24 GB of VRAM, as opposed to 12 GB on the 3060. :slightly_smiling_face:

Your CPUs are almost as outdated, with about the same aggregate benchmark score as an Intel Core i5-14400.

Not that it matters much if you have already bought the hardware, but I would invest in a 14600K + RTX 4070 combo instead. That should be slightly more powerful for $1,200-$1,500, on brand new, supported hardware.


40 PCIe lanes on each of the E5s though, vs 20 on the i5 :wink:

Mind you, NVIDIA aggressively limits FP16 and FP64 on their home-gamer products.

Depends. Proxmox or XCP-ng if you want to make this box do “all the things”; Ubuntu or Debian if you want it to serve purely as an “AI workstation”.

Consider Btrfs or ZFS instead of RAID.

The P40 has terrible FP16; a lot of people choose the P100 over it, even with the lower VRAM, just for the better FP16. The other issue is the much older CUDA compute capability, and thus no support for nice things like FlashAttention. It’s still a capable card, but it is definitely showing its age.
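If you want to check that yourself once the box is up: FlashAttention-2 generally wants Ampere (compute capability 8.0) or newer, while the P40 is Pascal (compute capability 6.1). A quick PyTorch sketch, assuming a working CUDA install:

```python
# Sketch: list each GPU's CUDA compute capability.
# FlashAttention-2 generally requires compute capability >= 8.0 (Ampere);
# the P40 reports 6.1 (Pascal), so it falls short.
import torch

for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    name = torch.cuda.get_device_name(i)
    supported = (major, minor) >= (8, 0)
    print(f"{name}: compute capability {major}.{minor}, "
          f"FlashAttention-2 capable: {supported}")
```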


FP64 is basically non-existent in any non-x100 chip, be it Tesla, GeForce, or whatever. The last generation that had meaningful FP64-capable hardware in all chips was Kepler.
Not that it matters: FP64 is pretty much irrelevant for ML, which is the OP’s point of interest.

As for FP16, even if they limit it, it’s still many times faster than Pascal, which has essentially no support for it (or rather, supports it at 1/64th of the FP32 rate). So a 3060 is roughly 70x faster than a P40 in FP16, while also being able to use the smaller data size to reduce VRAM usage.
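If you want to see that gap on your own cards, a crude matmul timing in PyTorch makes it pretty obvious. The matrix size and iteration count below are arbitrary, and real model throughput depends on far more than raw matmul rate, so treat this as a sketch:

```python
# Sketch: crude FP32 vs FP16 matmul throughput comparison on the first GPU.
# On Pascal (P40), expect the FP16 number to be far *worse* than FP32;
# on Ampere and newer, FP16 should come out several times faster.
import torch

def bench(dtype, n=4096, iters=20):
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    _ = a @ b  # warm-up
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        _ = a @ b
    end.record()
    torch.cuda.synchronize()
    ms = start.elapsed_time(end) / iters
    tflops = 2 * n**3 / (ms / 1e3) / 1e12  # 2*n^3 FLOPs per n x n matmul
    print(f"{dtype}: {ms:.2f} ms per matmul, ~{tflops:.1f} TFLOPS")

bench(torch.float32)
bench(torch.float16)
```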


Wow, okay there N…

Can anyone confirm this guy’s success? https://www.reddit.com/r/LocalLLaMA/comments/13n8bqh/my_results_using_a_tesla_p40/

If you are concerned about FP64, buy AMD: not only do they hold the performance crown there, and have for years, they also support it on all cards. However, that does not matter much, since FP64 is pretty much irrelevant for the workloads discussed here.

As always, it is a matter of cost, though. This system would probably be around 25-30% stronger than the system outlined above, and do it at a more comfortable power level:

If you remove the 2TB OS drive and the mirrored 4TB SSDs, you could realistically go below $1.5k. Does the OP’s system beat this? No idea, but it gives a reference point for what an equivalent modern system would cost you.

And that is important, establishing references, right? Otherwise, of course a potato system is sufficient for every use case under the sun, yessiree! :stuck_out_tongue:

I don’t think looking at raw compute is a reasonable reference. The Xeon platform makes it a lot more reasonable to run 4 GPUs like the OP is doing (I say reasonable because I know some people have considered using risers traditionally meant for crypto mining on ML/AI workloads). The other thing is that a single 4070 has only 12 GB of VRAM; people still buy the P40 because it is a cheap way to get 24 GB of VRAM (I have personally considered both the P40 and the P100, but chose to hold off for now). You might also be surprised how cheap you can get some of the v3/v4 Xeon systems; I know I was, which is why I bought two of them.

Edit: I wanted to add some references to people attempting to use those risers.

Time has passed, I learned a lot and the gods that are creating llama.cpp and other such programs have made it all possible. I’m running Mixtral 8x7b Q8 at 5-6 token/sec on a 12 gpu rig (1060 6gb each). Its wonderful (for me).

from https://www.reddit.com/r/LocalLLaMA/comments/17ixil8/psa_about_mining_rigs/

Also, a lot of people shared their experiences in this more recent thread: https://www.reddit.com/r/LocalLLaMA/comments/1bhfjd3/quick_experiment_how_is_inference_affected_with/
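As a rough sanity check on the VRAM math behind setups like these (the Mixtral 8x7B Q8 example above, or 4x P40 = 96 GB): a back-of-the-envelope rule, an approximation rather than how GGUF files are actually sized, is parameters × bits per weight / 8, plus headroom for KV cache and buffers.

```python
# Sketch: back-of-the-envelope weight-size estimate for a quantized model.
# Rough approximation only: real GGUF quants carry extra per-block metadata,
# and you still need room for KV cache, activations, and buffers.
def weight_gib(params_billions: float, bits_per_weight: float) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / 1024**3

# Mixtral 8x7B is roughly 47B parameters in total.
for bits in (8, 5, 4):
    print(f"~{weight_gib(47, bits):.0f} GiB of weights at ~{bits} bits/weight")
```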


Especially when you are going to have plenty of peripherals, desktop platforms have become VERY limiting.