Wendell has a craaaazy A-series GPU. Guess what? It has no fan, and gets hot! So he tasked me with designing a model that will attach a blower-style fan to the end of the GPU.
He wanted it to fit onto the fan like a glove, have an angled/low section beside the power connector so you can get your fingers in there, cover the entire end of the card except the power connector, and leave room for airflow.
Model
After dealing with an annoying amount of supports, I made the final version use angles instead! This model prints without needing any supports. Yay!
I made use of the screw holes at the end of the card to firmly attach the mount.
Nice job. This is exactly where 3D printing is extremely useful. It's pretty interesting how you used angles to avoid supports. Is this a commonly used technique? (I don't have a finished 3D printer yet, but I am trying to soak up as much theory as possible.)
What kind of filament did you use? If PLA, I guess it should not get soft since the fan is blowing cold air across the surface. Anyway, I am always interested in what materials people use for hardware mods, since case interiors can get pretty hot.
Please keep us posted on how it holds up after some time.
Thanks.
@Level1_Amber, thank you so much for sharing this! I am wondering if you can share the blower fan model as well? I would love to add this blower to my A100 setup.
Essentially, I am wondering if there is a way you guys could look into 3D printing a brace bracket for third-party fans on the Freezer 4U-M CPU cooler for Threadripper sTR5 CPUs.
Let me ask all you A100 enthusiasts in here: why? It's an outdated card that still goes for many thousands on eBay (over $10k for the 80GB version), and I just don't understand. It's probably the sexiest GPU Nvidia ever made in terms of looks, but other than that, what's the attraction? Surely you'd get a lot more bang for your buck with current-generation cards, especially if you were to spend the same.
There are a few things that can make the A100 and similar cards appropriate for some workloads:
memory: if you are working with large data sets, or some of the newer deep learning/AI models such as the large language models behind ChatGPT, you need a lot of memory. Data and models can be split and run in parallel across multiple GPUs (see the sketch after this list), but getting as much as possible into memory can make a big difference to performance, or to being able to run a large model at all.
tensor cores: for deep learning/AI, tensor cores can be more important than CUDA cores, and cards like the A100 can have a lot more of them
scaling: many of these cards can be run together for really large data sets or AI models, think 6-8 per server across many, many servers (see the NVLink and cooling points below). See NVIDIA DGX systems for some beautiful engineering with many GPUs.
NVLink: these sorts of cards support NVLink at the server and server-rack level, rather than an SLI-type connection, so many cards can communicate with each other quickly
power efficiency: depending on which model of A100 (40 vs 80GB, PCIe vs SXM), the power draw is 250-400W. A 4090 is probably slower for a lot of the workloads I describe, and consumes more power, maybe 450W, assuming the workload even fits in its 24GB of memory; if you need 80GB of memory that's 4 x 4090 at 4 x 450W! Power in a data centre is expensive, and even an extra 50W across many GPUs running 24/7/365 adds up to a lot of money (see the rough numbers sketched after this list).
reliability: these cards are designed to run 24/7/365, for years
cooling: in a rack server, the chassis provides high airflow to keep the card cool, which allows many of the cards to be installed in a single server. The project here is a bit different. Many data-centre GPUs are starting to switch to water cooling, as it is more compact and better at cooling (i.e. cheaper), particularly when the cooling power saving is multiplied across many GPUs.
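To make the memory point a bit more concrete, here is a rough sketch (my illustration, assuming the Hugging Face transformers and accelerate libraries; the model id is just a placeholder) of how a model too big for one card's VRAM can be spread across several GPUs:

```python
# Rough illustration only: shard a large causal LM across all visible GPUs.
# Needs transformers + accelerate installed; the model id is a placeholder.
import torch
from transformers import AutoModelForCausalLM

model_id = "some-org/some-70b-model"  # placeholder, any large causal LM

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 halves memory vs fp32
    device_map="auto",          # accelerate spreads the layers across the GPUs
)

print(model.hf_device_map)      # shows which GPU each block of layers landed on
```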
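And some back-of-the-envelope arithmetic behind the power-efficiency point; the board powers and the $0.15/kWh electricity price are assumptions, not measurements:

```python
# Back-of-the-envelope: running cost of 1x A100 80GB vs 4x RTX 4090, 24/7/365.
# TDPs and electricity price below are rough assumptions, not measurements.
A100_80GB_W = 400        # upper end of the 250-400W range quoted above
RTX_4090_W  = 450
N_4090      = 4          # 4 x 24GB = 96GB, to roughly match 80GB of VRAM

HOURS_PER_YEAR = 24 * 365
PRICE_PER_KWH  = 0.15    # assumed USD per kWh

def yearly_cost(watts):
    """Electricity cost of a constant draw running all year."""
    return watts / 1000 * HOURS_PER_YEAR * PRICE_PER_KWH

print(f"1x A100 80GB: {A100_80GB_W} W, ~${yearly_cost(A100_80GB_W):,.0f}/year")
print(f"4x RTX 4090 : {N_4090 * RTX_4090_W} W, ~${yearly_cost(N_4090 * RTX_4090_W):,.0f}/year")
```

At those assumptions that is roughly $500/year vs $2,400/year, and the gap only multiplies across a rack full of them.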
It's worth experimenting with the number of GPUs: performance usually scales near linearly with the number of GPUs. That's why it is popular for deep learning/AI workstations, particularly researchers' personal rigs, to run 4x 3090 Ti, where you need the performance and memory of cards like the A100 but not necessarily the other advantages. And a multi-3090 Ti setup is a lot cheaper.
I was more getting at the hobbyist scenario, rather than filling a server or a rack with these things, but thanks for the insightful answer & much appreciated.
I guess if it's performance per dollar or ease of use, the 4090 wins, but if you're looking at performance per watt or ease of scaling, it's not even on the radar.
Installed the shroud and the blower fan, and it made a huge difference to the A100 temperature. If I just use the 2x 120mm case fans of my rack server (set to the highest fan speed), the A100 idles at 64C and rises to 80+C when running LLM workloads;
with this setup, I was able to get a 36C idle temperature and ~40-50C under load. Thanks a lot for sharing this!
The only drawback was that the card became quite long overall, and I had to make some space by shifting the position of a case fan. A shroud design like this one may be more space-friendly.