Has anyone liquid cooled an A100 yet?

I don’t think there are any easily obtainable water blocks for them, at least not for the public in single-digit quantities.

As far as I know, the die on it is unlike any of the mainline parts, so there is no common block for it. If anyone makes them at all, they will be corporate surplus.

Edit: if this picture is correct, I don’t think there are any water blocks for it, unless there are special ones for data centres.

Pretty sure this is not the A100 but the DPU from Mellanox.

Why would a Mellanox card need the same die as the DGX A100, with 6 stacks of HBM for 40GB of memory, and NVLink connectors?

I was suspicious too, because it has InfiniBand connectors and none of the renders of the EGX A100 have them. But this bare-PCB picture is the one that is everywhere; there don’t seem to be other shots readily available.

They look similar but not the same

Sorry, my fault

Nah no problems. I was not sure myself.

I had a couple of messages back and forth with EKWB, and it doesn’t look like I’ll be seeing a commercially available water block from them any time soon. I’ve tried a couple of things and finally have something working.

The first thing I tried was sealing off the case and turning all my fans into intake fans, thinking that if the only place the air could escape was through the A100, it might be enough. I even purchased an anemometer to measure the CFM through the GPU. This works to some extent, except that I need all my intake fans running at full speed to get the required airflow through the GPU.
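
An anemometer reading is a velocity; it only becomes a CFM figure once it is multiplied by the cross-sectional area of the opening it was measured in (CFM = air velocity in ft/min × area in ft²). A minimal sketch of that conversion, assuming a rectangular opening; the 100 mm × 30 mm opening and 5 m/s reading are hypothetical numbers, not measurements from this build, and a single-point reading only approximates the average velocity across the opening.

```python
FT_PER_M = 3.28084  # feet per metre


def cfm_from_velocity(velocity_m_s: float, width_mm: float, height_mm: float) -> float:
    """Volumetric airflow (cubic feet per minute) through a rectangular opening."""
    velocity_ft_min = velocity_m_s * FT_PER_M * 60.0          # m/s -> ft/min
    width_ft = width_mm / 1000.0 * FT_PER_M                   # mm -> ft
    height_ft = height_mm / 1000.0 * FT_PER_M
    return velocity_ft_min * width_ft * height_ft             # ft/min * ft^2 = CFM


# Hypothetical example: 5 m/s measured across a 100 mm x 30 mm opening.
example = cfm_from_velocity(5.0, 100.0, 30.0)
print(f"{example:.1f} CFM")
```

With those hypothetical numbers the flow works out to roughly 31.8 CFM.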

I began experimenting with a couple of fan shroud designs made from cardboard and gaffer tape. Once I was happy with the layout and airflow, I was able to transfer the measurements to some aluminum sheet scraps I had in my shop and welded it up.

It appears to work best if I pull air through the shroud, since it is hardly engineered for optimal airflow. One of the problems I had with running the fan as intake (intake from the GPU’s standpoint, not the case’s) was that, because the fan had enough power and was basically blowing at a flat plate, a good portion of the air was actually getting directed back out. It had me scratching my head as to why my intake fan was exhausting air.

But in any case here is my temporary solution until I can find or make a waterblock.


Pretty sure @wendell used external(?) blower fans for his V100; perhaps he could link the exact model here.

I tried attaching one of the PCI slot exhaust fans as well, but it wasn’t moving enough air. This is the one I tried: Evercool FOX-2

Flow-wise, it would probably be better to have the fan pointing into the end of the shroud, with the shroud reducing down to the size of the card’s opening, than to have the air try to make a tight 90° turn. That way the air can keep going in a more or less straight line and potentially lose less speed or pressure than it would by being forced to turn and then get going again in a different direction.

That was another cardboard prototype, but it required a lot more aluminum, and the scrap pieces I had were not big enough for that design. It also included a lot more small parts and angles. I completely agree that if it were more efficient I could probably run a slower fan. Maybe in rev. B, when I find some more aluminum. But to be honest, this is keeping the GPU pretty cool. I can run some benchmarks and get temperature readings. It just sounds like a drone.

If anyone comes across a big block of copper or nickel I’ll gladly machine a water block prototype and test it.

I ran some benchmarks, and it turns out it’s working quite well. I ran the SSD v1.2 TensorFlow training example container just to stress test the system.

I ran it on both my “custom” cooled A100 and the new RTX A6000. I’m assuming it’s running well, as the A100 hovers around 77 degrees Celsius once warmed up and running for a couple of hours, whereas the RTX A6000 sat at 84 degrees Celsius with its internal fan at only ~60%. I don’t know the thermal profile of that fan, but I would assume that if it’s only running at 60%, then 84 degrees is fine. I know the A100 thermally throttles at 92 degrees, so I think I can safely call it a success. In fact, I might change the profile of the A100’s fan to run a little slower and quieter (lower dB).
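
For logging temperatures during a long run like this, `nvidia-smi` can print per-GPU readings as CSV (`--query-gpu=index,name,temperature.gpu --format=csv,noheader,nounits`). The sketch below parses that output and reports the headroom below a throttle point; the 92 °C default is the figure quoted above for the A100, and the helper names (`parse_temps`, `read_gpu_temps`, `headroom_report`) are hypothetical, not part of any NVIDIA tool.

```python
import subprocess

THROTTLE_C = 92  # A100 throttle point quoted in this thread; check your own card's spec

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,temperature.gpu",
    "--format=csv,noheader,nounits",
]


def parse_temps(csv_text: str) -> list[tuple[int, str, int]]:
    """Parse nvidia-smi CSV lines into (index, name, temperature in C) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        # Index is the first field, temperature the last; anything between is the name.
        rows.append((int(parts[0]), ", ".join(parts[1:-1]), int(parts[-1])))
    return rows


def read_gpu_temps() -> list[tuple[int, str, int]]:
    """Run nvidia-smi once and parse its output (requires an NVIDIA driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_temps(out)


def headroom_report(rows: list[tuple[int, str, int]], limit_c: int = THROTTLE_C) -> list[str]:
    """Describe how far each GPU is below the given throttle temperature."""
    return [
        f"GPU {index} ({name}): {temp} C, {limit_c - temp} C below {limit_c} C"
        for index, name, temp in rows
    ]
```

On a machine with the driver installed, something like `for line in headroom_report(read_gpu_temps()): print(line)` run in a loop (or from cron) gives a simple temperature log; note the limit is a single parameter, so a card with a different throttle point needs its own value.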

The container can be found on the NVIDIA NGC website under nvidia:ssd_for_tensorflow

and run using:

bash examples/SSD320_FP16_1GPU_BENCHMARK.sh

Single-GPU training times (ResNet-50 on the COCO dataset) were:
A100 : real = 276m21.864s
RTX A6000 : real = 408m54.403s
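
The times above are in the `real` format printed by bash’s `time` builtin (minutes and seconds, no hours field). A quick sketch that converts them to seconds and computes the relative speedup; `parse_time_real` is just an illustrative helper name.

```python
import re


def parse_time_real(value: str) -> float:
    """Convert a bash `time` duration such as '276m21.864s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


a100 = parse_time_real("276m21.864s")
a6000 = parse_time_real("408m54.403s")

speedup = a6000 / a100   # how many times faster the A100 finished
saved = 1 - a100 / a6000  # fraction of wall-clock time saved on the A100

print(f"A100: {a100 / 3600:.2f} h, RTX A6000: {a6000 / 3600:.2f} h")
print(f"A100 speedup: {speedup:.2f}x ({saved:.0%} less wall-clock time)")
```

On this one benchmark that works out to about 1.48x, or roughly 32% less wall-clock time for the A100.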

Damn, the A100 is really on another level.

Definitely on another level. I want to run some more benchmarks if I can get access to more GPUs. I’m sure there is another thread for benchmarking, so I’ll move there for further developments.

Nothing immediately available that I’ve seen (it’s still a fairly new device to be looking at liquid cooling for). Maybe try wedging a higher-RPM fan onto your existing shroud, or arranging fans to face the intake point directly (clearances permitting).

See this measly Tesla, for reference

Yeah those are junk. This is what you want:
https://www.digikey.com/en/products/detail/delta-electronics/BFB1012HH/2560501

Or the more recent, compact version:
https://www.digikey.com/en/products/detail/delta-electronics/BCB0812UHN-TP09/2034820

This kept both GPUs on the V7350x2 (dual Polaris, ~250W TBP) under 73C during hour-long workloads.

I’m in the process of working with EKWB to get an A100 waterblock. We’re still in the early stages (arranging for a pickup soon), and I’m not sure what the turnaround time is to getting a waterblock designed and manufactured. But I’m hoping that they’ll have something available in the coming months.

In the meantime, I’ve been using fans from a Supermicro server to cool a dual-A100 workstation, and I’ve been able to get pretty good cooling results without a ton of noise. I was planning to do a full write-up/build video at some point, but here are some quick photos.

I’m using two 92mm fans in a push/pull config. The fans are Nidec UltraFlow (model# v92e12bga7-57), which corresponds to the Supermicro part number FAN-0115L4, listed at $20 each. The server I took them from had them in a plastic enclosure with fan guards (similar to FAN-0114L4), which I kept on.

The A100s are installed in a Phanteks P600s. The intake fan is held up with LEGO pieces and secured with duct tape:

The exterior fan is installed outside the case, secured with an L bracket at the top and held up from below, again with LEGO pieces. I also snaked a temperature probe from the motherboard and sandwiched it between the A100 exhaust ports and the fan (not shown). Duct tape again keeps this in place:

The final configuration isn’t pretty, but it’s actually fairly compact compared to other solutions that use blower fans that extend the length of the GPUs.

Obviously at 100% fan speed the noise is too loud to use as a workstation. But using the temps from the probe, I set a fan speed curve in BIOS that maxes out the fans at 35% speed (around 2600 RPM). This keeps both GPUs at around 75C under full load (tested with Tensorflow), with a noise level of around 48 dB at 1 foot.
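
The BIOS curve described above maps the probe temperature to a fan duty cycle and never goes past 35%. A rough sketch of that idea, assuming straight-line interpolation between curve points (many BIOS fan-curve editors behave this way, but boards differ). The points in `CURVE` are hypothetical; the post only gives the 35% ceiling.

```python
from bisect import bisect_right

# Hypothetical (probe temperature in C, fan duty in %) points, not the poster's BIOS values.
CURVE = [(30, 15), (45, 20), (60, 30), (70, 35)]
MAX_DUTY = 35  # ceiling from the post: 35% duty, about 2600 RPM on these fans


def fan_duty(temp_c: float, curve=CURVE, cap=MAX_DUTY) -> float:
    """Linearly interpolate a duty cycle from the curve, then clamp it to the cap."""
    temps = [t for t, _ in curve]
    if temp_c <= temps[0]:
        duty = curve[0][1]           # below the first point: hold the minimum
    elif temp_c >= temps[-1]:
        duty = curve[-1][1]          # above the last point: hold the maximum
    else:
        i = bisect_right(temps, temp_c)  # curve[i - 1] <= temp_c < curve[i]
        (t0, d0), (t1, d1) = curve[i - 1], curve[i]
        duty = d0 + (d1 - d0) * (temp_c - t0) / (t1 - t0)
    return min(duty, cap)
```

Keeping the cap separate from the curve points means the noise ceiling still holds if the curve is later edited to ramp higher.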

It’s definitely quiet enough at 35% RPM to use as a workstation. I’m working from home and have it set up in my main living space. The noise is noticeable, but the sound is pleasing at this speed (mostly just the sound of air moving; there is no annoying high-pitched character to it), and it’s not bothersome to my spouse. I would still like to transition to a watercooling setup when one is available to get a truly silent workstation, but I’m pleased enough with how it’s working for now.

I had some additional photos but apparently I can’t add more than 2.

Hi.
If you are interested in liquid cooling servers/workstations with dual or quad A100 or V100 GPUs, I can help you.
I represent the Comino company. We have sent many machines with dual and quad A100 GPUs to many customers. Here are our custom-made A100 water blocks.

Dear Hassan.Anwar

Your A100 water block looks phenomenally beautiful!
I am desperately searching for a water block for the RTX A6000. Any chances you have access to such one?

Greetings!
