R750xa, dual H100 80 GB with NVLink 600 GB/s: $61,669.80
R750xa, dual A100 80 GB with NVLink 600 GB/s: $39,622.13
I avoided the jump from the A40s to the A100s because I found it gave only marginal (a few percent) speedups in actual calculations. If I’m interpreting the data correctly, the leap from GDDR6 to HBM2e is more useful if you can load your entire dataset into VRAM to begin with. In my case, datasets are dozens of terabytes, so that’s not going to be an option for a long time. So the only speedup I see in my workflow is in the speed of the actual matrix operations.
The H100 is a whole new architecture, so I’m guessing there are substantial speedups beyond the HBM memory. The problem is that there’s very little detail available, and what I can find is mostly for AI workloads where, again, presumably the whole dataset is already in memory.
Obviously the price jump is huge, but the H100s seem like they might actually be better value for the money if Nvidia’s 2.7x speedup is to be believed. Does anyone here have experience with the H100s and is willing to chime in?
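For a rough sense of the value question, here is a back-of-envelope sketch using the two quotes above. It takes Nvidia’s 2.7x claim at face value, which is an assumption and may not hold for a workload that streams data from disk. The variable names are only illustrative, and the prices are for the whole server, not just the GPUs.

```python
# Quoted system prices from the two configurations above (USD).
price_h100 = 61_669.80
price_a100 = 39_622.13

# Assumption: Nvidia's claimed 2.7x H100-over-A100 speedup, not a measured figure.
claimed_speedup = 2.7

# How much more the H100 system costs, as a multiple of the A100 system.
price_ratio = price_h100 / price_a100

# Throughput per dollar relative to the A100 system, if the claim holds.
perf_per_dollar_gain = claimed_speedup / price_ratio

print(f"price ratio: {price_ratio:.2f}x")
print(f"break-even speedup: {price_ratio:.2f}x")
print(f"perf per dollar at the claimed speedup: {perf_per_dollar_gain:.2f}x")
```

Under these assumptions the H100 system only needs to be about 1.56x faster on the real workload to break even on price, and the claimed 2.7x would work out to roughly 1.73x the throughput per dollar.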
Is there something about the workload that stops you from offloading data chunks to fast NVMe drives before processing? DeepSpeed [0] has some options if the workload is not latency dependent, since large ML models can be 320 GB nowadays.
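To make the chunking idea concrete, here is a minimal sketch of out-of-core processing: read fixed-size chunks from a file on a fast drive and process them one at a time, so only one chunk is resident in memory. This is not DeepSpeed’s API; `iter_chunks` and `chunked_sum_of_squares` are hypothetical helpers, and the per-chunk sum of squares is a stand-in for the host-to-device copy and GPU kernel a real pipeline would run.

```python
import array
import os
import tempfile


def iter_chunks(path, chunk_items, typecode="d"):
    """Yield successive arrays of at most chunk_items values from a binary file."""
    item_size = array.array(typecode).itemsize
    with open(path, "rb") as f:
        while True:
            raw = f.read(chunk_items * item_size)
            if not raw:
                break
            chunk = array.array(typecode)
            chunk.frombytes(raw)
            yield chunk


def chunked_sum_of_squares(path, chunk_items):
    """Reduce the whole file while holding only one chunk in memory at a time."""
    total = 0.0
    for chunk in iter_chunks(path, chunk_items):
        # Stand-in for "copy the chunk to the GPU and run the kernel".
        total += sum(x * x for x in chunk)
    return total


# Demo on a small temporary file; a real dataset would live on the NVMe drive.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    array.array("d", range(1000)).tofile(tmp)
    demo_path = tmp.name

result = chunked_sum_of_squares(demo_path, chunk_items=128)
os.remove(demo_path)
print(result)
```

Whether this pattern helps depends on whether the drives can deliver chunks as fast as the GPU consumes them; overlapping the read of the next chunk with compute on the current one is the usual next step.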
The Tensor Memory Accelerator (TMA) is a new addition for memory fetches on the H100. The tensor cores will be the same, but the TMA is able to prefetch from memory, avoiding idle cycles and gaining 15% [1].
On the original topic, I don’t have hands-on experience, but you could get a demo or pay a few bucks to benchmark them. Not a recommendation, but I see that CoreWeave have H100s coming!
I haven’t got any to hand at the moment, but we do have a few ready to install in our lab. Our systems are based on EPYC Rome (Linux) / Xeon Scalable 3rd Gen (Windows). I can give you some benchmarks once I get around to setting them up, if you are curious.