VRAM limit and virtual memory or swap in PyTorch

I have a 3090 Ti that I do a lot of my machine learning work on. While I actually have two 3090 Tis with an NVLink bridge between them, I have yet to utilize both.

The issue I've been running into: I have a neural network profiling tool I've been working on that currently uses one 3090 Ti, and it hits a limit toward the very end of a run when it overflows the 24 GB of VRAM. This only happens on my Linux box. My Windows laptop has a 3070 with 8 GB of VRAM running Windows 11, and there Task Manager lists virtual (shared) memory for the GPU; I've noticed that when I hit the VRAM limit, it starts spilling into that. It gets horribly slow, but it still finishes the task.

How do I get this functionality on Ubuntu, so that when I hit the VRAM limit I don't crash my program and am able to finish the data collection?
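For reference, here's a minimal sketch (not my actual tool, names are illustrative) of the kind of failure I'm hitting on Linux: allocations succeed until VRAM is exhausted, then PyTorch raises `torch.cuda.OutOfMemoryError` instead of spilling to system RAM the way Windows does.

```python
import torch

def fill_vram(chunk_mib: int = 512) -> int:
    """Allocate GPU memory in chunks until OOM; return chunks allocated.

    Illustrative only: on Linux this eventually raises OutOfMemoryError,
    whereas on Windows (WDDM) the driver can spill into shared system memory.
    Returns 0 if no CUDA device is available.
    """
    if not torch.cuda.is_available():
        return 0
    chunks = []
    try:
        while True:
            # One chunk = chunk_mib MiB of uint8 on the GPU.
            chunks.append(
                torch.empty(chunk_mib * 1024 * 1024,
                            dtype=torch.uint8, device="cuda")
            )
    except torch.cuda.OutOfMemoryError:
        # This is the crash point in my real run -- I'd rather the
        # allocation get slow (spill to system RAM) than die here.
        print(f"OOM after ~{len(chunks) * chunk_mib} MiB allocated")
    finally:
        n = len(chunks)
        chunks.clear()
        torch.cuda.empty_cache()
    return n
```

Catching the exception like this lets a script limp along, but it doesn't help when the overflow happens deep inside a model's forward pass, which is my actual situation.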

Even if I can't get that working, isn't there a way to use my second 3090 Ti as just memory over NVLink? I remember reading somewhere that it was possible to use two 3090 Tis as a single GPU with 48 GB of VRAM?