I am trying to run Qwen3 Omni and Qwen3 Next on my 5090 with 256 GB of RAM. I have tried vLLM, but I can't get CPU offload to work. I have also tried SGLang and Hugging Face Transformers.
Does anyone know a good way to run Qwen3 Next with CPU offload (or CPU-only)?
Thanks!