RunPod•3w ago

vLLM Inconsistently Hangs at NCCL Initialization

Hi, I am trying to run vLLM on 2x A40 GPUs, and it sometimes hangs at NCCL initialization. The hang is inconsistent: on some pods it works fine, but on a pod where it hangs, repeated attempts always hang.

Environment: CUDA 12.4.1, Python 3.10, vLLM 0.7.3
Command: vllm serve unsloth/Meta-Llama-3.1-8B --tensor-parallel-size 2


Jason•3w ago

Did you also filter for CUDA 12.4+ only when you created your pod?

maple (OP)•3w ago

We weren't, and I think forcing 12.4 fixed the issue. Thanks!
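For anyone hitting the same hang, here is a minimal sketch of a pre-launch check you could run on the pod before starting vLLM. It assumes the pod's CUDA filter corresponds to the "CUDA Version" field that nvidia-smi prints for the host driver (the highest CUDA version that driver supports). The helper names `parse_driver_cuda_version` and `driver_supports` and the 12.4 threshold are illustrative assumptions based on this thread, not part of vLLM or RunPod. The check only flags pods whose driver is older than the 12.4.1 image expects; it does not prove the NCCL hang is gone.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Hypothetical threshold: the thread found that forcing CUDA 12.4+ pods
# seemed to avoid the hang with a CUDA 12.4.1 image.
MIN_CUDA: Tuple[int, int] = (12, 4)


def parse_driver_cuda_version(smi_output: str) -> Optional[Tuple[int, int]]:
    """Extract the 'CUDA Version: X.Y' field from nvidia-smi's header.

    Returns a (major, minor) tuple, or None if the field is not present.
    """
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def driver_supports(
    min_version: Tuple[int, int] = MIN_CUDA,
    smi_output: Optional[str] = None,
) -> bool:
    """Return True if the driver's reported CUDA version is >= min_version.

    If smi_output is not given, runs nvidia-smi; returns False when
    nvidia-smi is unavailable or its output has no CUDA Version field.
    """
    if smi_output is None:
        if shutil.which("nvidia-smi") is None:
            return False
        smi_output = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, check=False
        ).stdout
    version = parse_driver_cuda_version(smi_output)
    # Tuple comparison handles versions like (12, 10) > (12, 4) correctly.
    return version is not None and version >= min_version
```

In a startup script you could call `driver_supports()` and bail out (or pick another pod) when it returns False, before running `vllm serve ... --tensor-parallel-size 2`. Comparing (major, minor) tuples rather than strings avoids treating "12.10" as older than "12.4".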
