Teddy Posts - Answer Overflow

Created by Teddy on 3/19/2025 in #⛅｜pods

vLLM and multiple GPUs

Hi, I am trying to deploy a 3B-parameter LLM on RunPod with vLLM. I have tried several multi-GPU configurations (4xL4, 2xL40, etc.), but every one fails with a CUDA memory error, as if the GPUs were not sharing memory. I have tried setting pipeline-parallel-size and tensor-parallel-size, but I still get the same error.
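For reference, here is a rough sketch of the arithmetic behind this kind of setup. It is not a diagnosis of the error above. It assumes fp16/bf16 weights (2 bytes per parameter) and an idealized even split of the weights across GPUs, and it ignores the KV cache, activations, and CUDA context, which also consume GPU memory. The model name is a hypothetical placeholder, and the two helper functions are illustrative names, not part of vLLM.

```python
def weight_gib_per_gpu(n_params, bytes_per_param=2,
                       tensor_parallel_size=1, pipeline_parallel_size=1):
    """Estimate weight memory per GPU in GiB, assuming an even split.

    Assumption: weights are sharded evenly across
    tensor_parallel_size * pipeline_parallel_size GPUs. Real sharding
    is not perfectly even, and this ignores KV cache and activations.
    """
    total_bytes = n_params * bytes_per_param
    n_gpus = tensor_parallel_size * pipeline_parallel_size
    return total_bytes / n_gpus / 2**30


def vllm_serve_args(model, tensor_parallel_size=1, pipeline_parallel_size=1):
    """Build an argument list for `vllm serve` with the two parallelism flags."""
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tensor_parallel_size),
        "--pipeline-parallel-size", str(pipeline_parallel_size),
    ]


if __name__ == "__main__":
    # Hypothetical 3B model on 2 GPUs with tensor parallelism.
    per_gpu = weight_gib_per_gpu(3e9, tensor_parallel_size=2)
    print(f"Estimated weights per GPU: {per_gpu:.2f} GiB")
    print(" ".join(vllm_serve_args("my-org/my-3b-model", tensor_parallel_size=2)))
```

Under these assumptions the weights of a 3B model are only a few GiB per GPU, so on cards of this size most of the memory in use would go to things other than the weights, such as the KV cache that vLLM reserves at startup.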

15 replies
