Teddy
vLLM and multiple GPUs
Hi, I am trying to deploy a model (LLM) of 3B in Runpod with vLLM. I have tried different configurations (4xL4 or 2xL40, etc) but in all I get a CUDA memory error, as if both GPUs are not sharing memory. I have tried pipeline-parallel-size and tensor-parallel-size but I still get the same error.
15 replies