Created by Abdelrhman Nile on 4/16/2025 in #⚡|serverless
Serverless vLLM concurrency issue
Hello everyone, I deployed a serverless vLLM endpoint (Gemma 12B model) through the RunPod UI, with 2 workers on A100 80GB VRAM.
If I send two requests at the same time, they both become IN PROGRESS, but I receive the output stream of only one first; the second always waits for the first to finish before I start receiving its token stream. Why is it behaving like this?
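
For context, here is a minimal sketch of how the two concurrent requests are sent, assuming the worker exposes the standard OpenAI-compatible route (`/openai/v1`) and using the `openai` Python client; the endpoint ID, API key, and model name below are placeholders:

```python
# Minimal sketch: fire two streaming requests concurrently and log when
# each token chunk arrives, to see whether the streams are serialized.
import asyncio
import time

from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",  # placeholder endpoint ID
    api_key="<RUNPOD_API_KEY>",  # placeholder API key
)

async def stream_request(tag: str) -> None:
    # Request a streamed chat completion and print each chunk with a timestamp.
    stream = await client.chat.completions.create(
        model="google/gemma-3-12b-it",  # assumed model name
        messages=[{"role": "user", "content": "Write a short story."}],
        stream=True,
    )
    async for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(f"[{time.monotonic():.2f}s] {tag}: {delta!r}")

async def main() -> None:
    # Both requests start at the same time; if req-2's tokens only show up
    # after req-1 finishes, the two streams are being handled sequentially.
    await asyncio.gather(stream_request("req-1"), stream_request("req-2"))

asyncio.run(main())
```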