Created by Abdelrhman Nile on 4/16/2025 in #⚡|serverless
Serverless vLLM concurrency issue
Hello everyone, I deployed a serverless vLLM endpoint (Gemma 12B model) through the RunPod UI, with 2 workers on A100 80GB VRAM.
If I send two requests at the same time, they both become IN PROGRESS, but I receive the output stream of only one first; the second always waits for the first to finish before I start receiving its token stream. Why is it behaving like this?
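
For context, here is a minimal sketch of how the two concurrent requests are sent, assuming the worker exposes the standard OpenAI-compatible route (`/openai/v1`) and using the `openai` Python client; the endpoint ID, API key, and model name below are placeholders:

```python
# Minimal sketch: fire two streaming requests concurrently and log when
# each token chunk arrives, to see whether the streams are serialized.
import asyncio
import time

from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",  # placeholder endpoint ID
    api_key="<RUNPOD_API_KEY>",  # placeholder API key
)

async def stream_request(tag: str) -> None:
    # Request a streamed chat completion and print each chunk with a timestamp.
    stream = await client.chat.completions.create(
        model="google/gemma-3-12b-it",  # assumed model name
        messages=[{"role": "user", "content": "Write a short story."}],
        stream=True,
    )
    async for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(f"[{time.monotonic():.2f}s] {tag}: {delta!r}")

async def main() -> None:
    # Both requests start at the same time; if req-2's tokens only show up
    # after req-1 finishes, the two streams are being handled sequentially.
    await asyncio.gather(stream_request("req-1"), stream_request("req-2"))

asyncio.run(main())
```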