Created by Anders on 3/15/2025 in #⚡|serverless
Anyone get vLLM working with reasonable response times?
No matter how I configure a serverless endpoint with vLLM, the workers are slow to pick up tasks: even with warm containers, jobs sit in the queue for minutes for no obvious reason. Has anyone actually been able to use serverless vLLM for a production use case? For context, the sketch below shows roughly how I'm calling the endpoint and timing responses.
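(Endpoint ID and API key below are placeholders, and the `prompt`/`sampling_params` input shape is my assumption based on the vLLM worker template, so adjust it for your handler.)

```python
import os
import time

import requests

# Placeholders -- substitute your own serverless endpoint ID and API key.
ENDPOINT_ID = "YOUR_ENDPOINT_ID"
API_KEY = os.environ["RUNPOD_API_KEY"]

# /runsync blocks until the worker returns, so wall time here includes
# any time the job spends waiting in the queue plus cold-start overhead.
url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"
payload = {
    "input": {
        "prompt": "Explain serverless cold starts in one sentence.",
        # Assumed input shape for the vLLM worker; adjust to your handler.
        "sampling_params": {"max_tokens": 128, "temperature": 0.7},
    }
}

start = time.perf_counter()
resp = requests.post(
    url,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=300,
)
elapsed = time.perf_counter() - start
resp.raise_for_status()

print(f"status={resp.json().get('status')}  wall_time={elapsed:.1f}s")
```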
6 replies