Need Guidance about LLM Serverless Worker
Hi,
I just got the vLLM worker running, and it's working great. Now comes the problem.
PROBLEM: When I make a request to the API, instead of responding with "QUEUED", it waits for the generation to finish and only then returns the response. This can take 10 seconds or more.
Instead, I want to modify the handler so that when I make the request, it instantly returns "QUEUED" and I can start polling until the job is done.
The image generation workers (e.g. the ComfyUI worker) already behave that way, so I'm not sure whether this just isn't implemented for the vLLM/LLM workers yet, or whether it is implemented and I'm doing something wrong.
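For context, here's a minimal sketch of the submit-then-poll pattern I'm describing, assuming the usual serverless endpoint layout where `/run` queues a job and returns an id immediately while `/runsync` blocks until completion. `ENDPOINT_ID`, the payload shape, and the polling interval are placeholders, not taken from my actual setup:

```python
# Sketch of the asynchronous submit-then-poll pattern, assuming a
# serverless API with /run (queue, return immediately) and
# /status/{id} (poll) routes. Endpoint id and API key are placeholders.
import time
import requests

API_BASE = "https://api.runpod.ai/v2"


def run_url(endpoint_id: str) -> str:
    # /run queues the job and returns right away with an id and a
    # status like "IN_QUEUE"; /runsync would block until completion.
    return f"{API_BASE}/{endpoint_id}/run"


def status_url(endpoint_id: str, job_id: str) -> str:
    return f"{API_BASE}/{endpoint_id}/status/{job_id}"


def submit_and_poll(endpoint_id: str, api_key: str, payload: dict,
                    interval: float = 1.0) -> dict:
    headers = {"Authorization": f"Bearer {api_key}"}
    # Submit: should come back immediately with the queued job's id.
    job = requests.post(run_url(endpoint_id), json={"input": payload},
                        headers=headers, timeout=30).json()
    # Poll until the job reaches a terminal state.
    while True:
        status = requests.get(status_url(endpoint_id, job["id"]),
                              headers=headers, timeout=30).json()
        if status.get("status") in ("COMPLETED", "FAILED", "CANCELLED"):
            return status
        time.sleep(interval)
```

This is the behavior I get from the image workers but not from the vLLM worker, which seems to block on the initial request instead.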
Thanks.