Need Guidance about LLM Serverless Worker
Hi,
I just got the vLLM worker running, and it's working great. Now comes the problem.
PROBLEM: When I make a request to the API, instead of responding with "QUEUED", it waits for the generation to finish and only then returns the response. This can take 10 seconds or more.
Instead, I want to modify the handler so that when I make the request, it instantly returns "QUEUED" and I can start polling until the job is done.
The image generation workers (e.g. the ComfyUI worker) already behave that way, so I'm not sure whether this just isn't implemented for the vLLM/LLM workers yet, or whether it is implemented and I'm doing something wrong.
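For context, here's a minimal sketch of the submit-then-poll pattern I'm describing, assuming the usual serverless endpoint layout where `/run` queues a job and returns an id immediately while `/runsync` blocks until completion. `ENDPOINT_ID`, the payload shape, and the polling interval are placeholders, not taken from my actual setup:

```python
# Sketch of the asynchronous submit-then-poll pattern, assuming a
# serverless API with /run (queue, return immediately) and
# /status/{id} (poll) routes. Endpoint id and API key are placeholders.
import time
import requests

API_BASE = "https://api.runpod.ai/v2"


def run_url(endpoint_id: str) -> str:
    # /run queues the job and returns right away with an id and a
    # status like "IN_QUEUE"; /runsync would block until completion.
    return f"{API_BASE}/{endpoint_id}/run"


def status_url(endpoint_id: str, job_id: str) -> str:
    return f"{API_BASE}/{endpoint_id}/status/{job_id}"


def submit_and_poll(endpoint_id: str, api_key: str, payload: dict,
                    interval: float = 1.0) -> dict:
    headers = {"Authorization": f"Bearer {api_key}"}
    # Submit: should come back immediately with the queued job's id.
    job = requests.post(run_url(endpoint_id), json={"input": payload},
                        headers=headers, timeout=30).json()
    # Poll until the job reaches a terminal state.
    while True:
        status = requests.get(status_url(endpoint_id, job["id"]),
                              headers=headers, timeout=30).json()
        if status.get("status") in ("COMPLETED", "FAILED", "CANCELLED"):
            return status
        time.sleep(interval)
```

This is the behavior I get from the image workers but not from the vLLM worker, which seems to block on the initial request instead.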
Thanks.