worker-vllm: Always stops after 60 seconds of streaming
Serverless is giving me a weird issue where the OpenAI stream stops after 60 seconds, but the request keeps running on the deployed vLLM worker. As a result I don't get all of the output, and the compute is wasted.
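For reference, this is roughly how I'm calling it. A minimal sketch only: the endpoint ID, model name, and prompt are placeholders, and the base URL assumes worker-vllm's OpenAI-compatible route under the serverless endpoint.

```python
# Sketch of the streaming call that cuts off around the 60-second mark.
# ENDPOINT_ID and MODEL_NAME are placeholders for my actual deployment.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["RUNPOD_API_KEY"],
    # worker-vllm exposes an OpenAI-compatible API under the serverless endpoint
    base_url="https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1",
)

stream = client.chat.completions.create(
    model="MODEL_NAME",  # whatever model the worker was deployed with
    messages=[{"role": "user", "content": "Write a very long story."}],
    max_tokens=8192,
    stream=True,
)

# After ~60 seconds the stream ends, even though the job keeps running on the worker.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```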
The reason I want it to go longer than 60 seconds is that I have a use case for generating very long outputs. I have had to resort to querying api.runpod.ai/v2 directly, which has the benefit of giving me the job_id so I can do more with it, but I would like to do this through the OpenAI API. A rough version of that fallback is below.
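This is a hedged sketch of the native /run + /stream flow I'm using instead; the endpoint ID and the exact shape of the input payload and stream chunks are assumptions based on how my worker is set up.

```python
# Submit a job to the native serverless API, keep the job_id, then poll /stream.
# ENDPOINT_ID is a placeholder; the "input" payload mirrors what my worker expects.
import os
import time
import requests

API_KEY = os.environ["RUNPOD_API_KEY"]
BASE = "https://api.runpod.ai/v2/ENDPOINT_ID"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# /run returns immediately with a job_id, so the generation isn't tied to one stream.
run = requests.post(
    f"{BASE}/run",
    headers=HEADERS,
    json={
        "input": {
            "messages": [{"role": "user", "content": "Write a very long story."}],
            "sampling_params": {"max_tokens": 8192},
        }
    },
).json()
job_id = run["id"]

# Poll /stream until the job finishes; each call returns any new output chunks.
while True:
    resp = requests.get(f"{BASE}/stream/{job_id}", headers=HEADERS).json()
    for part in resp.get("stream", []):
        print(part["output"], end="", flush=True)  # chunk shape depends on the worker
    if resp.get("status") in ("COMPLETED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)
```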
@Casper.
Escalated To Zendesk
The thread has been escalated to Zendesk!
Ticket ID: #12032