worker-vllm: Always stops after 60 seconds of streaming

Serverless is giving me a weird issue: the OpenAI-compatible stream stops after 60 seconds, but the request keeps running on the deployed vLLM worker. As a result I never receive the full output, and the compute spent on the rest of the generation is wasted. I need streams to run longer than 60 seconds because my use case produces very long outputs. For now I've had to resort to querying api.runpod.ai/v2 directly, which has the benefit of exposing the job_id and a few other things, but I would prefer to do this through the OpenAI API.
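To make it concrete, here is roughly what I'm doing on both paths. This is a minimal sketch: the endpoint ID and model name are placeholders, and the request/response shapes on the native route are what I see with my worker-vllm deployment, so treat them as assumptions rather than a reference.

```python
import os
import requests
from openai import OpenAI

ENDPOINT_ID = "my-endpoint-id"   # placeholder, not a real endpoint ID
API_KEY = os.environ["RUNPOD_API_KEY"]

# 1) OpenAI-compatible route: convenient, but the stream dies after ~60 s
#    even though the job keeps running on the worker.
client = OpenAI(
    base_url=f"https://api.runpod.ai/v2/{ENDPOINT_ID}/openai/v1",
    api_key=API_KEY,
)
stream = client.chat.completions.create(
    model="my-model",            # placeholder model name
    messages=[{"role": "user", "content": "Write a very long story."}],
    max_tokens=8000,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

# 2) Native serverless route: submit the job, keep the job_id, and keep
#    reading the stream for as long as the job runs.
run = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"prompt": "Write a very long story.",
                    "sampling_params": {"max_tokens": 8000}}},
    timeout=30,
).json()
job_id = run["id"]

# Poll /stream until the job reaches a terminal status (payload shape here
# is an assumption based on what my deployment returns).
while True:
    out = requests.get(
        f"https://api.runpod.ai/v2/{ENDPOINT_ID}/stream/{job_id}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=120,
    ).json()
    for part in out.get("stream", []):
        print(part.get("output", ""), end="", flush=True)
    if out.get("status") in ("COMPLETED", "FAILED", "CANCELLED"):
        break
```

The second path works for long generations, but it means giving up the OpenAI client and its ecosystem, which is what I'd like to avoid.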
Poddy · 2d ago
@Casper.
Escalated To Zendesk
The thread has been escalated to Zendesk!
Ticket ID: #12032