worker-vllm: Always stops after 60 seconds of streaming

Serverless is giving me a weird issue: the OpenAI-compatible stream stops after 60 seconds, but the request keeps running on the deployed vLLM worker. As a result I never receive the full output, and the compute spent on the rest of the generation is wasted. I need streams to run longer than 60 seconds because my use case produces very long outputs. For now I've had to resort to querying api.runpod.ai/v2 directly, which has the benefit of exposing the job_id and a few other things, but I would prefer to do this through the OpenAI API.
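To make it concrete, here is roughly what I'm doing on both paths. This is a minimal sketch: the endpoint ID and model name are placeholders, and the request/response shapes on the native route are what I see with my worker-vllm deployment, so treat them as assumptions rather than a reference.

```python
import os
import requests
from openai import OpenAI

ENDPOINT_ID = "my-endpoint-id"   # placeholder, not a real endpoint ID
API_KEY = os.environ["RUNPOD_API_KEY"]

# 1) OpenAI-compatible route: convenient, but the stream dies after ~60 s
#    even though the job keeps running on the worker.
client = OpenAI(
    base_url=f"https://api.runpod.ai/v2/{ENDPOINT_ID}/openai/v1",
    api_key=API_KEY,
)
stream = client.chat.completions.create(
    model="my-model",            # placeholder model name
    messages=[{"role": "user", "content": "Write a very long story."}],
    max_tokens=8000,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

# 2) Native serverless route: submit the job, keep the job_id, and keep
#    reading the stream for as long as the job runs.
run = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"prompt": "Write a very long story.",
                    "sampling_params": {"max_tokens": 8000}}},
    timeout=30,
).json()
job_id = run["id"]

# Poll /stream until the job reaches a terminal status (payload shape here
# is an assumption based on what my deployment returns).
while True:
    out = requests.get(
        f"https://api.runpod.ai/v2/{ENDPOINT_ID}/stream/{job_id}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=120,
    ).json()
    for part in out.get("stream", []):
        print(part.get("output", ""), end="", flush=True)
    if out.get("status") in ("COMPLETED", "FAILED", "CANCELLED"):
        break
```

The second path works for long generations, but it means giving up the OpenAI client and its ecosystem, which is what I'd like to avoid.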
Poddy · 2d ago
@Casper.
Escalated To Zendesk
The thread has been escalated to Zendesk!
Ticket ID: #12032