RunPod•16mo ago

Delay on startup: How long for low usage?

I am trying to gauge the real-world cold start time for a 7B LLM deployed with vLLM. My target configuration is 0 active workers, about 5 requests/hour, and 100-200 seconds of generation time per request. How long does a RunPod cold start take, including delay time and everything else? Essentially, what are the min, avg, and max time to first token?
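
To make the numbers concrete, here is a rough Python sketch of the arithmetic behind this setup. It assumes requests arrive evenly spaced, which real traffic will not do, and the function names are made up for illustration, not part of any RunPod or vLLM API. It shows how long a worker would sit idle between requests and how measured time-to-first-token samples could be summarized into min/avg/max.

```python
from statistics import mean


def average_gap_seconds(requests_per_hour: float) -> float:
    """Average time between request arrivals, assuming evenly spaced requests."""
    if requests_per_hour <= 0:
        raise ValueError("requests_per_hour must be positive")
    return 3600.0 / requests_per_hour


def idle_seconds_between_requests(requests_per_hour: float,
                                  generation_seconds: float) -> float:
    """Time a worker sits idle after finishing one request until the next arrives.

    Hypothetical helper: compare the result against whatever idle timeout the
    endpoint is configured with to judge whether most requests will be cold.
    """
    gap = average_gap_seconds(requests_per_hour)
    return max(0.0, gap - generation_seconds)


def summarize_ttft(samples_seconds: list[float]) -> dict[str, float]:
    """Reduce measured time-to-first-token samples to min / avg / max."""
    if not samples_seconds:
        raise ValueError("need at least one measurement")
    return {
        "min": min(samples_seconds),
        "avg": mean(samples_seconds),
        "max": max(samples_seconds),
    }
```

With 5 requests/hour the average gap is 720 seconds, so after 100-200 seconds of generation the worker is idle for roughly 520-620 seconds before the next request. If the endpoint's idle timeout is shorter than that, nearly every request would pay the full cold start, which is why the min/avg/max figures matter here.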

