RunPod • 6mo ago
Casper.

Delay on startup: How long for low usage?

I am trying to gauge the actual cold start time for a 7B LLM deployed with vLLM. My target configuration is roughly 0 active workers, 5 requests/hour, and 100-200 seconds of generation time per request. How long does a RunPod cold start take, delay time included? Essentially, what are the min, avg, and max time to first token?
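For context, 5 requests/hour works out to one request every 12 minutes on average, so with 0 active workers most requests would probably hit a cold worker. In case it helps anyone answering with numbers, here is a rough sketch of how the min/avg/max could be measured client-side. It is not tied to any RunPod or vLLM API: `fake_cold_stream` is a hypothetical stand-in for whatever streaming call you make to the endpoint, and the 0.05 s sleep is a placeholder, not a real cold-start figure.

```python
import time
from statistics import mean
from typing import Callable, Iterable, Iterator


def time_to_first_token(stream: Callable[[], Iterable[str]]) -> float:
    """Seconds from issuing the request until the first token arrives."""
    start = time.perf_counter()
    for _ in stream():
        # Stop at the first token; the rest of the generation is not timed.
        return time.perf_counter() - start
    raise RuntimeError("stream produced no tokens")


def summarize(samples: list[float]) -> dict[str, float]:
    """Min, average, and max of the measured times, in seconds."""
    if not samples:
        raise ValueError("no samples")
    return {"min": min(samples), "avg": mean(samples), "max": max(samples)}


def fake_cold_stream() -> Iterator[str]:
    # Hypothetical stand-in for a real streaming request to the endpoint.
    time.sleep(0.05)  # placeholder delay, NOT a measured RunPod cold start
    yield "Hello"
    yield " world"


if __name__ == "__main__":
    samples = [time_to_first_token(fake_cold_stream) for _ in range(3)]
    print(summarize(samples))
```

To get cold-start numbers rather than warm ones, the samples would need to be spaced far enough apart that the worker has scaled back to zero between requests.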
0 Replies