Keeping Flashboot active?
It is my understanding that Flashboot is only active for "a while" after each request, and is then disabled as the instance goes into a deeper sleep. Sadly, for me a cold start after a long idle period adds a whopping 70-90 seconds of delay (running llama-2-13b-chat-hf on the 48GB GPUs, e.g. A40). I don't know if I am doing something wrong there, as I see others on this forum getting much faster start times. However, on consecutive jobs, the delay drops to 1-3 seconds. How often do requests need to arrive to keep Flashboot functional? I assume this is some "secret", but would e.g. 1 job every 10 minutes do the trick?
6 Replies
The outcome depends on multiple factors, and there isn’t a fixed timeframe we can provide. It is based on the requests you send and the resources available on our platform.
Is there a minimum window after a request during which Flashboot is always guaranteed?
Or can it literally be disabled 5 seconds after the last request?
I ask because the main selling point of this service for us was precisely Flashboot, so I was hoping to have more information on its reliability.
Not really. It’s more a matter of probability. If you send a request right after a worker stops, there is a higher chance of Flashboot kicking in; the longer you wait, the lower the likelihood.
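To put rough numbers on that, here is a minimal sketch that weights the two delays reported in the question (1-3 s on a Flashboot start, 70-90 s on a full cold start) by the chance of getting a Flashboot start. The hit rates in the loop are hypothetical, chosen only for illustration; they are not published figures, and the function name is made up for this example.

```python
def expected_start_delay(p_warm: float, warm_s: float, cold_s: float) -> float:
    """Average start delay when a fraction p_warm of requests get a Flashboot start."""
    if not 0.0 <= p_warm <= 1.0:
        raise ValueError("p_warm must be between 0 and 1")
    return p_warm * warm_s + (1.0 - p_warm) * cold_s


# Midpoints of the delays reported in the question.
WARM_S = 2.0   # 1-3 s on consecutive jobs
COLD_S = 80.0  # 70-90 s after a long idle period

# Hypothetical hit rates (assumptions, not measured values).
for p in (0.5, 0.9, 0.99):
    delay = expected_start_delay(p, WARM_S, COLD_S)
    print(f"p_warm={p:.2f}: about {delay:.1f} s average start delay")
```

Because the cold start is so much slower than the Flashboot start, even a small share of misses dominates the average, which is why the actual hit rate on your own endpoint is worth measuring.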
I see, so it really isn't designed to be reliable?
It is pretty reliable, but it was not designed so that you can easily subvert it to get regular workers to act like active workers (which run all the time). It was designed to help with scaling on endpoints that have traffic. The more traffic and the more max workers you have, the more benefit you will get from Flashboot. If your endpoint has little or no traffic, you would be better off adding an active server. I know that is not what you want to hear, as you likely want to pay less instead of more, but if you do the research I think you will still find that nobody else will let you scale from 0 like RunPod does.
I see, thanks for the information! Of course the hope is to get enough traffic to make the active server worth it 🙂