Could be a few different reasons:
* Not enough workers to handle the number of concurrent requests, so requests sit in the queue
* Cold start time (more common)
I'm guessing I can't control the cold start time?
I don't think workers are an issue
You can do things like enabling FlashBoot, increasing the idle timeout, adding active workers, etc. to improve cold start times.
FlashBoot is the only one that's free, though.
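For context on why idle timeout and active workers help: most of a cold start is heavy one-time initialization (loading model weights, etc.). A warm worker keeps that state in memory, so only the per-request work runs. Here's a toy sketch of that serverless handler pattern — the `load_model`/`handler` names are made up for illustration, and on RunPod the real handler would be registered via the `runpod` SDK rather than called directly:

```python
import time

def load_model():
    """Stand-in for loading model weights; this is the slow part of a cold start."""
    time.sleep(0.1)  # pretend this takes seconds-to-minutes in reality
    return {"name": "toy-model"}

# Runs once at import time: paid on every cold start,
# but skipped entirely by a warm (or active) worker.
MODEL = load_model()

def handler(job):
    """Per-request work only; reuses the already-loaded MODEL."""
    prompt = job["input"]["prompt"]
    return {"model": MODEL["name"], "output": prompt.upper()}
```

The design point is just that anything hoisted to module scope is amortized across every request a warm worker serves, which is exactly what idle timeout and active workers buy you.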
How does flashboot work?
Endpoint configurations | RunPod Documentation
Configure your Endpoint settings to optimize performance and cost, including GPU selection, worker count, idle timeout, and advanced options like data centers, network volumes, and scaling strategies.
Basically the TL;DR from asking them is that it's a caching mechanism, so the more max workers you have and the more requests you get, the better the cache.
If you have an active worker it's supposedly even faster, but I don't think that's necessary; I've heard from people using it in prod that FlashBoot is still quite fast even with a min worker count of 0.
Yeah, I don't have min/active workers and FlashBoot works well for me pretty often, but it doesn't work so well when I don't have a constant flow of requests.
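That pattern (steady traffic keeps the cache warm, sparse traffic doesn't) can be sketched with a toy model. The fixed "warm TTL" here is an assumption for illustration, not RunPod's actual caching mechanism:

```python
def warm_hit_rate(arrival_gaps, warm_ttl):
    """Toy model: a worker stays warm for warm_ttl seconds after each
    request. Returns the fraction of requests that arrive while the
    worker is still warm (i.e. that skip the cold start)."""
    hits = sum(1 for gap in arrival_gaps if gap <= warm_ttl)
    return hits / len(arrival_gaps)

steady = [5] * 20              # a request every 5 seconds
sparse = [5] * 5 + [600] * 15  # a short burst, then long idle gaps

warm_hit_rate(steady, warm_ttl=60)  # -> 1.0  (every request hits warm)
warm_hit_rate(sparse, warm_ttl=60)  # -> 0.25 (most requests cold-start)
```

That lines up with the experience above: with a constant flow of requests almost everything lands on a warm worker, while long idle gaps mean most requests pay the cold start again.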