RunPod
Created by Bernardo Henz on 1/22/2025 in #⚡|serverless
Guidance on Mitigating Cold Start Delays in Serverless Inference
And I see this all the time: different workers downloading the image, even in the same endpoint. I thought it was standard.
I had enabled the network volume before, thinking it could be a solution. Then I disabled it and terminated all the workers to get new ones on the "latest version" of the endpoint. Some workers already had the docker image cached (probably because I had used them before), but the ones that didn't have it needed to download it.
I mean, now it's okay since it's been some time since everything downloaded.
I did refresh the page but with F5.
I'm using the same endpoint, just terminated the other workers as a test.
I don't believe this worker should be considered idle
Oh yeah, did it already.
Thanks thanks
I passed both pieces of info to support yesterday:
request id:
sync-2fbf700d-b754-44d2-8df2-9ac9fb536005-u1
worker id: l8q3x9g7a1prqj
While I'm not 100% sure this happened (since I did not note down the exact worker id), I noticed in the log that the worker with a "running" status was downloading the docker image.
But after the worker executed the request, the previous log disappeared.
@nerdylive I noticed that sometimes a worker takes too long to completely set up a docker image, and sometimes the worker that is "downloading" the docker image shows as "idle" instead of "initializing". I think this is a bug. What can happen in this case is that a request gets allocated to this bugged worker, and I believe that's why the delay time is sometimes huge.
Would using a Network Volume solve this problem? Note: I already download the models when building the docker image, so they're already cached. The problem is when a new worker starts and needs to pull the docker image. My image is 8 GiB total, so it's not that big. But downloading the image layers takes too much time on RunPod.
Or is the Network Volume completely unrelated in this case?
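For reference, a minimal sketch of what that build-time model download can look like with faster-whisper; the model size and cache directory below are illustrative assumptions, not details from the thread. The script runs once during docker build, so workers only pay for pulling image layers, never for fetching weights at cold start.

# build_assets.py -- a sketch; run once during `docker build` so the weights
# ship inside the image instead of being downloaded at cold start.
from faster_whisper import WhisperModel

MODEL_SIZE = "large-v3"      # assumption: whichever Whisper size the endpoint uses
CACHE_DIR = "/app/models"    # assumption: path baked into the image

# Instantiating the model with download_root fetches the weights into CACHE_DIR.
# CPU/int8 is enough here, since the build machine typically has no GPU.
WhisperModel(MODEL_SIZE, device="cpu", compute_type="int8", download_root=CACHE_DIR)
print(f"Cached {MODEL_SIZE} under {CACHE_DIR}")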
But these still do not explain how I got more than 100s of delay time.
Yeah. Sometimes it did on specific workers. I used faster-whisper to load them. And there's nothing failing.
@nerdylive
Actually, we download the models only during the build, so they are not being downloaded again during cold starts. However, we still think the "normal" cold starts are too long, taking about 10 s (loading the models themselves usually takes about 2-5 s).
Furthermore, we have no idea why in some rare cases it takes an absurd amount of time, like the >100 s. This is our biggest problem.
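A minimal sketch of the usual mitigation for the ~10 s "normal" cold start, assuming the weights were baked in at build time as in the sketch above: load the model at module import rather than inside the handler, so only a freshly started worker process pays the 2-5 s load, while warm or FlashBoot-resumed workers reuse it. runpod.serverless.start is the SDK's standard entrypoint; the input schema and paths are assumptions, not the thread author's actual handler.

# handler.py -- a sketch, not the actual handler from this thread.
import runpod
from faster_whisper import WhisperModel

# Loaded once at import time: only a truly cold worker process pays the 2-5 s load.
model = WhisperModel(
    "large-v3",                   # assumption: same size as in the build step
    device="cuda",
    compute_type="float16",
    download_root="/app/models",  # resolve from the weights baked into the image
    local_files_only=True,        # never hit the network during a cold start
)

def handler(job):
    audio_path = job["input"]["audio_path"]  # assumption: hypothetical input schema
    segments, info = model.transcribe(audio_path)
    return {"language": info.language, "text": " ".join(s.text for s in segments)}

runpod.serverless.start({"handler": handler})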
RunPod
Created by 1AndOnlyPika on 10/5/2024 in #⚡|serverless
Flashboot not working
No, it did not time out.
I just rolled back to runpod SDK 1.6.2 (from 1.7.1, which I had updated to yesterday) in my Docker image, and it seems to have fixed it. I'll run some more tests to confirm.
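In case it helps keep that rollback in place, a minimal sketch of a startup guard that fails fast if the image ever picks up a runpod SDK version other than the pinned 1.6.2; the guard itself is just an illustration, not something RunPod requires.

# version_guard.py -- run at container start; fails fast on SDK version drift.
from importlib.metadata import version

EXPECTED = "1.6.2"              # version pinned in the image, e.g. pip install runpod==1.6.2
installed = version("runpod")   # reads the installed package metadata
if installed != EXPECTED:
    raise RuntimeError(f"expected runpod=={EXPECTED}, found {installed}")
print(f"runpod SDK {installed} OK")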
Very inconsistent, and these are all sequential requests to the same worker