RunPod
Created by Bernardo Henz on 1/22/2025 in #⚡|serverless
Guidance on Mitigating Cold Start Delays in Serverless Inference
And I see this all the time: different workers downloading the image, even in the same endpoint. I thought it was standard.
I had enabled the network volume before, thinking it could be a solution. Then I disabled it and terminated all the workers to get new ones on the "latest version" of the endpoint. Some workers already had the docker image cached (probably because I had used them before), but the ones that didn't have it needed to download it.
I mean, now it's okay since it's been some time since everything downloaded.
I did refresh the page but with F5.
I'm using the same endpoint, just terminated the other workers as a test.
I don't believe this worker should be considered idle
Oh yeah, did it already.
Thanks thanks
I passed both pieces of info to support yesterday:
request id:
sync-2fbf700d-b754-44d2-8df2-9ac9fb536005-u1
worker id: l8q3x9g7a1prqj
While I'm not 100% sure this happened (since I did not note down the exact worker id), I noticed in the log that the worker with a "running" status was downloading the docker image.
But after the worker executed the request, the previous log disappeared.
@nerdylive I noticed that sometimes a worker takes too long to completely set up a docker image, and sometimes the worker that is "downloading" the docker image shows as "idle" instead of "initializing". I think this is a bug. What can happen in this case is that a request gets allocated to this bugged worker, and I believe that's why the delay time is sometimes huge.
Would using a Network Volume solve this problem? Note: I already download the models when building the docker image, so they're already cached. The problem is when a new worker starts and needs to pull the docker image. My image is 8 GiB total, so it's not that big. But downloading the image layers takes too much time on RunPod.
Or is the Network Volume completely unrelated in this case?
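For reference, a minimal sketch of what that build-time model download can look like with faster-whisper; the model size and cache directory below are illustrative assumptions, not details from the thread. The script runs once during docker build, so workers only pay for pulling image layers, never for fetching weights at cold start.

# build_assets.py -- a sketch; run once during `docker build` so the weights
# ship inside the image instead of being downloaded at cold start.
from faster_whisper import WhisperModel

MODEL_SIZE = "large-v3"      # assumption: whichever Whisper size the endpoint uses
CACHE_DIR = "/app/models"    # assumption: path baked into the image

# Instantiating the model with download_root fetches the weights into CACHE_DIR.
# CPU/int8 is enough here, since the build machine typically has no GPU.
WhisperModel(MODEL_SIZE, device="cpu", compute_type="int8", download_root=CACHE_DIR)
print(f"Cached {MODEL_SIZE} under {CACHE_DIR}")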
But these still do not explain how I got more than 100s of delay time.
Yeah. Sometimes it did on specific workers. I used faster-whisper to load them. And there's nothing failing.
@nerdylive
Actually, we download the models only during the build, so they are not being downloaded again during cold starts. However, we still think the "normal" cold starts are too long, taking about 10 s (loading the models themselves usually takes about 2-5 s).
Furthermore, we have no idea why in some rare cases it takes an absurd amount of time, like the >100 s. This is our biggest problem.
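A minimal sketch of the usual mitigation for the ~10 s "normal" cold start, assuming the weights were baked in at build time as in the sketch above: load the model at module import rather than inside the handler, so only a freshly started worker process pays the 2-5 s load, while warm or FlashBoot-resumed workers reuse it. runpod.serverless.start is the SDK's standard entrypoint; the input schema and paths are assumptions, not the thread author's actual handler.

# handler.py -- a sketch, not the actual handler from this thread.
import runpod
from faster_whisper import WhisperModel

# Loaded once at import time: only a truly cold worker process pays the 2-5 s load.
model = WhisperModel(
    "large-v3",                   # assumption: same size as in the build step
    device="cuda",
    compute_type="float16",
    download_root="/app/models",  # resolve from the weights baked into the image
    local_files_only=True,        # never hit the network during a cold start
)

def handler(job):
    audio_path = job["input"]["audio_path"]  # assumption: hypothetical input schema
    segments, info = model.transcribe(audio_path)
    return {"language": info.language, "text": " ".join(s.text for s in segments)}

runpod.serverless.start({"handler": handler})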
RunPod
Created by 1AndOnlyPika on 10/5/2024 in #⚡|serverless
Flashboot not working
No, it did not time out.
I just rolled back to runpod SDK 1.6.2 (from 1.7.1, which I had updated to yesterday) in my Docker image, and it seems to have fixed it. I'll run some more tests to confirm.
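In case it helps keep that rollback in place, a minimal sketch of a startup guard that fails fast if the image ever picks up a runpod SDK version other than the pinned 1.6.2; the guard itself is just an illustration, not something RunPod requires.

# version_guard.py -- run at container start; fails fast on SDK version drift.
from importlib.metadata import version

EXPECTED = "1.6.2"              # version pinned in the image, e.g. pip install runpod==1.6.2
installed = version("runpod")   # reads the installed package metadata
if installed != EXPECTED:
    raise RuntimeError(f"expected runpod=={EXPECTED}, found {installed}")
print(f"runpod SDK {installed} OK")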
Very inconsistent, and these are all sequential requests to the same worker