RunPod
• Created by Bernardo Henz on 1/22/2025 in #⚡|serverless
Guidance on Mitigating Cold Start Delays in Serverless Inference
But these still do not explain how I got more than 100s of delay.
14 replies
Yeah. Sometimes it did, on specific workers. I used faster-whisper to load them, and nothing is failing.
@nerdylive
Actually, we download the models only during the build, so they are not downloaded again during cold starts. Even so, we still think the "normal" cold starts are too long, taking about 10s (loading the models themselves usually takes about 2-5s).
Furthermore, we have no idea why, in some rare cases, it takes an absurd amount of time, like the >100s delays. This is our biggest problem.
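For context, "download during the build" usually means fetching the weights in a `RUN` step so they are baked into the image layer. A minimal Dockerfile sketch, assuming a faster-whisper worker; the base image, model id, and `/models` path are placeholders, not what this endpoint actually uses:

```dockerfile
# Hypothetical fragment — adjust base image, model id, and paths to your worker.
FROM python:3.11-slim
RUN pip install faster-whisper
# Fetch the weights at *build* time so cold starts never pay the download cost;
# device="cpu" because build machines typically have no GPU attached.
RUN python -c "from faster_whisper import WhisperModel; \
    WhisperModel('large-v3', device='cpu', compute_type='int8', download_root='/models')"
```

At runtime the handler then loads from `download_root` (disk only), which matches the 2-5s load times described above.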
RunPod
• Created by 1AndOnlyPika on 10/5/2024 in #⚡|serverless
Flashboot not working
No, it did not time out.
56 replies
I just rolled back to runpod 1.6.2 (from 1.7.1, which I had updated to yesterday) in my Docker image, and that seems to have fixed it. I'll run some more tests to confirm.
Very inconsistent, and these are all sequential requests to the same worker.
RunPod
• Created by Rodka on 5/27/2024 in #⚡|serverless
How to schedule active workers?
@nerdylive It might be worth pointing out that the mutation I ended up using for this (saveEndpoint) is not in the GraphQL API reference; I had to dig through the runpod-python code to find it.
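For anyone else hitting this: a minimal sketch of building the mutation by hand. Only the mutation name `saveEndpoint` comes from the thread; the input field names (`workersMin`/`workersMax`) and the example endpoint id are assumptions — verify them against the runpod-python source before use:

```python
# Hypothetical helper: `saveEndpoint` is the undocumented mutation mentioned
# above; its input schema is not in the public GraphQL reference, so the
# field names here (workersMin/workersMax) are guesses to be checked against
# runpod-python's mutation builders.
def build_save_endpoint_mutation(endpoint_id: str, workers_min: int, workers_max: int) -> str:
    return (
        "mutation { saveEndpoint(input: { "
        f'id: "{endpoint_id}", workersMin: {workers_min}, workersMax: {workers_max} '
        "}) { id workersMin workersMax } }"
    )

# Example payload; a scheduler would POST {"query": ...} to RunPod's GraphQL
# endpoint with an API key at the times it wants to scale active workers.
payload = {"query": build_save_endpoint_mutation("ep-123", 1, 3)}
```

Running this from a cron job (scale `workersMin` up before peak hours, back down after) is one way to get the scheduling the thread asks about.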
11 replies
Gotcha. I'll look into it. Is there no way to do this scheduling through the frontend?
RunPod
• Created by Rodka on 5/22/2024 in #⚡|serverless
Question on Flash Boot
Gotcha. Thanks.
4 replies
RunPod
• Created by Rodka on 5/17/2024 in #⚡|serverless
Warming up workers
Gotcha. Many thanks!
3 replies
RunPod
• Created by Rodka on 5/16/2024 in #⚡|serverless
Serverless GPU Pricing
OK. That's what I thought; just wanted to make sure.
Many thanks.
14 replies
RunPod
• Created by Rodka on 5/9/2024 in #⚡|serverless
Running fine-tuned faster-whisper model
My model is already fine-tuned, so I'd just need to load it for inference; from what I see, I can simply edit the worker above to use the fine-tuned model instead of a "default" model.
6 replies
Interesting! Thank you.