RunPod
•Created by bggai on 7/27/2024 in #⚡|serverless
Help Reducing Cold Start
Hi, I've been working with RunPod for a couple of months and it has been great.
I know the image only downloads once, and that there are two options for optimization: embedding the model in the Docker image, or using a network volume (with less flexibility, since it is tied to a single region). I'm currently embedding my model in the Docker image and also running scripts to cache the model download, loading, and config.
I'm using the whisper-large-v3 model with my own code, since it has a lot of optimizations. Cold start without any FlashBoot is between 15 and 45 seconds. My goal is to reduce this time as much as possible without depending on a high request volume.
In this case, would a network volume with a specific available GPU reduce the cold start? I'm having trouble understanding whether a network volume would do the trick. Does the cold start cover only loading my container, or also loading the model, and would storage fix this?
13 replies
RunPod
•Created by bggai on 2/4/2024 in #⚡|serverless
Worker hangs for a really long time; performance is not close to what it should be
84 replies