Help Reducing Cold Start
Hi, I've been working with RunPod for a couple of months, and it has been great.
I know the image only downloads once, and I know there are two options for optimization: embedding the model in the Docker image, or using a network volume, which is less flexible since it is located in only one region. I'm embedding my model in the Docker image and also running scripts at build time to cache the loading, config, and downloads.
I'm using the whisper-large-v3 model with my own code, since it includes a lot of optimizations. Without FlashBoot, the cold start takes between 15 and 45 seconds. My goal is to reduce this time as much as possible without depending on a high request volume.
In this case, would a network volume with a specific available GPU reduce the cold start? I'm having trouble understanding whether a network volume would do the trick. Does the cold start cover only loading my container, or also loading the model? Would storage fix this?
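One way to see whether the time goes into starting the container or into loading the model is to time each phase inside the worker process. The following is a minimal sketch, not RunPod's own tooling: `load_model` and `warm_up` are hypothetical stand-ins for the real whisper-large-v3 code, and anything that happens before Python starts (image pull, container start) is not captured here and would have to be inferred from the total delay the platform reports.

```python
import time
from contextlib import contextmanager

# Collected (phase name, seconds) pairs, in the order the phases finished.
timings = []


@contextmanager
def timed(phase):
    """Record how long the wrapped block takes under the given phase name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.append((phase, time.perf_counter() - start))


def load_model():
    # Hypothetical stand-in for the real model-loading code.
    time.sleep(0.05)
    return object()


def warm_up(model):
    # Hypothetical stand-in for a first dummy inference.
    time.sleep(0.01)


with timed("model_load"):
    model = load_model()
with timed("warm_up"):
    warm_up(model)

for phase, seconds in timings:
    print(f"{phase}: {seconds:.3f}s")
```

If the in-process phases add up to only a small part of the observed cold start, the remainder is spent before the handler code runs, and changing where the weights are stored is unlikely to help much.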
10 Replies
I believe you've already implemented the optimal solution by embedding the model within the Docker image. This ensures fast access since the image is stored locally on the host machine. Using a network volume would likely slow things down due to the additional network data transfer. One option to consider is setting up an active worker. This setup could reduce cold start times, and since it's 40% cheaper when idle, it is a cost-effective solution. Additionally, for applications that are CPU-intensive at startup, using a CPU with a higher clock speed might improve performance. It’s worth testing to see if that helps.
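Whichever storage option is used, a worker that stays warm only benefits if the model is loaded once and reused across requests. Below is a minimal sketch of that pattern under stated assumptions: `get_model` and `handler` are hypothetical names, the dictionary stands in for the real model object, and the `job` shape (a dict with an `"input"` key) is assumed for illustration.

```python
import functools

LOAD_COUNT = 0  # counts how many times the expensive load actually ran


@functools.lru_cache(maxsize=1)
def get_model():
    """Load the model on first use and reuse it for every later request."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    # Hypothetical stand-in: real code would load whisper-large-v3
    # from a path baked into the Docker image.
    return {"name": "whisper-large-v3"}


def handler(job):
    """Hypothetical request handler; `job` mimics a dict with an "input" key."""
    model = get_model()
    audio = job["input"]["audio"]
    return {"model": model["name"], "audio": audio}


print(handler({"input": {"audio": "a.wav"}}))
print(handler({"input": {"audio": "b.wav"}}))
print("loads:", LOAD_COUNT)
```

Calling `get_model()` once at module import instead of lazily would move the load cost into worker startup rather than the first request; which is preferable depends on how the platform measures readiness.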
@yhlong00000 Hi there, I am seeing a huge difference in cold start time when loading a model from the network volume in a Pod versus in Serverless, so two questions:
- Does the network volume perform poorly in Serverless compared to Pods?
- How can I choose a CPU with a higher clock speed in Serverless (I am already using an RTX 4090 as the GPU)?
Thanks!
Network volumes perform poorly in both Serverless and Pods. 😦
That is correct. But I am trying to figure out the discrepancy in model loading speed between the Pods and Serverless, for the same machine type. I also believe that the CPU is being throttled in Serverless.
Another explanation would be that in a Pod the network volume is "mounted" directly and no longer accessed over the network. But this is just a guess.
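A guess like this can be checked by measuring raw read throughput for the same file in both environments. The sketch below is one hedged way to do that with only the standard library; `read_throughput_mb_s` is a hypothetical helper, and the demo reads a small temporary file rather than real model weights. In practice the path would point at the model file on the network volume or on the container disk.

```python
import os
import tempfile
import time


def read_throughput_mb_s(path, chunk_size=8 * 1024 * 1024):
    """Read the whole file sequentially and return throughput in MiB/s."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very small files.
    return total / (1024 * 1024) / max(elapsed, 1e-9)


# Demo on a 4 MiB temporary file; replace demo_path with the real weights file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(4 * 1024 * 1024))
    demo_path = tmp.name

print(f"{read_throughput_mb_s(demo_path):.0f} MiB/s")
os.remove(demo_path)
```

Only the first read after a fresh start is representative, since later reads may be served from the operating system's page cache rather than from the volume.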
I hope you find what you are looking for. If you do please share your results. I rarely use Pods so I'm not sure how they might differ.
Thank you, will do!
@yhlong00000 If I have an active worker, should I implement logic to turn the worker off and on so that container storage or cache doesn't accumulate?
The network volume is the same. We might add filters for CPU in the future.
Not sure what you are asking. For Serverless, if you set active workers > 0, we warm up the number of workers you set as active and the image is loaded on the host, so you won't have a cold start.
Sure, but if I'm using storage from the container or cache, it will accumulate, so at some point I'd have to reset the worker, right? To avoid any complications.
Reset? Resetting or restarting workers isn't really necessary.
It works well even without it.
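If the handler code itself writes temporary or cache files to the container disk, one alternative to restarting workers is to cap that directory from inside the worker. This is a minimal sketch under stated assumptions, not something the thread prescribes: `prune_cache` is a hypothetical helper, and the cache directory and size limit would be whatever the application actually uses.

```python
import os


def prune_cache(cache_dir, max_bytes):
    """Delete the oldest files in cache_dir until its total size is <= max_bytes.

    Returns the list of removed paths, oldest first.
    """
    files = []
    for root, _dirs, names in os.walk(cache_dir):
        for name in names:
            path = os.path.join(root, name)
            stat = os.stat(path)
            files.append((stat.st_mtime, stat.st_size, path))

    total = sum(size for _mtime, size, _path in files)
    removed = []
    # Sorting by modification time puts the oldest files first.
    for _mtime, size, path in sorted(files):
        if total <= max_bytes:
            break
        os.remove(path)
        total -= size
        removed.append(path)
    return removed
```

Calling this at the end of each request, or every N requests, keeps disk usage bounded without touching the worker lifecycle.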