6x speed reduction with network storage in serverless
To reduce my Docker image size I wanted to store the models on network storage, but the main issue I'm running into now is that requests went from 20s to 120s.
Looking at the logs, it now takes almost 100s (vs. a few seconds) to load the model into GPU memory (timed roughly as in the snippet below).
Why is the network storage so slow? It's a major drawback, and it means you and I have to handle tens of GB of Docker image for nothing.
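For anyone debugging the same thing, here's a minimal sketch of how to see where the time goes (assuming a Hugging Face transformers model; the volume path and model directory are placeholders for wherever you stored the weights):

```python
import time
import torch
from transformers import AutoModelForCausalLM

t0 = time.perf_counter()
# /runpod-volume is where RunPod mounts a network volume in serverless;
# the model subdirectory below is a placeholder.
model = AutoModelForCausalLM.from_pretrained(
    "/runpod-volume/models/my-model",
    torch_dtype=torch.float16,
)
t1 = time.perf_counter()
model.to("cuda")
t2 = time.perf_counter()
print(f"disk -> RAM: {t1 - t0:.1f}s, RAM -> GPU: {t2 - t1:.1f}s")
```

In my case almost all the time is in the first step, i.e. reading the weights off the network volume.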
2 Replies
This is a known issue with network volumes. @flash-singh recently said that a new service is coming to RunPod to address this: a model cache that lets you pull models from Hugging Face without embedding them in your container image; RunPod will automatically inject the model into your worker using the local NVMe disk. In the meantime, you'll likely be better off embedding your models directly into your image.
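If it helps, a sketch of the build-time download approach (the model id, target directory, and the Dockerfile `RUN` step are my own placeholders, not RunPod specifics):

```python
# download_model.py -- run once at image build time, e.g. via
#   RUN python download_model.py
# in the Dockerfile, so the weights ship inside the image instead of
# being read from the network volume on every cold start.
from huggingface_hub import snapshot_download

# Placeholder model id and target directory; use whatever your worker loads.
snapshot_download(
    repo_id="mistralai/Mistral-7B-v0.1",
    local_dir="/models/mistral-7b",
)
```

The image gets bigger, but the weights end up on the worker's local disk, so loading them doesn't go over the network.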
Cool! Would love to know when that's available!