6x speed reduction with network storage in serverless
To reduce my Docker image size I wanted to store the models on network storage, but the main issue I'm running into now is that requests went from 20s to 120s.
Looking at the logs, it now takes almost 100s (vs. a few seconds) to load the model into GPU memory (timed roughly as in the snippet below).
Why is the network storage so slow? It's a major drawback, and it means you and I have to handle tens of GB of Docker image for nothing.
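For anyone debugging the same thing, here's a minimal sketch of how to see where the time goes (assuming a Hugging Face transformers model; the volume path and model directory are placeholders for wherever you stored the weights):

```python
import time
import torch
from transformers import AutoModelForCausalLM

t0 = time.perf_counter()
# /runpod-volume is where RunPod mounts a network volume in serverless;
# the model subdirectory below is a placeholder.
model = AutoModelForCausalLM.from_pretrained(
    "/runpod-volume/models/my-model",
    torch_dtype=torch.float16,
)
t1 = time.perf_counter()
model.to("cuda")
t2 = time.perf_counter()
print(f"disk -> RAM: {t1 - t0:.1f}s, RAM -> GPU: {t2 - t1:.1f}s")
```

In my case almost all the time is in the first step, i.e. reading the weights off the network volume.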
2 Replies
This is a known issue with network volumes. @flash-singh recently said that a new service is coming to RunPod to address this: a model cache that lets you pull models from Hugging Face without embedding them in your container image; RunPod will automatically inject the model into your worker using the local NVMe disk. In the meantime, you'll likely be better off embedding your models directly into your image.
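If it helps, a sketch of the build-time download approach (the model id, target directory, and the Dockerfile `RUN` step are my own placeholders, not RunPod specifics):

```python
# download_model.py -- run once at image build time, e.g. via
#   RUN python download_model.py
# in the Dockerfile, so the weights ship inside the image instead of
# being read from the network volume on every cold start.
from huggingface_hub import snapshot_download

# Placeholder model id and target directory; use whatever your worker loads.
snapshot_download(
    repo_id="mistralai/Mistral-7B-v0.1",
    local_dir="/models/mistral-7b",
)
```

The image gets bigger, but the weights end up on the worker's local disk, so loading them doesn't go over the network.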
Cool! Would love to know when that's available!