Slow network volume

Some people reported that loading models from network volumes is very slow compared to baking the model into the image itself.
NERDDISCO
NERDDISCOOP6mo ago
@Encyrption would you mind sharing your experience / tests on this topic again? @briefPeach would you mind sharing your experience / tests on this topic?
Encyrption
Encyrption6mo ago
I tested identical payloads on identical images; the only difference was that one used a network volume and the other had the models baked into the image. While I saw no discernible difference in executionTime, I consistently saw an additional 30 - 60 seconds of delayTime when using the network volume. I only tested this in EU-RO.
NERDDISCO
NERDDISCOOP6mo ago
And all of this was happening a month ago right?
Encyrption
Encyrption6mo ago
yes
Emad
Emad6mo ago
@NERDDISCO When I tried a network volume with mine in EU-RO, it wouldn't leave the queue.
NERDDISCO
NERDDISCOOP6mo ago
@Karlas this sounds strange, not sure if this is related to the network volume. Did it resolve in the end?
Emad
Emad6mo ago
Nope, I wasn't able to resolve it. I removed the network volume and it was back to working fine. @NERDDISCO How much network volume do you think I need for an 8b model? Also, it was on EU-SE 1.
NERDDISCO
NERDDISCOOP6mo ago
@Karlas you should be good with around 20 GB, because the sum of all files in https://huggingface.co/meta-llama/Meta-Llama-3.1-8B/tree/main is roughly 18 GB. Maybe this was the issue with your worker, that the volume was not large enough?
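The sizing reasoning above (about 18 GB of files, so a volume of around 20 GB) can be sketched as a small helper. This is only an illustration: the 10% headroom, the decimal-GB convention, and the function names are assumptions, not RunPod guidance.

```python
import math
import os

GB = 10**9  # decimal gigabytes; an assumption about how sizes are counted


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                total += os.path.getsize(path)
    return total


def recommended_volume_gb(total_bytes: int, headroom: float = 0.10) -> int:
    """Round the total size plus headroom up to a whole number of GB."""
    return math.ceil(total_bytes * (1 + headroom) / GB)


# 18 GB of model files plus 10% headroom rounds up to a 20 GB volume.
print(recommended_volume_gb(18 * GB))
```

For a model that is already downloaded, `recommended_volume_gb(directory_size_bytes(model_dir))` gives the same estimate from the files on disk.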
Emad
Emad6mo ago
alright should i try again with the 8b model with a network volume?
NERDDISCO
NERDDISCOOP6mo ago
Yeah, I would try, because maybe this was the reason it got stuck. You can definitely run into situations where something breaks, for example if the storage is not big enough. So if you have some time and energy, I would appreciate it if you could test this again.
Emad
Emad6mo ago
and can i test on any region?
NERDDISCO
NERDDISCOOP6mo ago
Would you mind creating a new post, so we can talk about all the things llama 3.1 8B? I want to keep the info here about the network volumes 🙏
Emad
Emad6mo ago
Works now. Not getting stuck in queue.
NERDDISCO
NERDDISCOOP6mo ago
perfect!
Emad
Emad6mo ago
But again, I am not trying with a 70b model like I was before, just with an 8b model. When I used 70B I gave the network volume 150gb.
briefPeach
briefPeach6mo ago
@NERDDISCO Is a network volume supposed to be slower than baking the model into the container image? If the model is baked into the container, it is stored physically on the GPU machine, but with a network volume the model needs to be transferred over the network to the GPU machine, right? When I stored models in a network volume, the step of loading the model into GPU VRAM took much longer than with the model baked into the container.
NERDDISCO
NERDDISCOOP6mo ago
There shouldn’t be significant performance problems when loading the model from the network volume. That’s why I’m trying to find people who have problems, so that we can find out what the underlying problem might be. So if you have any data on the data centers used, the models, and the dates when this was tested, I can collect it and present it to our team, so we can take a look at this.
briefPeach
briefPeach6mo ago
I’m confused. Why shouldn’t there be a significant difference when loading the model? I think physical storage should be significantly faster than network storage? Ok, once I try again, I’ll give you that data.
NERDDISCO
NERDDISCOOP6mo ago
That is true, but yeah I don’t have actual numbers here. So this is something I want to validate too, so that we can provide better reasoning for the community! Thank you so much! I will also do some tests next week.
briefPeach
briefPeach6mo ago
thank you! Yeah some benchmarks would be super helpful for us to choose which one to use in different situations
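For anyone who wants to collect comparable numbers, a minimal sketch of a sequential read benchmark follows. It only times reading a file from a given path, so it measures storage throughput rather than the full model load into VRAM, and a second read of the same file may be served from the OS page cache. The function names are made up for this example, and the paths you pass in (one on the network volume, one on the container disk) are up to you.

```python
import sys
import time


def measure_read_throughput(path: str, chunk_size: int = 16 * 1024 * 1024) -> tuple[int, float]:
    """Read a file sequentially and return (bytes_read, seconds_elapsed)."""
    bytes_read = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            bytes_read += len(chunk)
    elapsed = time.perf_counter() - start
    return bytes_read, elapsed


def format_throughput(bytes_read: int, seconds: float) -> str:
    """Format a measurement as size, time, and GB/s."""
    gb = bytes_read / 1e9
    rate = gb / seconds if seconds > 0 else float("inf")
    return f"{gb:.2f} GB in {seconds:.2f} s ({rate:.2f} GB/s)"


if __name__ == "__main__":
    # Pass one or more file paths, e.g. the same model file on both storage types.
    for file_path in sys.argv[1:]:
        print(file_path, format_throughput(*measure_read_throughput(file_path)))
```

Reporting the region, the file size, and the date alongside each result would give the kind of data requested earlier in this thread.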
shawtyisaten
shawtyisaten5mo ago
it's been back to being extremely slow for the last several days. network volume is in eu-ro-1
Encyrption
Encyrption5mo ago
I have currently unselected all EU-, EUR-, and US-OR regions. These seem to be the regions experiencing issues.
shawtyisaten
shawtyisaten5mo ago
@Encyrption which regions do you recommend?
Encyrption
Encyrption5mo ago
I don't use a network volume, so I am not tied to a region. I select global, then unselect any problematic regions, and then I can use any of the others. Once those issues are resolved, I will add those regions back in.
shawtyisaten
shawtyisaten5mo ago
just tested out and i'm seeing that US-TX is at least 2x faster
Charixfox
Charixfox5mo ago
Hopefully this gets resolved soon. I'm using storage to hold a 65GB model on specific GPU hardware and 40+ sec/it to load it is not good at all.
yhlong00000
yhlong000005mo ago
Based on your numbers, you’re getting around 1.6 GB/s. Do you have any specific speed expectations or benchmarks you were aiming for?
Charixfox
Charixfox5mo ago
That's s/it, out of seven segments. A full load of the model takes at least 280 seconds in those instances, but about 21 seconds in other geographical areas. Even more oddly, sometimes it will load two segments at 1-4 s/it, then the next will be 38 s/it, and then the next three will be 50-60 each. It's very inconsistent. The 4+ minute loading is a cold start that I'm paying for every second of, and it might happen when the container is destroyed immediately after a run. While the container is doing that cold start, any requests routed to it time out on the client side.
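Putting the numbers from this exchange side by side: a 65 GB model in 280 seconds is roughly 0.23 GB/s, while the same model in 21 seconds is roughly 3.1 GB/s, a difference of about 13x. The earlier 1.6 GB/s figure corresponds to reading 40 seconds as the full load time (65 / 40). A quick check, treating load time as an effective rate (it may include more than raw I/O):

```python
MODEL_GB = 65       # model size reported above
SLOW_LOAD_S = 280   # "at least 280 seconds" in the slow regions
FAST_LOAD_S = 21    # "about 21 seconds" in other regions


def throughput_gb_per_s(size_gb: float, seconds: float) -> float:
    """Effective load rate in GB/s."""
    return size_gb / seconds


slow = throughput_gb_per_s(MODEL_GB, SLOW_LOAD_S)
fast = throughput_gb_per_s(MODEL_GB, FAST_LOAD_S)
print(f"slow: {slow:.2f} GB/s, fast: {fast:.2f} GB/s, ratio: {fast / slow:.1f}x")
```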
yhlong00000
yhlong000005mo ago
If you include your model as part of the Docker image, it could help reduce cold start times. Loading the model from the host disk is generally faster and more consistent.
Charixfox
Charixfox5mo ago
Is that a viable option for such a large model? I was under the impression it only scaled well for smaller models.
yhlong00000
yhlong000005mo ago
How big is your model? I've seen customers with 350 GB+ Docker images, and it works for them.
Encyrption
Encyrption5mo ago
Wow, 350G+ I will no longer feel bad about my 40GB images LOL 😉
nerdylive
nerdylive5mo ago
okie lets try baking llama3 model onto a docker image
Lta
Lta5mo ago
The network volumes are very slow. Loading models from them is usually a bad idea. On an A1111 image, the inference time with sdxl is at least doubled due to network disk access.
Encyrption
Encyrption5mo ago
I agree 💯 ! My issue has been finding a way to force loading them from disk. I recently had an OpenVoice model refuse to use the baked-in model and try to download the model instead. Is there a way to make it use the baked-in model regardless?
Lta
Lta5mo ago
This is specific to the software you're running, so no idea
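As noted, the right fix depends on the software. One generic pattern is to resolve a local model directory explicitly and fail loudly instead of silently falling back to a download. The sketch below assumes the application accepts a local path; the helper names are hypothetical and so are any paths passed in. `HF_HUB_OFFLINE` is an environment variable read by the `huggingface_hub` library; whether a given tool such as OpenVoice honors it is not confirmed here.

```python
import os
from pathlib import Path


def resolve_local_model_dir(candidates):
    """Return the first candidate directory that exists and is non-empty."""
    candidates = list(candidates)
    for candidate in candidates:
        path = Path(candidate)
        if path.is_dir() and any(path.iterdir()):
            return path
    # Failing here is deliberate: better an error than an unexpected download.
    raise FileNotFoundError(f"no local model directory found in: {candidates}")


def enable_hf_offline_mode() -> None:
    """Ask huggingface_hub-based code not to reach the network.

    Only has an effect for software built on huggingface_hub, and it should
    be set before that library is imported.
    """
    os.environ["HF_HUB_OFFLINE"] = "1"
```

The resolved path would then be passed to whatever loading function the application exposes, if it accepts a local directory.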
yhlong00000
yhlong000005mo ago
Local disk is generally faster than network volumes, and working with many small files on a network volume may result in slower speeds—it’s better to compress them. If multiple pods on the same machine use the network volume, they will share the bandwidth.
Charixfox
Charixfox5mo ago
I split the model layers just fine, but one of the stock layers on the worker-vllm image is just shy of 13GB when built, so I'll be poking at that for a bit.
riceboy26
riceboy265mo ago
What does it mean to compress a model when it’s a .pth or .safetensor file?
yhlong00000
yhlong000005mo ago
I mean if you have a bunch of images, it's better to compress them and send them together rather than one file at a time. A model file is usually one big chunk and should be fine.
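The "many small files" advice can be sketched with Python's standard `tarfile` module: pack the directory into one archive, store that single file on the network volume, and unpack it onto local disk when the worker starts. The function names are made up for this example. For files that are already compressed (JPEG or PNG images, for instance), the main gain is transferring one large file instead of many small ones, so plain `"w"` and `"r"` modes could replace the gzip modes.

```python
import tarfile
from pathlib import Path


def pack_directory(src_dir: str, archive_path: str) -> int:
    """Pack all files under src_dir into one gzip-compressed tar archive.

    Returns the number of files added. Keep archive_path outside src_dir
    so the archive does not try to include itself.
    """
    src = Path(src_dir)
    count = 0
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Store paths relative to src_dir so the archive unpacks cleanly.
                tar.add(path, arcname=str(path.relative_to(src)))
                count += 1
    return count


def unpack_archive(archive_path: str, dest_dir: str) -> None:
    """Extract the archive into dest_dir (for example, local container disk)."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
```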
Kaneda
Kaneda4mo ago
which model did you use ? how the hell can they bake 350Gig model in docker ? where do they host their image ? docker hub ?
Encyrption
Encyrption4mo ago
The biggest I have ever used was 85.8 GB, I host that on Docker Hub. I run many models. I am trying to build out an AI market place.
Encyrption
Encyrption4mo ago
I'm currently constrained on resources (max workers) with RunPod, so I may have to work with other providers (e.g. Modal) to add additional models.
Kaneda
Kaneda4mo ago
I have heard of vast.ai, fly and lambdalabs myself. What would you suggest?
Encyrption
Encyrption4mo ago
I am still trying to figure that out myself. Not sure any of them scale from 0 like RunPod.
yhlong00000
yhlong000004mo ago
Yep, docker hub
shawtyisaten
shawtyisaten4mo ago
don't use vast.ai. recommend looking into modal but their syntax is confusing af.
riceboy26
riceboy263mo ago
i've found a way... after my pc docker engine broke bc of some weird wsl issue that i still haven't figured out.... i start a GCP VM with the deep learning linux image with a big boot disk and run docker build there. you get to take advantage of enterprise grade networking so builds can be much bigger too. a 20gb docker image takes less than 5 minutes to push to dockerhub, whereas it would've taken 40 minutes on my residential toaster wifi
