Slow network volume
Some people have reported that loading models from network volumes is very slow compared to baking the model into the image itself.
@Encyrption would you mind sharing your experience / tests on this topic again?
@briefPeach would you mind sharing your experience / tests on this topic?
I ran identical payloads on identical images, with the only difference being that one used a network volume and the other had the models baked into the image. While I saw no discernible difference in executionTime, I consistently saw an additional 30-60 seconds of delayTime when using the network volume. I only tested this in EU-RO.
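For anyone who wants to reproduce this kind of comparison, here is a rough sketch of how you might collect those two numbers per request. It assumes the serverless status response exposes delayTime and executionTime; the endpoint ID and payload are placeholders:

```python
import os
import time

import requests

# Placeholder endpoint ID and payload; set RUNPOD_API_KEY in your environment.
ENDPOINT_ID = "your-endpoint-id"
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}

def run_and_time(payload: dict) -> dict:
    """Submit one job and return the timing fields reported by the status endpoint."""
    job = requests.post(f"{BASE}/run", json={"input": payload}, headers=HEADERS).json()
    while True:
        status = requests.get(f"{BASE}/status/{job['id']}", headers=HEADERS).json()
        if status.get("status") in ("COMPLETED", "FAILED"):
            # delayTime covers queueing + cold start, executionTime the handler itself (ms).
            return {k: status.get(k) for k in ("status", "delayTime", "executionTime")}
        time.sleep(2)

print(run_and_time({"prompt": "hello"}))
```

Running the same payload against a network-volume endpoint and a baked-in endpoint and comparing the averages should show the same gap.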
And all of this was happening a month ago right?
yes
@NERDDISCO When I tried a network volume with mine (it was EU-RO), it wouldn't leave the queue
@Karlas this sounds strange, not sure if this is related to the network volume. Did it resolve in the end?
Nope, wasn't able to resolve it
I removed the network volume and it was back to working fine
@NERDDISCO How much network volume do you think I need for an 8B model?
Also, it was on EU-SE-1
@Karlas you should be good with around 20 GB, because the sum of all files in https://huggingface.co/meta-llama/Meta-Llama-3.1-8B/tree/main is roughly 18 GB. Maybe that was the issue with your worker: the volume wasn't large enough?
alright
should i try again with the 8b model with a network volume?
Yeah, I would try, because maybe that was why it got stuck. You can totally create situations where something breaks, for example if the storage is not big enough.
So if you have some time and energy, I would appreciate it if you could test this again
and can i test on any region?
Would you mind creating a new post, so we can talk about all the things llama 3.1 8B? I want to keep the info here about the network volumes 🙏
Works now
Not getting stuck in queue
perfect!
But again, I am not trying with a 70B model like I was before, just with an 8B model
When I used 70B I gave the network volume 150 GB
@NERDDISCO Is a network volume supposed to be slower than baking the model into the container image? When it's baked into the container, the model is stored physically on the GPU machine, but with a network volume the model has to be transferred over the network before it can be loaded onto the GPU machine.
Because when I stored models on a network volume, the step of loading the model into GPU VRAM took way longer than when the model was baked into the container
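A quick way to put numbers on that, assuming the volume is mounted at /runpod-volume and the baked-in copy lives under /models (paths and shard names are just examples), is to time how long one safetensors shard takes to load from each location:

```python
import time
from pathlib import Path

from safetensors.torch import load_file

# Example paths: /runpod-volume is where RunPod mounts the network volume on serverless
# workers; /models stands in for a directory baked into the image. Shard names are examples.
CANDIDATES = {
    "network volume": Path("/runpod-volume/llama-3.1-8b/model-00001-of-00004.safetensors"),
    "baked into image": Path("/models/llama-3.1-8b/model-00001-of-00004.safetensors"),
}

for label, path in CANDIDATES.items():
    if not path.exists():
        continue
    start = time.perf_counter()
    tensors = load_file(path)  # reads the full shard into CPU memory
    elapsed = time.perf_counter() - start
    size_gb = path.stat().st_size / 1e9
    print(f"{label}: {len(tensors)} tensors, {size_gb:.1f} GB in {elapsed:.1f}s "
          f"({size_gb / elapsed:.2f} GB/s)")
```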
There shouldn't be significant performance problems when loading the model from the network volume. That's why I'm trying to find people who have problems, so that we can find out what the underlying problem might be.
So if you have any data (which data centers, which models, and when this was tested), I can collect it and present it to our team so we can take a look at this.
I'm confused. Why shouldn't there be a significant difference when loading the model? I'd think local physical storage would be significantly faster than network storage.
Ok once I try again, I’ll give you that data
That is true, but yeah I don’t have actual numbers here. So this is something I want to validate too, so that we can provide better reasoning for the community!
Thank you so much! I will also do some tests next week.
thank you! Yeah some benchmarks would be super helpful for us to choose which one to use in different situations
It's back to being extremely slow over the last several days. The network volume is in EU-RO-1
I have currently unselected all EU-, EUR-, and US-OR regions. These seem to be the regions experiencing issues.
@Encyrption which regions do you recommend?
I don't use a network volume, so I'm not tied to a region. I select global, then unselect any problematic regions... then I can use any others.
once those issues are resolved, I will add those regions back in.
Just tested it out and I'm seeing that US-TX is at least 2x faster
Hopefully this gets resolved soon. I'm using storage to hold a 65GB model on specific GPU hardware and 40+ sec/it to load it is not good at all.
Based on your numbers, you’re getting around 1.6 GB/s. Do you have any specific speed expectations or benchmarks you were aiming for?
That's 40+ s/it across seven segments, so a full load of the model takes at least 280 seconds in those instances, but about 21 seconds in other geographical areas.
Even more oddly, sometimes it will load two segments at 1-4 s/it, then the next at 38 s/it, and then the next three at 50-60 each. It's very inconsistent. The 4+ minute load is a cold start that I'm paying for every second of, and it can happen when the container is destroyed immediately after a run; while the container is doing that cold start, any requests routed to it time out on the client side.
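To separate raw volume read speed from model deserialization, a minimal sequential-read check like this (path is a placeholder) shows whether you are getting roughly the ~3 GB/s implied by the 21-second loads or the ~0.2 GB/s implied by a 280-second load of a 65 GB model:

```python
import time

# Placeholder path: point this at any large file on the volume.
PATH = "/runpod-volume/llama-3.1-8b/model-00001-of-00004.safetensors"
CHUNK = 64 * 1024 * 1024  # 64 MiB reads

read_bytes = 0
start = time.perf_counter()
with open(PATH, "rb", buffering=0) as f:
    while chunk := f.read(CHUNK):
        read_bytes += len(chunk)
elapsed = time.perf_counter() - start
print(f"{read_bytes / 1e9:.1f} GB in {elapsed:.1f}s -> {read_bytes / 1e9 / elapsed:.2f} GB/s")
```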
If you include your model as part of the Docker image, it could help reduce cold start times. Loading the model from the host disk is generally faster and more consistent.
Is that a viable option for such a large model? I was under the impression it only scaled well for smaller models.
How big is your model? I've seen customers with 350 GB+ Docker images and it works for them.
Wow, 350 GB+! I will no longer feel bad about my 40 GB images LOL 😉
Okie, let's try baking the Llama 3 model into a Docker image
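In case it helps, here is a minimal sketch of the kind of download script you could RUN during docker build so the weights end up in the image instead of on a network volume. The repo ID, target directory, and file patterns are examples, and gated repos like Llama require an HF token:

```python
# download_model.py -- run during `docker build` (e.g. RUN python download_model.py)
# so the weights live on the image filesystem instead of a network volume.
# Repo ID, target directory, and file patterns are examples; gated repos like Llama
# require an HF access token passed in as a build secret or env var.
import os

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="meta-llama/Meta-Llama-3.1-8B",
    local_dir="/models/llama-3.1-8b",
    allow_patterns=["*.safetensors", "*.json", "tokenizer*"],  # skip optional extras
    token=os.environ.get("HF_TOKEN"),
)
```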
The network volumes are very slow. Loading models from them is usually a bad idea. On an A1111 image, the inference time with SDXL is at least doubled due to network disk access.
I agree 💯! My issue has been finding a way to force loading them from disk. I recently had the OpenVoice model refusing to use the baked-in model and trying to download it instead. Is there a way to make it use the baked-in model regardless?
This is specific to the software you're running, so no idea
Local disk is generally faster than network volumes, and working with many small files on a network volume may result in slower speeds—it’s better to compress them. If multiple pods on the same machine use the network volume, they will share the bandwidth.
I split the model layers just fine, but one of the stock layers on the worker-vllm image is just shy of 13GB when built, so I'll be poking at that for a bit.
What does it mean to compress a model when it’s a .pth or .safetensor file?
I mean if you have a bunch of images, it's better to compress them and send them as one archive rather than one file at a time. A model file is usually one big chunk and should be fine.
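As a concrete example of the "compress them first" approach, something like this bundles a folder of small files into one archive before writing it to the volume (paths are placeholders):

```python
import tarfile
from pathlib import Path

# Placeholder directories: a local folder full of small files and the mounted network volume.
SRC = Path("outputs/images")
DEST = Path("/runpod-volume/archives/images.tar.gz")
DEST.parent.mkdir(parents=True, exist_ok=True)

# One compressed archive means one large sequential write instead of thousands of
# small round-trips to the network filesystem.
with tarfile.open(DEST, "w:gz") as tar:
    tar.add(SRC, arcname=SRC.name)
```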
Which model did you use?
How the hell can they bake a 350 GB model into a Docker image? Where do they host their image? Docker Hub?
The biggest I have ever used was 85.8 GB; I host that on Docker Hub. I run many models. I am trying to build out an AI marketplace.
I'm currently running into a resource constraint (max workers) with RunPod, so I may have to work with other providers (e.g., Modal) to add additional models.
I have heard of vast.ai, Fly, and Lambda Labs myself. What would you suggest?
I am still trying to figure that out myself. Not sure any of them scale from 0 like RunPod.
Yep, docker hub
Don't use vast.ai. I recommend looking into Modal, but their syntax is confusing af.
I've found a way... after my PC's Docker engine broke because of some weird WSL issue that I still haven't figured out...
I start a GCP VM with the Deep Learning Linux image and a big boot disk and run docker build there. You get to take advantage of enterprise-grade networking, and builds can be much bigger too.
A 20 GB Docker image takes less than 5 minutes to push to Docker Hub, whereas it would've taken 40 minutes on my residential toaster wifi