RunPod•2mo ago

Network and Local Storage Performance

Hi, we are seeing very slow performance when loading our model on our Pods in the IS region. Sequential reads are also very slow when we copy the same model into local storage. Model loading takes about 10x as long as it did for us on a different network, and a sequential read takes about 3x as long on RunPod. Local storage is only about 3 s faster than network storage (17.6 s vs. 20.3 s). Our old network read:

13550863+1 records in
13550863+1 records out
6938042106 bytes (6.9 GB, 6.5 GiB) copied, 7.09554 s, 978 MB/s

real    0m7.283s

Runpod Network storage

13550863+1 records in
13550863+1 records out
6938042106 bytes (6.9 GB, 6.5 GiB) copied, 20.3418 s, 341 MB/s

real    0m20.351s
user    0m2.008s
sys     0m12.863s

Runpod Local storage

13550863+1 records in
13550863+1 records out
6938042106 bytes (6.9 GB, 6.5 GiB) copied, 17.5572 s, 395 MB/s

real    0m17.560s
user    0m2.279s
sys     0m15.259s
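To cross-check the ratios quoted in the first post, the dd summary lines can be parsed and compared directly. This is a minimal Python sketch that uses only the three summary lines copied from the runs above. The dictionary labels are illustrative names, and the regex assumes GNU dd's status-line format.

```python
import re
from typing import Tuple

# dd summary lines copied verbatim from the three runs above.
RUNS = {
    "old network": "6938042106 bytes (6.9 GB, 6.5 GiB) copied, 7.09554 s, 978 MB/s",
    "RunPod network storage": "6938042106 bytes (6.9 GB, 6.5 GiB) copied, 20.3418 s, 341 MB/s",
    "RunPod local storage": "6938042106 bytes (6.9 GB, 6.5 GiB) copied, 17.5572 s, 395 MB/s",
}

# Matches GNU dd's final status line: "<bytes> bytes (...) copied, <secs> s, ..."
SUMMARY = re.compile(r"^(\d+) bytes \(.*\) copied, ([\d.]+) s,")


def parse_dd_summary(line: str) -> Tuple[int, float]:
    """Return (bytes_copied, seconds) from a dd summary line."""
    match = SUMMARY.match(line)
    if match is None:
        raise ValueError(f"not a dd summary line: {line!r}")
    return int(match.group(1)), float(match.group(2))


def throughput_mb_s(nbytes: int, seconds: float) -> float:
    """Decimal megabytes (10**6 bytes) per second, the unit dd prints."""
    return nbytes / seconds / 1e6


if __name__ == "__main__":
    _, baseline_s = parse_dd_summary(RUNS["old network"])
    for label, line in RUNS.items():
        nbytes, seconds = parse_dd_summary(line)
        print(f"{label:24s} {throughput_mb_s(nbytes, seconds):6.1f} MB/s, "
              f"{seconds / baseline_s:.2f}x the old network's time")
```

With these numbers, network storage takes about 2.87x as long as the old network and local storage about 2.47x. Local storage finishes about 2.8 s sooner than network storage (17.56 s vs. 20.34 s).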

8 Replies

Dj•2mo ago

Thank you for the report! I'll file this with engineering. They may ask, so I'll ask too: can you share the command you're using to generate these tests? It looks like dd and time, is that right?

jphippsOP•2mo ago

Sounds good, thanks. Here is the command:

time dd if=/tmp/icbinpXL_v6.safetensors of=/dev/null

It's the same command for the network storage, just with a different path.
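The record counts in the outputs above (13550863 full records plus one partial record for 6938042106 bytes) correspond to dd's default 512-byte block size. Each run therefore issues roughly 13.5 million read() calls, which may account for part of the high sys time. GNU dd's bs= option (for example bs=16M) and iflag=direct (which bypasses the page cache) could help separate storage throughput from per-call overhead. The Python sketch below performs the same kind of timed sequential read with a configurable buffer. The 16 MiB default, the function name, and the read_bench.py filename are illustrative assumptions, not a RunPod recommendation.

```python
import os
import sys
import time
from typing import Tuple

# Illustrative default: 16 MiB per read() instead of dd's 512-byte default.
DEFAULT_BLOCK = 16 * 1024 * 1024


def timed_sequential_read(path: str, block_size: int = DEFAULT_BLOCK) -> Tuple[int, float]:
    """Read `path` from start to end in `block_size` chunks.

    Returns (bytes_read, elapsed_seconds). The data is discarded, much like
    `dd of=/dev/null`.
    """
    buf = bytearray(block_size)
    with open(path, "rb", buffering=0) as f:
        # Advisory hint to evict this file's clean pages from the page cache so
        # the timing reflects the storage rather than RAM. The kernel may ignore
        # it, and os.posix_fadvise only exists on some Unix platforms.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
        total = 0
        start = time.perf_counter()
        while True:
            n = f.readinto(buf)
            if not n:  # 0 at end of file
                break
            total += n
        elapsed = time.perf_counter() - start
    return total, elapsed


if __name__ == "__main__":
    # Usage (hypothetical script name): python read_bench.py <file> [block_size_bytes]
    target = sys.argv[1]
    block = int(sys.argv[2]) if len(sys.argv) > 2 else DEFAULT_BLOCK
    nbytes, secs = timed_sequential_read(target, block)
    rate = nbytes / max(secs, 1e-9) / 1e6  # decimal MB/s, as dd reports
    print(f"{nbytes} bytes in {secs:.2f} s, {rate:.0f} MB/s")
```

Run it once with a block size of 512 and once with the default on the same file. The difference may indicate how much of the gap comes from per-call overhead and how much from the storage itself.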

brennen_runpod•2mo ago

Hey @jphipps, could you share which IS datacenter you are using? We have three DCs in that region 🙂 Also, which region/DC were you using before?

jphippsOP•2mo ago

It is EUR-IS-1; the old one was US-TX-3.

Lahl•2mo ago

Hey there. Can we get an update on this? It's creating a major performance impact for our users.

brennen_runpod•2mo ago

I'm following up on this now. Just to make sure I have the right information on your workload:

1. Copy the model from Network Storage to Local Storage
2. Load the model from Local Storage into GPU VRAM
3. Perform work

Step 1 is the one most impacted by EUR-IS-1, correct?
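Step 1 of that workflow (copying the model from network storage to local storage) can be timed on its own with a sketch like the one below. The default paths are hypothetical placeholders: /workspace is assumed here to be the network-volume mount and /tmp the pod's local disk. Using shutil.copyfileobj with a large chunk size is one way to do the copy, not necessarily what the user's pipeline does.

```python
import os
import shutil
import sys
import time

# Illustrative chunk size for each read/write; not a RunPod recommendation.
CHUNK = 16 * 1024 * 1024


def timed_copy(src: str, dst: str, chunk: int = CHUNK) -> float:
    """Copy src to dst in `chunk`-byte pieces; return elapsed seconds."""
    start = time.perf_counter()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk)
        fout.flush()
        os.fsync(fout.fileno())  # count the time to persist to local disk too
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical defaults: /workspace is assumed to be the network volume
    # mount, /tmp the pod's local disk. Pass real paths as arguments.
    src = sys.argv[1] if len(sys.argv) > 1 else "/workspace/icbinpXL_v6.safetensors"
    dst = sys.argv[2] if len(sys.argv) > 2 else "/tmp/icbinpXL_v6.safetensors"
    secs = timed_copy(src, dst)
    size = os.path.getsize(dst)
    print(f"copied {size} bytes in {secs:.2f} s ({size / max(secs, 1e-9) / 1e6:.0f} MB/s)")
```

Comparing this copy time with a plain read of the same source file can show whether the read side or the write side dominates step 1.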

Dj•2mo ago

@Lahl, @jphipps We'd love to understand this better for you.

jphippsOP•2mo ago

@brennen_runpod @Dj Sorry, I missed this message yesterday. Yes, that's right. Depending on its size, the model also sometimes needs to swap between GPU and CPU. The read time seems to be very slow.
