Network and Local Storage Performance
Hi, we are noticing very slow performance loading our model on our Pods in the IS region. We also see a very slow sequential read time when we copy the same model into local storage. Model loading takes about 10x as long as it did for us on a different network, and the sequential read time is about 3x longer on Runpod. Local storage is about 5s faster than network storage.
Our old network read
Runpod Network storage
Runpod Local storage
Thank you for the report! I'll file this with engineering.
They may ask, so I'll ask too - can you share the command you're using to generate these tests? I think I see dd and time?
Sounds good, thanks. Here is the command:
time dd if=/tmp/icbinpXL_v6.safetensors of=/dev/null
Same command for the network storage, just a different path.
Hey @jphipps - could you share which IS datacenter you are using? We have three DCs in that region 🙂
Also what was the old network region/DC you were using?
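Also, one thing that can skew these numbers: dd defaults to 512-byte blocks, and a re-read of the same file can be served from the Linux page cache rather than the disk. As a rough sketch (assuming root access on the pod), something like this gives a cleaner sequential-read figure:
# Drop the page cache so the read hits the disk, not RAM (requires root)
sync && echo 3 > /proc/sys/vm/drop_caches
# Use a 1 MiB block size instead of dd's 512-byte default
time dd if=/tmp/icbinpXL_v6.safetensors of=/dev/null bs=1M
Running the same thing against the network-storage path should make the comparison more apples-to-apples.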
It is EUR-IS-1
old was US-TX-3
Hey there. Can we get an update on this? It's creating a major performance impact for our users.
I'm following up on this now - just to make sure I have the right information on your workload:
1. Copy model from Network Storage to Local Storage
2. Load model from Local Storage into GPU VRAM
3. Perform work
Step 1 is the one which is most impacted by EUR-IS-1, correct?
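If it helps narrow things down, here is a minimal timing sketch that measures step 1 (the network-to-local copy) and the local sequential read separately. The paths are assumptions on my part, so adjust them to your actual mount points:
#!/usr/bin/env bash
# Hypothetical paths - adjust to your pod's actual mount points
NET_MODEL=/workspace/icbinpXL_v6.safetensors    # model on the network volume (assumed mount)
LOCAL_MODEL=/tmp/icbinpXL_v6.safetensors        # local-storage copy

# Step 1: copy from network storage to local storage
time cp "$NET_MODEL" "$LOCAL_MODEL"

# Drop the page cache so the read below hits the local disk, not RAM (requires root)
sync && echo 3 > /proc/sys/vm/drop_caches

# Step 2: sequential read of the local copy
time dd if="$LOCAL_MODEL" of=/dev/null bs=1M
If step 1 dominates, that points at the network volume in EUR-IS-1 rather than the local disk.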
@Lahl, @jphipps We'd love to understand this better for you.
@brennen_runpod @Dj Sorry I missed this message yesterday. Yes, that's right. The model will also sometimes need to swap between GPU and CPU depending on its size. The read time seems to be very slow.