RunPod•9mo ago

Model loadtime affected if PODs are running on the same server

I've been debugging latency on my test PODs and found that PODs running on the same physical machine slow down badly on IO access. After profiling, I got these results.

Initial test on POD:
- Running on a single POD, model load time for a 6 GB model is 2 sec
- When I pulled 2 GPUs from the same server, model load time increased to 40 sec
- Even inference is affected. RAM leaking?

On Serverless:
- The same GPU (4090) gets different inference and load times as well
- Load takes 30 sec or 4 sec depending on the machine
- Inference is non-uniform as well: 20 sec on some machines and 10 sec on others

All are running the same Docker image and the same scripts with the same libraries.

Do we have any work in place to ensure uniformity on HW? Are we requiring servers to have a separate SSD / NVMe for each GPU, with a separate pipe for IO access?

I need some idea of whether this is a persistent issue. I'm pretty sure the Mbps figures in the descriptors don't reflect reality at all.

EDIT: I'm using the US region now; on Global the problem is worse.
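For anyone who wants to compare machines the same way, here is a minimal sketch of timing a sequential file read as a rough stand-in for model load. It is not the script used in the post above. The function name `measure_read_throughput` and the temp file are hypothetical; for a real measurement, point it at the actual model file. Keep in mind that the OS page cache can make a second read of the same file look much faster than the disk really is.

```python
import os
import tempfile
import time


def measure_read_throughput(path, chunk_size=16 * 1024 * 1024):
    """Read `path` sequentially; return (bytes_read, seconds, MB_per_second)."""
    total_bytes = 0
    start = time.perf_counter()
    # buffering=0 avoids Python-level buffering; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    mb_per_s = (total_bytes / 1e6) / elapsed if elapsed > 0 else float("inf")
    return total_bytes, elapsed, mb_per_s


if __name__ == "__main__":
    # Hypothetical stand-in file; replace `path` with the real model file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(8 * 1024 * 1024))
        path = tmp.name
    try:
        n, secs, rate = measure_read_throughput(path)
        print(f"read {n} bytes in {secs:.4f} s ({rate:.1f} MB/s)")
    finally:
        os.remove(path)
```

Running this once on a POD that is alone on its host and again while a second POD on the same host is loading a model would show whether read throughput drops under contention.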

8 Replies

AC_pillOP•9mo ago

Do we have any answers here?

nerdylive•9mo ago

Hmm, maybe in the same region?

AC_pillOP•9mo ago

I was using Global before and the problem was worse; now GPUs in the same region are showing discrepancies as well. There is no uniformity in inference performance. Maybe a cap?
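To put a number on "no uniformity", one option is to summarize timings collected from several workers. This is a sketch only: `summarize_timings` is a hypothetical helper, and the sample values are just the load and inference times quoted in the original post, not new measurements.

```python
import statistics


def summarize_timings(timings):
    """Return mean, sample stdev, coefficient of variation, and max/min ratio."""
    mean = statistics.mean(timings)
    stdev = statistics.stdev(timings) if len(timings) > 1 else 0.0
    return {
        "mean": mean,
        "stdev": stdev,
        "cv": stdev / mean,  # relative spread; 0 means perfectly uniform
        "max_over_min": max(timings) / min(timings),
    }


if __name__ == "__main__":
    # Values quoted in the thread (seconds), one entry per machine.
    load_times = [30.0, 4.0]
    inference_times = [20.0, 10.0]
    print("load:", summarize_timings(load_times))
    print("inference:", summarize_timings(inference_times))
```

A max/min ratio close to 1 across workers with the same GPU would indicate uniform hardware; a large ratio is the kind of discrepancy described here.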

nerdylive•9mo ago

Or else you may have to raise this in a ticket, maybe for pods? I heard some pods have this problem.

AC_pillOP•9mo ago

Is there no support here? This is extremely important to share on a board so we can see whether the problem repeats. Yeah, the problem is on both Serverless and PODs. I'm stress testing, and it's clear now that it's a hardware issue.

nerdylive•9mo ago

There is, actually, but they're not very active here because it's easier to report problems on their own platform. Yeah, or a software cap maybe.

AC_pillOP•9mo ago

Interesting, I'll post that, but I'll leave this open so other users can see it. I'm already seeing a lot of complaints about the same thing, so it's getting hard to push to production. Yes, a software cap on the Docker host.

nerdylive•9mo ago

alright @haris
