RunPod•12mo ago

same GPU, different machine -> different speed

The image shows two YOLO object detection runs with an identical setup (same batch size, image size, and number of epochs) on two different RunPod pods. The GPU was an RTX 4090 in both cases.

slow machine

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 4090 On | 00000000:A1:00.0 Off | Off |

fast machine

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| 0 NVIDIA GeForce RTX 4090 On | 00000000:01:00.0 Off | Off |

Training was 30% faster on the fast machine, and its power consumption was lower.

(1) Is this only due to the driver being newer?
(2) Would the effect be the same for an older GPU, like the A100?

6 Replies

digigoblin•12mo ago

Check the power limit (in watts) on both machines; one of them may be power capped.

JanEOP•12mo ago

but why the difference in speed?

digigoblin•12mo ago

Power capped machines are slower because power capping cripples performance. Check that the power isn't capped. I have had 4090s power capped before in the FR region in community cloud. I complained to RunPod to get a refund, and asked for the machine to be delisted and the host to be banned from RunPod, because it amounts to fraud.

JanEOP•12mo ago

hm... but in my case the GPU that consumes less power is also the faster one. I would expect it to be slower. Also, the "red" GPU (slower speed, higher power consumption) was on secure cloud, while the "blue" one is on community cloud.

digigoblin•12mo ago

It's not about the consumption, it's about what the max power limit is. On a 4090 it should be 450W.
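One way to check the cap, as a minimal sketch: query `nvidia-smi` for the current, default, and maximum power limits and compare the current limit against the 450W mentioned above. The helper names (`parse_power_line`, `is_power_capped`, `query_gpus`) are hypothetical, and the sketch assumes `nvidia-smi` is on the PATH inside the pod and that the driver reports these fields.

```python
import subprocess

# Query fields understood by `nvidia-smi --query-gpu=...`.
FIELDS = ["name", "power.draw", "power.limit",
          "power.default_limit", "power.max_limit"]


def parse_power_line(line):
    """Parse one CSV row of nvidia-smi output into a dict.

    Wattage fields become floats; values the driver cannot report
    (for example "[N/A]") become None.
    """
    values = [v.strip() for v in line.split(",")]
    row = dict(zip(FIELDS, values))
    for key in FIELDS[1:]:
        try:
            row[key] = float(row[key])
        except ValueError:
            row[key] = None
    return row


def is_power_capped(row, expected_watts=450.0):
    """True if the current limit is below the expected one.

    450 W is the figure quoted in this thread for the RTX 4090;
    pass a different value for other cards.
    """
    limit = row["power.limit"]
    return limit is not None and limit < expected_watts


def query_gpus():
    """Run nvidia-smi and return one parsed row per GPU."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_power_line(l) for l in out.splitlines() if l.strip()]
```

Running `for row in query_gpus(): print(row, is_power_capped(row))` on both pods would show whether the two machines have different limits, independent of how much power the training job actually draws.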

JanEOP•12mo ago

okay, but I have a small dataset, and RAM use is below 10%. Probably the GPU cannot operate at its max because the epochs are too small? Or is max power consumption independent of VRAM use? Regarding your comment about how much the max power is: in my plots, 70W consumption is about 10% of max power, meaning that 100% would be way above 450W.
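The arithmetic behind that last point can be written out as a small sketch. It assumes the percentage in the plot is measured relative to the GPU's power limit, which the thread does not confirm; the function names are hypothetical.

```python
def implied_limit(draw_watts, percent):
    """Power limit implied by a draw that is `percent` of the limit."""
    return draw_watts * 100.0 / percent


def percent_of_limit(draw_watts, limit_watts):
    """Draw expressed as a percentage of a given power limit."""
    return 100.0 * draw_watts / limit_watts


# 70 W shown as 10% would imply a 700 W limit.
print(implied_limit(70, 10))

# Against a 450 W limit, 70 W is roughly 15.6%.
print(round(percent_of_limit(70, 450), 1))
```

If the plotted percentage uses a different reference, these numbers do not apply, so it is worth reading the actual limit from the driver rather than inferring it from the plot.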
