RunPod · 7mo ago
JanE

same GPU, different machine -> different speed

The image shows 2 YOLO object detection runs with an identical setup (same batch size, image size, number of epochs) on 2 different RunPods. The GPU was an RTX 4090 in both cases.

slow machine:
NVIDIA-SMI 535.129.03 | Driver Version: 535.129.03 | CUDA Version: 12.2
0 NVIDIA GeForce RTX 4090 | On | 00000000:A1:00.0 Off | Off

fast machine:
NVIDIA-SMI 550.54.15 | Driver Version: 550.54.15 | CUDA Version: 12.4
0 NVIDIA GeForce RTX 4090 | On | 00000000:01:00.0 Off | Off

There was a 30% increase in training speed on the fast machine, and the power consumption was lower.
(1) Is this only due to the driver being newer?
(2) Would the effect be the same for an older GPU, like the A100?
6 Replies
digigoblin · 7mo ago
Check the power watts, one may be power capped.
JanE (OP) · 7mo ago
but why the difference in speed?
digigoblin · 7mo ago
Power-capped machines are slower because power capping cripples performance. Check that the power isn't capped. I have had 4090s power-capped before in the FR region in Community Cloud; I complained to RunPod to get a refund and asked for the machine to be delisted and the host to be banned from RunPod, because it amounts to fraud.
JanE (OP) · 7mo ago
hm... but in my case the GPU consuming less power is also the faster one. I would expect it to be slower. Also, the "red" GPU (slower speed, higher power consumption) was on Secure Cloud, while the "blue" one is on Community Cloud.
digigoblin · 7mo ago
It's not about the consumption, it's about what the max power limit is; on a 4090 it should be 450 W.
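If you want to check this programmatically rather than eyeballing `nvidia-smi`, here is a minimal sketch. It assumes the standard `power.limit` / `power.max_limit` fields from `nvidia-smi`'s `--query-gpu` interface (see `nvidia-smi --help-query-gpu`); the sample string at the bottom is illustrative, not real output from either machine.

```python
# Sketch: flag a power-capped GPU by comparing the enforced power limit
# against the board's maximum power limit, as reported by nvidia-smi.
import subprocess

def parse_power_caps(csv_text: str) -> list[tuple[float, float, bool]]:
    """Parse 'power.limit, power.max_limit' CSV rows (watts, no units).

    Returns one (limit_w, max_limit_w, is_capped) tuple per GPU.
    """
    rows = []
    for line in csv_text.strip().splitlines():
        limit_w, max_w = (float(x) for x in line.split(","))
        rows.append((limit_w, max_w, limit_w < max_w))
    return rows

def query_power_caps() -> list[tuple[float, float, bool]]:
    """Run nvidia-smi on the local machine and parse its output."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=power.limit,power.max_limit",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_power_caps(out)

# Hypothetical example: a 4090 limited to 300 W of its 450 W max is capped.
print(parse_power_caps("300.00, 450.00"))  # [(300.0, 450.0, True)]
```

A capped host would show `is_capped=True` even while the training job itself draws far less than the limit.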
JanE (OP) · 7mo ago
okay, but I have a small dataset, and RAM use is below 10%. Probably the GPU cannot operate at its max because the epochs are too small? Or is max power consumption independent of VRAM use? Regarding your comment about what the max power is: in my plots, 70 W consumption is about 10% of max power, meaning that 100% would be way above 450 W.
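One way to tell "the GPU is capped" apart from "the GPU is simply not busy" is to sample utilization alongside power draw during training. A hedged sketch, again assuming the standard `nvidia-smi --query-gpu` fields; the 50% threshold is an arbitrary illustrative cutoff, not an NVIDIA-defined one:

```python
# Sketch: low power draw together with low GPU utilization suggests the
# workload (tiny dataset, small batches) is the bottleneck, not a power cap.
import subprocess

def parse_util_power(csv_text: str) -> list[dict]:
    """Parse 'utilization.gpu, power.draw' CSV rows (percent, watts)."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, draw = (float(x) for x in line.split(","))
        stats.append({
            "util_pct": util,
            "draw_w": draw,
            # Arbitrary cutoff: under 50% utilization the GPU is likely
            # starved for work rather than throttled.
            "likely_underutilized": util < 50.0,
        })
    return stats

def sample_gpu() -> list[dict]:
    """Take one sample from the local GPU(s) via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu,power.draw",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_util_power(out)
```

Running `sample_gpu()` in a loop while an epoch trains would show whether the 70 W draw coincides with low utilization (starved) or high utilization (possibly throttled).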