GPU pod's performance is inconsistent
I am using a pod (RTX 4090 with a 100GB network volume) to generate images.
Normally a task takes around 5-6s to finish, but sometimes performance drops to 30s per task.
Can anyone explain to me what's going on? Thank you so much.
Run nvidia-smi to check whether the host has enabled a power cap.
Watching nvidia-smi while an inference task is running, power usage is around 40-70W out of 450W.
A 450W limit means it's not power capped.
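If it helps, one way to log this continuously from inside the pod is something like the following (a sketch assuming a standard nvidia-smi install; exact field names can vary by driver version):

# log power draw, power limit, SM clock and active throttle reasons once per second
nvidia-smi --query-gpu=timestamp,power.draw,power.limit,clocks.sm,clocks_throttle_reasons.active --format=csv -l 1

Running that while a slow task and a fast task execute should show whether the slow runs coincide with lower clocks or an active throttle reason.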
I don't know, but I'm running the same task (everything is the same): if power usage is 70W/450W it takes 30s, and if power usage is 200W/450W it takes 5s.
Why is it so inconsistent? How can I configure it to make it more stable?
I'm not sure what's causing the instability.
Probably some bug in the application
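If it's useful for narrowing this down, nvidia-smi can also report which limiter is active and what power limit is being enforced (a sketch; the output layout differs between driver versions):

# show which limiter (SW power cap, HW slowdown, thermal, etc.) is currently active
nvidia-smi -q -d PERFORMANCE
# show the current and enforced power limits for the card
nvidia-smi -q -d POWER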