RunPod•2mo ago
Lucas

🆘 We've encountered a serious issue with the machines running in our production environment

On RunPod, the GPU utilization of the machines in our production environment fluctuates wildly, sometimes even dropping to zero, which significantly slows down task execution. Who should I contact?
15 Replies
nerdylive
nerdylive•2mo ago
Tip: rather than starting with an SOS sign, which makes the post hard to read, make your title clearer by stating the problem, and describe the problem in the body. So what you're saying is that you're not using the GPU at all, with no model inference running, but the GPU usage is still going up and down? If so, try reporting it via the website.
Lucas
Lucas•2mo ago
We're using SOS because we've hit this issue in a production environment, where it directly affects the user experience, and I don't know who to turn to for help.
nerdylive
nerdylive•2mo ago
All good!
Lucas
Lucas•2mo ago
During the inference process, we received feedback from users that the inference speed was particularly slow. Upon checking, we confirmed that the issue was indeed related to the inference, but the GPU utilization was either zero or very low.
nerdylive
nerdylive•2mo ago
Did it just happen without any production changes? Can you replicate it on another pod?
Lucas
Lucas•2mo ago
Despite all other conditions remaining unchanged, sometimes the inference speed is fast, and at other times it is very slow, even though the model has already been loaded into the GPU memory.
nerdylive
nerdylive•2mo ago
It seems to me that the nvidia-smi output looks normal.
Lucas
Lucas•2mo ago
yeah
nerdylive
nerdylive•2mo ago
Same config?
Lucas
Lucas•2mo ago
But GPU kernels are not running at all. Inference speed is extremely low. Yes.
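One way to capture the symptom described here (model resident in GPU memory while utilization sits at or near zero) is to sample `nvidia-smi` during a slow inference run and keep the log for a support report. The following is only a minimal sketch, not something from this thread: it assumes `nvidia-smi` is on the PATH inside the pod, and the function names (`parse_sample`, `read_samples`, `idle_fraction`, `monitor`) and the 5% idle threshold are hypothetical choices for illustration.

```python
import subprocess
import time

# Fields passed to `nvidia-smi --query-gpu`; their order must match KEYS below.
QUERY = "utilization.gpu,memory.used,clocks.sm,temperature.gpu"
KEYS = ["util_pct", "mem_used_mib", "sm_clock_mhz", "temp_c"]


def parse_sample(line):
    """Parse one CSV line of nvidia-smi output into a dict.

    Non-numeric values such as "[N/A]" are stored as None.
    """
    fields = [f.strip() for f in line.split(",")]
    return {k: int(v) if v.isdigit() else None for k, v in zip(KEYS, fields)}


def read_samples():
    """Return one parsed sample per GPU visible to nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_sample(line) for line in out.splitlines() if line.strip()]


def idle_fraction(samples, threshold_pct=5):
    """Fraction of samples whose GPU utilization is below threshold_pct.

    The 5% default is an arbitrary illustrative cutoff.
    """
    utils = [s["util_pct"] for s in samples if s["util_pct"] is not None]
    if not utils:
        return 0.0
    return sum(u < threshold_pct for u in utils) / len(utils)


def monitor(duration_s=60, interval_s=1.0):
    """Print a timestamped sample per GPU every interval_s seconds.

    Returns the idle fraction pooled across all GPUs and samples.
    """
    history = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        for gpu_index, sample in enumerate(read_samples()):
            print(time.strftime("%H:%M:%S"), gpu_index, sample)
            history.append(sample)
        time.sleep(interval_s)
    return idle_fraction(history)
```

Calling `monitor(60)` while a slow request is in flight would show whether `mem_used_mib` stays high while `util_pct` drops to zero. A high idle fraction during active inference, together with the timestamps, is the kind of evidence that could accompany a report to RunPod.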
nerdylive
nerdylive•2mo ago
Can you replicate it on another pod? Maybe switch to another pod for now while you report it to RunPod.
Lucas
Lucas•2mo ago
GPU utilization fluctuates wildly, sometimes even dropping to zero, and we have changed nothing! This is going to take up more of our time, and we are short-staffed. I just want to know if RunPod has technical personnel who can help us troubleshoot this issue. We have checked the code logic and found no issues. How do I report this to RunPod?
nerdylive
nerdylive•2mo ago
Well, maybe yes, but not here. Use the Contact button on the website; you'll then be redirected to another page.
Lucas
Lucas•2mo ago
OK thx
nerdylive
nerdylive•2mo ago
Np, hope you can resolve this soon!