R
RunPod•5mo ago
Stephen

🆘 We've encountered a serious issue with the machines running in our production environment

🆘 We've encountered a serious issue with the machines running in our production environment on RunPod: the GPU utilization fluctuates wildly, sometimes even dropping to zero, which significantly slows down task execution. Who should I contact?
15 Replies
nerdylive
nerdylive•5mo ago
Tip: rather than opening with an SOS sign, which makes the post harder to read, make your title clearer by stating the problem, and describe the problem in the body. So what you're saying is that you're not using the GPU at all, no model inference, but the GPU usage is still going up and down? If so, try reporting it via the website.
Stephen
StephenOP•5mo ago
The reason we're using SOS is because we've encountered this issue in a production environment, which directly affects the user experience, but I don't know who to turn to for help.
nerdylive
nerdylive•5mo ago
All good!
Stephen
StephenOP•5mo ago
During the inference process, we received feedback from users that the inference speed was particularly slow. Upon checking, we confirmed that the issue was indeed related to the inference, but the GPU utilization was either zero or very low.
nerdylive
nerdylive•5mo ago
Did it just happen without any production changes? Can you replicate it on another pod?
Stephen
StephenOP•5mo ago
Despite all other conditions remaining unchanged, sometimes the inference speed is fast, and at other times it is very slow, even though the model has already been loaded into the GPU memory.
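One way to back up a report like this is to log GPU utilization over time while inference requests are running. The following is only a minimal sketch, not anything confirmed in this thread: it assumes `nvidia-smi` is on the PATH inside the pod, and the helper names (`parse_sample`, `sample_gpu`, `idle_fraction`) and the 5% idle threshold are made up for illustration.

```python
import subprocess
import time

# Fields requested from nvidia-smi; with "nounits" the values are bare numbers.
QUERY = "timestamp,utilization.gpu,memory.used"


def parse_sample(line):
    """Parse one CSV line into (timestamp, util_percent, mem_mib).

    Assumes numeric fields; some GPUs may report "[N/A]" instead,
    which this sketch does not handle.
    """
    ts, util, mem = [field.strip() for field in line.split(",")]
    return ts, int(util), int(mem)


def sample_gpu():
    """Query nvidia-smi once and return one parsed sample per GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [parse_sample(line) for line in out.splitlines() if line.strip()]


def idle_fraction(samples, threshold=5):
    """Fraction of samples with utilization at or below `threshold` percent."""
    if not samples:
        return 0.0
    idle = sum(1 for _, util, _ in samples if util <= threshold)
    return idle / len(samples)


if __name__ == "__main__":
    # Sample once per second for a minute while inference traffic is running.
    # On a multi-GPU pod the samples from all GPUs are pooled together.
    history = []
    for _ in range(60):
        history.extend(sample_gpu())
        time.sleep(1)
    for ts, util, mem in history:
        print(f"{ts}  util={util}%  mem={mem} MiB")
    print(f"idle fraction: {idle_fraction(history):.2f}")
```

Attaching the resulting log, together with the timestamps of slow requests, to the support ticket may make it easier to tell whether the GPU is actually idle during slow inference or whether the time is being spent elsewhere.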
nerdylive
nerdylive•5mo ago
It seems to me that the nvidia-smi output looks normal.
Stephen
StephenOP•5mo ago
yeah
nerdylive
nerdylive•5mo ago
Same config?
Stephen
StephenOP•5mo ago
But the GPU kernels are not running at all. Inference speed is extremely low. Yes.
nerdylive
nerdylive•5mo ago
Can you replicate it on another pod? Maybe switch to another pod for now while you report it to RunPod.
Stephen
StephenOP•5mo ago
GPU utilization fluctuates wildly, sometimes even dropping to zero, and we have changed nothing! This is going to take up more of our time, and we are short-staffed. I just want to know if RunPod has technical personnel who can help us troubleshoot this issue. We have checked the code logic and found no issues. How do I report this to RunPod?
nerdylive
nerdylive•5mo ago
Well, maybe yes, but not here. Use the Contact button on the website; you'll then be redirected to another page.
Stephen
StephenOP•5mo ago
OK thx
nerdylive
nerdylive•5mo ago
Np, hope you can resolve this soon!