Intermittent Slow Performance Issue with GPU Workers
I am running into an intermittent issue where some GPU workers are significantly slower than others. I measured the time taken for a specific task on one designated worker type (4090 24GB). Normally, when I send the exact same payload to the endpoint, the execution time is around 1 minute. Occasionally, however, a worker becomes exceptionally slow: with the same payload, Docker image, tag, and GPU type, the execution time stretches to a few hours. Notably, during these episodes the GPU utilization stays at a constant 0%.
The output log confirms that inference is unusually slow while the affected worker is running. Has anyone experienced a similar problem, and if so, how did you resolve it?
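For reference, this is roughly how I measure it: timing the task and sampling GPU utilization from inside the worker. A minimal sketch, assuming a Python handler with pynvml available; the `run_inference` stub and the handler shape are just placeholders, not my actual code:

```python
import time
import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
_gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

def run_inference(payload):
    # Placeholder for the real model call on this endpoint.
    time.sleep(1)
    return {"echo": payload}

def handler(job):
    start = time.time()
    result = run_inference(job["input"])
    elapsed = time.time() - start

    # Utilization sampled right after the run; on the slow workers it
    # reads 0% even while a job is supposedly executing.
    util = pynvml.nvmlDeviceGetUtilizationRates(_gpu)
    print(f"execution took {elapsed:.1f}s, GPU util {util.gpu}%")
    return result

if __name__ == "__main__":
    handler({"input": {"prompt": "test"}})
```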
Your insights and assistance in addressing this issue would be greatly appreciated. Thank you.
4 Replies
If you can share your endpoint ID + template + what workload you are running, it would be helpful if the staff sees this later. Also share the worker ID if you end up finding out which one is causing it.
https://discord.com/channels/912829806415085598/1194813255391125595
Could this be what I’m seeing as well?
We have exactly the same problem here. 4090, sometimes fast ~10it/s, sometimes slow ~5it/s (doubling the costs!) – for the exact same request/workload.
Has the problem been solved for you in the meantime @n8tzto?
Send some request IDs
Or maybe worker IDs too
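In case it helps, a rough sketch of how those IDs could be collected when submitting jobs, assuming the standard serverless /run and /status REST calls; the endpoint/API-key placeholders and response field names such as workerId are assumptions, so check them against what your endpoint actually returns:

```python
import os
import requests

ENDPOINT_ID = "YOUR_ENDPOINT_ID"          # placeholder
API_KEY = os.environ["RUNPOD_API_KEY"]    # placeholder env var
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Submit a job and keep the request ID it returns.
resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
    headers=HEADERS,
    json={"input": {"prompt": "test"}},
)
request_id = resp.json()["id"]
print("request id:", request_id)

# Poll the status; field names like "workerId" and "executionTime"
# are assumptions and may differ, but this is the kind of info to post.
status = requests.get(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/status/{request_id}",
    headers=HEADERS,
).json()
print("status:", status.get("status"),
      "worker:", status.get("workerId"),
      "executionTime:", status.get("executionTime"))
```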