RunPod 4d ago
ofer

"Throttled" and re-"Initializing" workers everywhere today

Is there an incident going on with serverless today? I have 30 workers that are all "Throttled"; other workers just disappear, and new ones keep initializing in their place. Every request that normally takes 10 seconds is taking minutes. This is happening in multiple locations, too: most of my workers ended up in CA-MTL-1, but others in EU-* are showing the same problems.
6 Replies
yhlong00000 4d ago
What type of GPUs are you using? When we’re short on supply, this kind of behavior can happen.
v3n0m 3d ago
Same issue here. Using A5000 GPUs on CA-MTL-1
ToonyGen 3d ago
Same here, a lot today. We use A5000, L4, and 3090 serverless. CA-MTL-1 seems to have some problems; I see multiple workers Throttled (Fri Jan 17, 1:50pm EST). Not an issue for us, since it is a quiet period for our app.
alisandagdelen
same
yhlong00000 3d ago
We’ve seen high utilization of A5000 GPUs over the last few days in CA-MTL-1. If you can deploy your workers globally, it will help alleviate this kind of situation. Additionally, you could consider adding other GPU types as backups on the serverless GPU selection page.
ofer (OP) 2d ago
Thank you for the follow-up. Most of the affected workers were A5000/L4/3090 class. I was not very restrictive in global distribution; I only excluded a handful of Eastern European locations because of longer round-trip times from my own servers. Most of the A5000s just "organically" allocated themselves from CA-MTL-1. Extending to all global locations did not really help either. Things are looking good at the moment, but earlier today I did have a few hours where delays, throttling, and reinitializations were occurring frequently. I have to say RunPod Serverless has been rock solid so far. I'm very hopeful that you will be able to iron out these kinks and continue offering an excellent service.
