"Throttled" and re-"Initializing" workers everywhere today
Is there some incident going on with serverless today? I have 30 workers that are all "Throttled", other workers just disappear, and new ones keep initializing in their place.
Every request that normally takes 10 seconds is taking minutes...
This is happening in multiple locations too. Most of my workers ended up in CA-MTL-1, but others in EU-* are showing the same problems.
What type of GPUs are you using? When we’re short on supply, this kind of behavior can happen.
Same issue here. Using A5000 GPUs on CA-MTL-1
Same, a lot today. We use A5000, L4, and 3090 serverless workers in CA-MTL.
CA-MTL-1 seems to have some problems. I see multiple workers throttled as of Fri Jan 17, 1:50pm EST. Not an issue for us, since it is a quiet period for our app.
Same.
We’ve seen high utilization of A5000 GPUs over the last few days in CA-MTL-1. Deploying your workers globally would help alleviate this kind of situation. You could also consider adding other GPU types as backups on the serverless GPU selection page.
Thank you for the follow up.
Most of the affected workers were A5000/L4/3090 class.
I was not very restrictive about global distribution; I only excluded a handful of Eastern European locations because of longer round-trip times from my own servers.
Most of the A5000s just "organically" allocated themselves from CA-MTL-1
Extending to all global locations did not really help either.
Things are looking good at the moment, but earlier today I had a few hours where delays, throttling, and reinitializations were occurring frequently.
I have to say RunPod Serverless has been rock solid so far. I'm very hopeful that you will be able to iron out these kinks and continue offering an excellent service.