TristenHarr
RunPod
•Created by blue whale on 2/11/2025 in #⚡|serverless
Job stuck in queue and workers are sitting idle
For sure, I think this is a known issue they are looking into! Will give them some time to dig in and if I still have problems later this week I'll follow up. 🙂
36 replies
RunPod
•Created by jim on 2/19/2025 in #⚡|serverless
Feb 20 - Serverless Issues Mega-Thread
Great, thanks so much! 🙂 No hard feelings on my side; I appreciate the info and support. There are always a few kinks to work out with new and exciting things like serverless GPUs, so no problems here. I just hope this truly is fixed and patched. Known issues are fine so long as they are documented; it's when you expect everything to work and it doesn't that things become a problem!
56 replies
RunPod
•Created by blue whale on 2/11/2025 in #⚡|serverless
Job stuck in queue and workers are sitting idle
For me, there are no misconfigurations as far as I can tell. What RunPod does is this: it gets a request, a worker becomes active, and then the job sits in the queue for 10+ minutes before getting picked up. It’s not a FlashBoot problem (that’s enabled), and the logs say everything is ready from the worker’s perspective. (I’ve checked the logs extensively; it’s not stuck loading a model or anything of that nature. The worker should be ready.)
This happens even with multiple workers set up: only 1 becomes active, then everything sits in the queue for 10 minutes.
Once things spin up they seem to work fine, but every time there’s a new spin-up there’s a risk it’ll take 10+ minutes.
What I’ve been doing is increasing the idle time before a worker spins down, then trying to find a “good” worker and keep it open as long as I can, even sending redundant requests just to avoid getting a “bad” or “stuck” worker.
It’s also intermittent/flaky: sometimes it spins up quickly and works fine, sometimes it gets stuck like this. It’s not happening every time. I’d say maybe 10-30% of the time.
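For anyone wanting to try the same keep-warm workaround, here is a minimal sketch of the idea, not anything official. It assumes a hypothetical `send_request` callable that submits a cheap job to your endpoint and returns `True` when it completes; the function name, parameters, and interval are all illustrative assumptions.

```python
import time
from typing import Callable, List


def keep_warm(send_request: Callable[[], bool],
              interval_s: float,
              max_pings: int,
              sleep: Callable[[float], None] = time.sleep) -> List[bool]:
    """Send up to max_pings lightweight requests, interval_s seconds apart.

    send_request is a hypothetical callable (an assumption, not a RunPod API)
    that submits a cheap job and returns True if it completed. Pick interval_s
    shorter than the endpoint's idle timeout so a known-good worker is less
    likely to spin down between real requests.
    """
    results = []
    for i in range(max_pings):
        results.append(send_request())
        # No need to wait after the final ping.
        if i < max_pings - 1:
            sleep(interval_s)
    return results
```

In practice `send_request` would wrap whatever HTTP call you already use to submit a job. Keep in mind the redundant requests are billed like any other, so this only trades cost for fewer cold spin-ups.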
36 replies
RunPod
•Created by blue whale on 2/11/2025 in #⚡|serverless
Job stuck in queue and workers are sitting idle
Same issue! We can’t move into production because of this issue. https://discord.com/channels/912829806415085598/1340773964397674709
36 replies
RunPod
•Created by Saqib Zia on 2/19/2025 in #⚡|serverless
Job Stuck in Queue Eventhough worker is ready
Same issue!
13 replies
RunPod
•Created by TristenHarr on 2/16/2025 in #⚡|serverless
Why isn't RunPod reliable?
I've got 3 workers and tried to send a single request. The worker was ready (I watched the logs), but the job just sat "stuck" for 10 minutes.
21 replies
RunPod
•Created by TristenHarr on 2/16/2025 in #⚡|serverless
Why isn't RunPod reliable?
My logs said my worker was ready. This is on H100s, btw.
21 replies
RunPod
•Created by TristenHarr on 2/16/2025 in #⚡|serverless
Why isn't RunPod reliable?
When it takes 10+ minutes to start a job, how can anyone be expected to use this? I have built an entire app around this, and now I'm going to have to find a new provider or provision things myself. 👎
21 replies
RunPod
•Created by TristenHarr on 2/16/2025 in #⚡|serverless
Why isn't RunPod reliable?
Shouldn't having multiple workers prevent this from happening?
21 replies