RunPod•6d ago
TristenHarr

Why isn't RunPod reliable?

I have 3 workers set up. When I submit a request, it sometimes sits in the queue for 5+ minutes before processing begins. I can see a single worker running while the rest idle, but the work isn't getting done. This isn't suitable for production if it takes 5+ minutes to kick off a job. Am I doing something wrong, or does this service just not work well?
10 Replies
TristenHarr (OP)•6d ago
Shouldn't having multiple workers prevent this from happening? When it takes 10+ minutes to start a job how can anyone be expected to use this? I have built an entire app around this and now I'm going to have to find a new provider or provision things myself. 👎
EvilD•6d ago
Happening to me too. Do we have any answer?
Justin•6d ago
Same here, sometimes the worker is just initializing for hours.
nerdylive•5d ago
Hi, are you guys experiencing this? https://discord.com/channels/912829806415085598/1338944716955189350
Oh, you can check the logs to see what's still happening, or if it's stuck, try removing the worker.
I believe something is wrong when it takes a long time for jobs to be picked up while there's a worker ready.
Btw, check your scaling type; who knows, maybe it's not queue delay.
Justin•5d ago
I actually had a network volume in one of the DCs that had full availability for H100s, and it seems the reason they were all available is that you cannot use any kind of worker on them, as it will just continue to initialize.
I switched to a different one and then it worked again.
But I am also still experiencing a lot of issues with the worker not actually starting even though the queue is full, or rather has one job in it; the worker is just idling.
TristenHarr (OP)•5d ago
My logs said my worker is ready. These are H100s, btw. I've got 3 workers, and I tried to send a single request. The worker was ready; I watched the logs. It was just "stuck" for 10 minutes.
nerdylive•5d ago
I see. Yeah, that's a problem when it's ready but not taking jobs. Let staff check.
Justin•5d ago
So I should send a support ticket, you mean?
nerdylive•5d ago
Yes, I told a staff member about this too.
Dj•2d ago
@TristenHarr Can I have an affected endpoint ID?
