Issues in SE region causing a massive amount of jobs to be retried

The issues in the screenshot are causing 10% of my jobs to be retried in the SE region. Please fix this; it's not happening in the CA region.
20 Replies
digigoblin
digigoblin2mo ago
Obviously I am referring to the "Connection timeout" errors, which cause the job results to fail to be returned, and not the single exception among them.
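(For context: when the status call itself hits a connection timeout, the client never sees the result even though the job may have completed. A minimal sketch of client-side retry with exponential backoff against RunPod's documented serverless status endpoint; the endpoint ID, job ID, and API key below are placeholders.)

```python
import time
import requests

# Placeholders for illustration; substitute your own values.
ENDPOINT_ID = "your-endpoint-id"
JOB_ID = "your-job-id"
API_KEY = "your-runpod-api-key"

STATUS_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/status/{JOB_ID}"

def fetch_status(max_attempts: int = 5) -> dict:
    """Fetch the job status, retrying on connection timeouts with backoff."""
    for attempt in range(max_attempts):
        try:
            resp = requests.get(
                STATUS_URL,
                headers={"Authorization": f"Bearer {API_KEY}"},
                timeout=10,  # seconds before giving up on the request
            )
            resp.raise_for_status()
            return resp.json()
        except (requests.ConnectionError, requests.Timeout):
            # Back off exponentially before retrying: 1s, 2s, 4s, ...
            time.sleep(2 ** attempt)
    raise RuntimeError(f"Gave up after {max_attempts} attempts")
```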
Madiator2011
Madiator20112mo ago
@digigoblin DO YOU MIND SUBMITTING A TICKET ON THE WEBSITE? EASIER TO ESCALATE
digigoblin
digigoblin2mo ago
No need to shout but sure 😁
Madiator2011
Madiator20112mo ago
oops, sorry for caps
digigoblin
digigoblin2mo ago
Ticket number is 4208
Madiator2011
Madiator20112mo ago
done
digigoblin
digigoblin2mo ago
Thank you
nerdylive
nerdylive2mo ago
hahaha wait, SE? My jobs work fine btw
digigoblin
digigoblin2mo ago
You probably didn't try and send 1000 jobs today
nerdylive
nerdylive2mo ago
Yes yes
digigoblin
digigoblin2mo ago
I said 10% are retried NOT ALL 🤦‍♂️
nerdylive
nerdylive2mo ago
I'm using dev on SE. Ooh, so 10% are expected to fail?
digigoblin
digigoblin2mo ago
They are retried, they don't fail.
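(To make the distinction concrete: a retried job just re-enters the queue, so a client polling the status endpoint should only treat it as failed once it reports a terminal state. A minimal sketch; the terminal status names are the ones RunPod documents for serverless jobs, hedged here as a set you can adjust.)

```python
import time
import requests

# Statuses treated as terminal; a retried job re-enters IN_QUEUE /
# IN_PROGRESS instead of landing in one of these.
TERMINAL = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}

def wait_for_job(endpoint_id: str, job_id: str, api_key: str,
                 poll_s: float = 2.0) -> dict:
    """Poll until the job reaches a terminal status and return it."""
    url = f"https://api.runpod.ai/v2/{endpoint_id}/status/{job_id}"
    headers = {"Authorization": f"Bearer {api_key}"}
    while True:
        state = requests.get(url, headers=headers, timeout=10).json()
        if state.get("status") in TERMINAL:
            return state
        time.sleep(poll_s)
```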
nerdylive
nerdylive2mo ago
well, good luck with your problem
digigoblin
digigoblin2mo ago
RunPod needs to check it out. I switched to CA in the meantime and it works fine.
nerdylive
nerdylive2mo ago
yeah great to hear
digigoblin
digigoblin2mo ago
I was using CA but switched to SE because my jobs were failing. It turned out that wasn't a RunPod issue at all; my own Redis server was hitting out-of-memory (OOM) errors. So I upgraded my ElastiCache instance on AWS from cache.t3.medium to cache.m4.large and now it's fine.
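(Side note: one way to spot that condition before jobs start failing is to watch Redis memory usage against its maxmemory limit. A minimal sketch using the redis-py client; the hostname is a placeholder for your ElastiCache endpoint.)

```python
import redis

# Placeholder connection details; point this at your ElastiCache endpoint.
r = redis.Redis(host="my-cache.abc123.use1.cache.amazonaws.com", port=6379)

info = r.info("memory")
used = info["used_memory"]
limit = info.get("maxmemory", 0)  # 0 means no explicit limit configured

print(f"used: {info['used_memory_human']}")
if limit:
    pct = 100 * used / limit
    print(f"maxmemory: {info['maxmemory_human']} ({pct:.1f}% used)")
    if pct > 90:
        print("Warning: near maxmemory; writes may start failing (OOM)")
```

The node-type upgrade itself can be done from the ElastiCache console or with the AWS CLI's modify commands (the --apply-immediately flag controls whether it waits for the next maintenance window).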
nerdylive
nerdylive2mo ago
Wow, you use ElastiCache? Why not self-hosted Redis?
digigoblin
digigoblin2mo ago
Because it's a cluster, not a single instance.
nerdylive
nerdylive2mo ago
oh ic