Issues in SE region causing a massive number of jobs to be retried
The issues in the screenshot are causing 10% of my jobs to be retried in the SE region. Please fix this, it's not happening in the CA region.
Obviously I am referring to the "Connection timeout" errors, which cause the job results to fail to be returned, and not the single exception among them.
@digigoblin DO YOU MIND SUBMITTING A TICKET ON THE WEBSITE? IT'S EASIER TO ESCALATE
No need to shout but sure 😁
oops, sorry for the caps
Ticket number is 4208
done
Thank you
hahaha
wait SE?
my jobs work well btw
You probably didn't try and send 1000 jobs today
Yes yes
I said 10% are retried, NOT ALL 🤦‍♂️
I'm using dev on SE
Ooh so 10% expected to fail?
They are retried, they don't fail
well, good luck with your problem
RunPod needs to check it out, I switched to CA in the meantime and it works fine without any issues.
yeah
great to hear
I was using CA but then switched to SE because my jobs were failing. It turned out my own Redis server was running out of memory (OOM), so it wasn't a RunPod issue.
So I upgraded my ElastiCache instance on AWS from cache.t3.medium to cache.m4.large and now it's fine.
Wow, you use ElastiCache?
why not self-hosted Redis?
Because it's a cluster, not a single instance
oh ic