Railway•12mo ago
Eddy

Losing connection to redis after migration

Hello, I recently migrated to the new redis instance and have been getting errors randomly about once a day, and I need to restart my container for it to work again. Has anyone had similar issues after migrating?
[ioredis] Unhandled error event: Error: read ECONNRESET
[ioredis] Unhandled error event: Error: read ECONNRESET
55 Replies
Eddy
EddyOP•12mo ago
c256fc68-1d5c-4c39-8e86-a964d7ff66f5
Brody
Brody•12mo ago
@matt - connection reset redis
arus
arus•12mo ago
This happens with the new postgres container as well.
matt
matt•12mo ago
@arus can you share more details? And is there a reliable way to reproduce the issue? ty!
Eddy
EddyOP•12mo ago
Could you find any reason why it happens in my project @matt ?
angelo
angelo•12mo ago
Hey there @Eddy - we had an incident (see #🚨|incidents), and the Railway team is working on a post-mortem.
arus
arus•12mo ago
Only way to reproduce it is to wait ~3-8 hours without anyone calling one of the endpoints. Then I get an EOF detected error. I have to restart the bot container to reconnect to pg again. Hang on, I'll grab the full error.
Lol and that isn't even fully true, it's been up 8 hours and I can't reproduce it yet.
Command raised an exception: OperationalError: (psycopg2.OperationalError) SSL SYSCALL error: EOF detected (Background on this error at: https://sqlalche.me/e/14/e3q8)
Already followed the guidance from sqlalchemy, it's still happening. It would sometimes happen after a month or two with the old postgres container, but now it's like 1-5 times a day.
Brody
Brody•12mo ago
are you making sure to close idle connections? seen this a lot where postgres would mark the connection as closed but the client doesn't know the connection was closed
arus
arus•12mo ago
Yep
Brody
Brody•12mo ago
well 8 hours so far is good, if there are errors again let us know
arus
arus•12mo ago
Seems to have been stable overnight, hopefully whatever happened yesterday fixed it (I'm in the West region mentioned in incidents)
angelo
angelo•12mo ago
Yep it was likely the outage
Eddy
EddyOP•12mo ago
Yeah same here, happened 2 nights in a row and tonight it was stable
arus
arus•12mo ago
Sounds about right. Also lol that av Angelo. I haven't seen that frog in years.
All seems fine now yeah.
Down again @Angelo
angelo
angelo•12mo ago
Hmm- are you closing your connections?
arus
arus•12mo ago
Yes. It happens after approx. 20 hours now rather than 1-6 hours. Only started happening after I migrated to the new container. And only if it's idle the entire time. The bot container is not set to sleep but sometimes seems to anyway. Resource allocation in my region maybe?
arus
arus•12mo ago
This is the specific error I get on the client side. https://docs.sqlalchemy.org/en/14/core/pooling.html#pool-disconnects I am using a pessimistic method to recover. Looking at my logs, the main loop sometimes seems to be restarting while the container is running. The other behavior I notice, if I try and pull before the EOF message, is extremely delayed server responses in the region, especially when building/restarting. I'm going to give null pools a try again though and I'll let you know.
Brody
Brody•12mo ago
I'm still thinking that your problem is related to keeping stale connections around. this problem is mentioned in the knexjs docs. it's a javascript package, but the same can apply for any pooled postgres client within a docker environment. https://knexjs.org/guide/#pool
It can result in problems with stale connections
arus
arus•12mo ago
I'll take a look.
arus
arus•12mo ago
Yeah, I'm following this guidance, which won't use a connection without checking it first. https://stackoverflow.com/a/66360789
Stack Overflow
psycopg2.OperationalError: SSL SYSCALL error: EOF detected on Flask...
I have an app that was written with Flask+SQLALchemy+Celery, RabbitMQ as a broker, database is PostgreSQL (PostgreSQL 10.11 (Ubuntu 10.11-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubu...
arus
arus•12mo ago
Given this other article though, it does align with the lag theory. The lag could be exceeding the keep alive.
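One way to test the keepalive theory is to tighten the client-side TCP keepalive settings. A minimal sketch, assuming a psycopg2/libpq-style connection URL: `keepalives`, `keepalives_idle`, `keepalives_interval` and `keepalives_count` are libpq connection parameters, while the helper name `with_keepalives`, the example URL and the chosen values are illustrative and not taken from this thread.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_keepalives(url: str, idle: int = 30, interval: int = 10, count: int = 3) -> str:
    """Return `url` with libpq TCP keepalive parameters added to its query string."""
    parts = urlsplit(url)
    # Keep any parameters already present (for example sslmode).
    query = dict(parse_qsl(parts.query))
    query.update(
        {
            "keepalives": "1",                     # turn client-side keepalives on
            "keepalives_idle": str(idle),          # idle seconds before the first probe
            "keepalives_interval": str(interval),  # seconds between unanswered probes
            "keepalives_count": str(count),        # lost probes before the link is considered dead
        }
    )
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical URL, for illustration only.
print(with_keepalives("postgresql://user:secret@db.example.internal:5432/app?sslmode=require"))
```

The idea is that probes sent well inside whatever idle timeout sits between the bot and Postgres keep the connection from being dropped silently; whether such a timeout is actually the cause here is not confirmed in the thread.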
Brody
Brody•12mo ago
sorry do you mean you're going to use pool_pre_ping=True going forward, or have you already been using it?
arus
arus•12mo ago
I have been already.
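For reference, a sketch of the kind of engine settings being discussed. `pool_pre_ping` and `pool_recycle` are real `create_engine` parameters in SQLAlchemy; the helper name `engine_options`, the 30-minute recycle value, and pairing pre-ping with recycling are assumptions for illustration, not arus's actual configuration.

```python
from typing import Any, Dict


def engine_options(recycle_seconds: int = 1800) -> Dict[str, Any]:
    """Build keyword arguments intended for sqlalchemy.create_engine()."""
    if recycle_seconds <= 0:
        raise ValueError("recycle_seconds must be positive")
    return {
        # Ping each pooled connection on checkout and replace it if the
        # server has already dropped it (the "pessimistic" approach).
        "pool_pre_ping": True,
        # Retire pooled connections older than this many seconds so they
        # are reopened before an idle timeout is likely to hit them.
        "pool_recycle": recycle_seconds,
    }


# Usage (needs SQLAlchemy and a reachable database; DATABASE_URL is a placeholder):
#   from sqlalchemy import create_engine
#   engine = create_engine(DATABASE_URL, **engine_options())
print(engine_options())
```

Pre-ping only checks a connection at checkout, so a connection that dies in the middle of a query would still raise an error.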
Brody
Brody•12mo ago
are you using the private url?
arus
arus•12mo ago
https://stackoverflow.com/a/66515677
I don't recall. Let me check.
Stack Overflow
Postgres SSL SYSCALL error: EOF detected with python and psycopg
Using psycopg2 package with python 2.7 I keep getting the titled error: psycopg2.DatabaseError: SSL SYSCALL error: EOF detected It only occurs when I add a WHERE column LIKE ''%X%'' clause to my
arus
arus•12mo ago
roundhouse.proxy.rlwy.net
Same variable as before, but the migration populated a lot of it.
I can try the private url
Brody
Brody•12mo ago
can you try using the DATABASE_PRIVATE_URL variable
arus
arus•12mo ago
Yeah, let me switch off my phone. Alright, deploying. I'll let you know if it stays connected.
Brody
Brody•12mo ago
sounds good
arus
arus•12mo ago
Alright, it won't connect to the private url. Says the hostname can't be found.
Brody
Brody•12mo ago
building with nixpacks?
arus
arus•12mo ago
could not translate host name "postgres.railway.internal" to address: Name or service not known
yes
Brody
Brody•12mo ago
can you try adding a 3 second sleep to the beginning of your start command?
arus
arus•12mo ago
Yeah one sec.
nope
Brody
Brody•12mo ago
postgres is in the same project right?
arus
arus•12mo ago
Yes
I'm going to try adding sleep to my cog setup functions.
Didn't work either
my nixbuild runs this. docker run -it us-west1.registry.rlwy.net/
let me check if postgres is in the same region.
Bleh, yes, I'm hobby plan too
So I couldn't change it if I wanted to
Brody
Brody•12mo ago
does the dns lookup that SQLAlchemy does support ipv6?
arus
arus•12mo ago
Pretty sure it does. Let me check this version real quick. yes, it does.
Brody
Brody•12mo ago
does the start command in the build table at the top of the build logs confirm that there is a sleep 3?
arus
arus•12mo ago
No, but that code is wrapped in a script.
Brody
Brody•12mo ago
can you change your start command to sleep 3 && <your current start command>
arus
arus•12mo ago
yeah one sec. okay, looks like it didn't explode this time.
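The `sleep 3` prefix appears to help because the private hostname is not resolvable right at container start. A sketch of the same idea done inside the app instead, polling DNS until the name resolves before opening the database connection; `wait_for_host` and its retry values are hypothetical and were not used in this thread.

```python
import socket
import time


def wait_for_host(hostname: str, port: int, attempts: int = 10, delay: float = 0.5) -> bool:
    """Return True once `hostname` resolves, or False after `attempts` failed lookups."""
    for attempt in range(attempts):
        try:
            # getaddrinfo returns both IPv4 and IPv6 results when available.
            socket.getaddrinfo(hostname, port)
            return True
        except socket.gaierror:
            # Name not resolvable yet; pause before the next try,
            # but do not sleep after the final attempt.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False


if __name__ == "__main__":
    # Hostname taken from the error message above; 5432 is the default Postgres port.
    if not wait_for_host("postgres.railway.internal", 5432):
        raise SystemExit("private hostname never resolved")
```

Compared with a fixed sleep, this returns as soon as the lookup succeeds and fails with a clear message if it never does.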
Brody
Brody•12mo ago
make sure you are using a healthcheck now though https://docs.railway.app/guides/healthchecks-and-restarts
arus
arus•12mo ago
Not entirely sure how I'm going to do that just yet, but i'll look into it.
Brody
Brody•12mo ago
do you already have a web framework in place or is this a bot app?
arus
arus•12mo ago
Bot
Brody
Brody•12mo ago
ah then dont worry about the health check
arus
arus•12mo ago
I'm thinking of adding some short lived auth endpoints to connect users to their own content though, so I'll probably throw it in when I do that. Alright, gonna let this sucker idle for a couple days and see if the errors are done.
Brody
Brody•12mo ago
yep that would be the time to add a healthcheck
sounds good
arus
arus•12mo ago
Thanks!
Down again.
arus
arus•12mo ago
I checked, the client reported connecting but not receiving a response.
Eddy
EddyOP•12mo ago
Same here
1:M 11 Dec 2023 16:24:55.577 # Possible SECURITY ATTACK detected. It looks like somebody is sending POST or Host: commands to Redis. This is likely due to an attacker attempting to use Cross Protocol Scripting to compromise your Redis instance. Connection from 192.168.16.4:***** aborted.
1:M 11 Dec 2023 16:24:55.577 # Possible SECURITY ATTACK detected. It looks like somebody is sending POST or Host: commands to Redis. This is likely due to an attacker attempting to use Cross Protocol Scripting to compromise your Redis instance. Connection from 192.168.16.4:***** aborted.
devon
devon•10mo ago
This is still happening for me, crashes roughly every 4 days. Did anyone figure out a clear resolution? Otherwise I'm going to need to migrate away from Railway entirely as this is not stable for production. Note I have healthchecks and everything.