Intermittent network (DNS?) errors

This morning I woke up to some strange network-related errors. I am running a small Node.js application that pulls data from a Redis service in the same project once per minute. Based on the result of that Redis query, it then queries an HTTP API (I use the Axios library for web requests).

At 22:38:03 PST, I received the following error: AxiosError: connect EHOSTUNREACH 144.202.50.255:80. I have confirmed via a third party, and also via a separate application on a different service, that the API was reachable at the indicated time.

The application restarted, and I immediately received the following error repeatedly: Error: getaddrinfo ENOTFOUND redis.railway.internal. It occurred at 22:38:11, 22:38:16, 22:38:21, 22:38:26, 22:38:32, 22:38:38, 22:38:47, 22:39:03, and 22:39:32 before the restart limit was reached. The application references the redis.railway.internal address via an environment variable, so it is clear the variable itself is working properly.

The confluence of these two errors, targeting two separate services/addresses and beginning at the same time, suggests a potential issue (DNS?) on the backend at this time. I restarted the application at 07:16 this morning, and the once-per-minute Redis call and external API calls have both been proceeding without issue so far.
Railway
Log Explorer
Railway is an infrastructure platform where you can provision infrastructure, develop with that infrastructure locally, and then deploy to the cloud.
52 Replies
Percy
Percy6mo ago
Project ID: aabf52ba-51a4-493c-a24b-f2354c48f9bd,f21a6a59-54c0-4b4b-aacc-81db87431cfe,c75e2871-fa01-49cf-9a5e-de8387ecf873
VeritableQuandary
VeritableQuandaryOP6mo ago
Project ID: aabf52ba-51a4-493c-a24b-f2354c48f9bd
Deployment ID: f21a6a59-54c0-4b4b-aacc-81db87431cfe
As I was writing this, I received another EHOSTUNREACH error to the same IP address. I jinxed it, I guess!
Brody
Brody6mo ago
the EHOSTUNREACH error could be due to a blip in GCP's networking, but the DNS error for the internal redis domain is unfortunately all too common, yet it's fixable, let me ask some questions about that - nixpacks or Dockerfile? do you have a 3 second sleep before starting your app?
VeritableQuandary
VeritableQuandaryOP6mo ago
I'm using nixpacks (railway up) for deploying the code, and I don't have a sleep in my start command currently but I can certainly add one
Brody
Brody6mo ago
yes a 3 second sleep is definitely needed as the private network's DNS resolver is not available for the first ~3 seconds upon start
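For readers who would rather handle this inside the app than in the start command, here is a minimal sketch of the same idea in Node.js: retry the DNS lookup for the private hostname until it resolves, then connect. This is not something suggested in the thread, only one possible complement to the fixed sleep. The helper name waitForDns, its options, and the REDIS_URL variable name in the usage comment are all hypothetical.

```javascript
const dns = require("node:dns");
const { setTimeout: sleep } = require("node:timers/promises");

// DNS failures that are plausibly temporary while the resolver is still coming up.
const RETRYABLE_DNS_CODES = new Set(["ENOTFOUND", "EAI_AGAIN"]);

// Hypothetical helper: retry a DNS lookup until it succeeds or attempts run out.
// `lookup` is injectable so the helper can be exercised without a real network.
async function waitForDns(
  hostname,
  { attempts = 10, delayMs = 500, lookup = dns.promises.lookup } = {}
) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await lookup(hostname); // resolves to { address, family }
    } catch (err) {
      // Anything other than a "name not resolvable yet" error is rethrown as-is.
      if (!RETRYABLE_DNS_CODES.has(err.code)) throw err;
      lastError = err;
      if (attempt < attempts) await sleep(delayMs);
    }
  }
  throw lastError;
}

// Usage (not executed here; REDIS_URL is an assumed variable name):
// await waitForDns(new URL(process.env.REDIS_URL).hostname);
// ...then create the Redis client and connect.
```

With the defaults above the app waits up to roughly 5 seconds before giving up, which covers the ~3 second window mentioned in the thread; the numbers are arbitrary and can be tuned.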
VeritableQuandary
VeritableQuandaryOP6mo ago
Perfect, I'll set that up now. Nothing to be done about the EHOSTUNREACH error I assume - I started noticing that pop up for the first time a week or so ago, which I think was when that GCP outage happened; I feel like it's gotten more common, but that's probably just hearsay :) Well... nothing to be done other than adding some more robust error handling to my little hobby project! Ha
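As a rough illustration of the "more robust error handling" mentioned above, the sketch below retries a call a few times with exponential backoff when it fails with a transient network error. It assumes the thrown error carries the Node.js system error code on err.code (as in the EHOSTUNREACH and ETIMEDOUT errors quoted in this thread); the helper name withRetry, the list of codes, and the delays are illustrative choices, not anything prescribed here.

```javascript
const { setTimeout: sleep } = require("node:timers/promises");

// Error codes treated as transient in this sketch (an assumption, adjust to taste).
const TRANSIENT_CODES = new Set(["EHOSTUNREACH", "ETIMEDOUT", "ECONNRESET", "EAI_AGAIN"]);

// Hypothetical helper: call `fn`, retrying up to `retries` times on transient errors.
// The wait doubles after each failed attempt: baseDelayMs, 2x, 4x, ...
async function withRetry(fn, { retries = 3, baseDelayMs = 250 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const transient = TRANSIENT_CODES.has(err && err.code);
      // Non-transient errors, or running out of retries, surface to the caller.
      if (!transient || attempt >= retries) throw err;
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}

// Usage (not executed here; `apiUrl` is a placeholder):
// const response = await withRetry(() => axios.get(apiUrl));
```

Because the app only polls once per minute, a handful of short retries fits comfortably inside one polling interval. Errors that are not in the transient list still fail immediately, so genuine bugs are not masked.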
Brody
Brody6mo ago
yep unfortunately railway is at the mercy of gcp until they finish their move to bare metal
VeritableQuandary
VeritableQuandaryOP6mo ago
Understood. Thanks very much for the quick response!
Brody
Brody6mo ago
no problem, let me know if that 3 second sleep works and make sure that you're using a readiness style health check so that railway doesn't switch over traffic while your service is sleeping for those 3 seconds
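A readiness-style health check could look roughly like the sketch below: the endpoint answers 503 until the app marks itself ready (for example after the startup sleep and the first successful Redis connection), and 200 afterwards. It assumes the service exposes an HTTP port and that the platform's health check is pointed at a /health path; the path, the function names, and the port fallback are illustrative assumptions.

```javascript
const http = require("node:http");

// Hypothetical readiness tracker: "not ready" until markReady() is called.
function createReadiness() {
  let ready = false;
  return {
    markReady() {
      ready = true;
    },
    // Plain (req, res) handler, usable with http.createServer.
    handler(req, res) {
      if (req.url === "/health") {
        res.writeHead(ready ? 200 : 503, { "Content-Type": "text/plain" });
        res.end(ready ? "ok" : "starting");
        return;
      }
      res.writeHead(404);
      res.end();
    },
  };
}

// Defined but not called here; an app would call this once during startup.
function startHealthServer(readiness, port = process.env.PORT || 3000) {
  return http.createServer(readiness.handler).listen(port);
}

// Usage (not executed here):
// const readiness = createReadiness();
// startHealthServer(readiness);
// ...after the startup delay and the first successful connection:
// readiness.markReady();
```

The point of the 503 is that the deploy is only considered healthy once the app has actually finished waiting, rather than as soon as the process starts.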
VeritableQuandary
VeritableQuandaryOP6mo ago
Three second sleep worked great @Brody , thanks for the tip. I've had one more weird error pop up today that's not making sense - could also be a GCP networking issue? I've now started getting an ETIMEDOUT error on my Redis connection. I'm making a call to redis at least once per minute, so I'm not seeing any reason on the code side why suddenly it's timing out. Error Log
VeritableQuandary
VeritableQuandaryOP6mo ago
Again, will implement some error handling to try re-opening the connection when I run into this, but I'm curious why it might be happening to begin with
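For the reconnect handling described above, node-redis v4 accepts a socket.reconnectStrategy function that receives the retry count and returns either a delay in milliseconds or an Error to stop retrying. The sketch below builds such a function with a capped exponential backoff. The factory name, the specific delays, the retry limit, and the REDIS_URL variable name are assumptions for illustration, and this does not explain why the ETIMEDOUT happens in the first place.

```javascript
// Hypothetical reconnect policy: the delay doubles with each retry, capped at capMs,
// and reconnection stops after maxRetries by returning an Error.
function makeReconnectStrategy({ baseMs = 100, capMs = 5000, maxRetries = 20 } = {}) {
  return function reconnectStrategy(retries) {
    if (retries >= maxRetries) {
      return new Error("Redis reconnect attempts exhausted");
    }
    return Math.min(baseMs * 2 ** retries, capMs);
  };
}

// Wiring it into node-redis v4 (not executed here; REDIS_URL is an assumed name):
// const { createClient } = require("redis");
// const client = createClient({
//   url: process.env.REDIS_URL,
//   socket: { reconnectStrategy: makeReconnectStrategy() },
// });
// // Without an "error" listener, a socket error would crash the process.
// client.on("error", (err) => console.error("Redis error:", err.code || err.message));
// await client.connect();
```

Keeping the policy as a small pure function makes it easy to adjust the backoff numbers without touching the client setup.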
Brody
Brody6mo ago
and you're getting that error from the redis database that you are connecting to via the private network?
VeritableQuandary
VeritableQuandaryOP6mo ago
Correct, yeah. In this case, it was stable for 10 minutes before that error, so 9-10 successful per-minute checks.
Brody
Brody6mo ago
is redis running into any resource limitations? disk, mem, etc
VeritableQuandary
VeritableQuandaryOP6mo ago
I'd be shocked if so, it has a whole one key in it currently XD
Brody
Brody6mo ago
are you using ioredis?
VeritableQuandary
VeritableQuandaryOP6mo ago
I'm using node-redis currently. Can definitely give ioredis a try though if that's a better option!
Brody
Brody6mo ago
it's got its own issues, are you using an older version of node-redis?
VeritableQuandary
VeritableQuandaryOP6mo ago
4.6.14, current version
Brody
Brody6mo ago
Then I'm currently stumped, would it be too much to ask you to switch to dragonfly just for a test instead?
gazhay
gazhay6mo ago
I've been getting internal DNS errors to local MySQL the last 2 days; they disappear as quickly as they appear (usually overnight UK time). Sometimes the error is silent: loss of front end but nothing in logs until a restart, then the errors appear but resolve after minutes.
Brody
Brody6mo ago
please share the specific error
gazhay
gazhay6mo ago
loss of front end but nothing in logs until a restart
Brody
Brody6mo ago
without errors there's not too much we can help with
Poui
Poui6mo ago
I also had similar errors, I can copy-paste my error logs here if needed
Brody
Brody6mo ago
yes please
Poui
Poui6mo ago
MySQL DB Error :
Brody
Brody6mo ago
is the database accessible locally
Poui
Poui6mo ago
Any API Error at the same time :
Exception in thread "Timer-0" java.net.NoRouteToHostException: No route to host
    at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
    at io.ktor.network.sockets.SocketImpl.connect$ktor_network(SocketImpl.kt:50)
    at io.ktor.network.sockets.SocketImpl$connect$1.invokeSuspend(SocketImpl.kt)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
    at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:115)
    at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:100)
    at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:793)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:697)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:684)
I changed the server and it's now accessible locally
I went from USA to EUW
Brody
Brody6mo ago
oh you're pro?
Poui
Poui6mo ago
Yes
Brody
Brody6mo ago
fixed your badges
Poui
Poui6mo ago
i tried a couple of times but it never changed
Brody
Brody6mo ago
are you getting this error intermittently or only during the start of your app
Poui
Poui6mo ago
always at the start of the app if i use the local url for the db, intermittently with the public URL
gazhay
gazhay6mo ago
You have to wait a few seconds for local DNS to come up sometimes
Brody
Brody6mo ago
can you try connecting through the private network and add a 3 second sleep before starting your app
Poui
Poui6mo ago
I will switch back to US West (Oregon, USA) to retry
Same error without the 3 second sleep in US West (Oregon, USA), the build with the sleep time is still deploying
Okay, it worked with the 3 second sleep
Brody
Brody6mo ago
perfect
Poui
Poui6mo ago
It didn't need the 3 second sleep in EUW tho
Brody
Brody6mo ago
you need the 3 second sleep everywhere, even if you don't immediately see an issue
Poui
Poui6mo ago
I removed the auto request retry with the NoRouteToHostException so I can have the alert and report back to you if I have that error again
Brody
Brody6mo ago
sounds good
Poui
Poui6mo ago
The error is still occurring with API calls :
Exception in thread "Timer-2" java.net.NoRouteToHostException: No route to host
    at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
    at io.ktor.network.sockets.SocketImpl.connect$ktor_network(SocketImpl.kt:50)
    at io.ktor.network.sockets.SocketImpl$connect$1.invokeSuspend(SocketImpl.kt)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
    at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:115)
    at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:100)
    at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:793)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:697)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:684)
Brody
Brody6mo ago
what domain are you getting that error from
Poui
Poui6mo ago
https://www.furious-squad.com/ but before I used the private link, both the db communication and the api call were throwing errors
Brody
Brody6mo ago
you get errors when you try to call that domain?
Poui
Poui6mo ago
I never had that specific error before
Brody
Brody6mo ago
please answer the question to the best of your ability
Poui
Poui6mo ago
No, except that one
Brody
Brody6mo ago
I'm sorry but that doesn't answer the question
Poui
Poui6mo ago
I didn't understand your question then.
I keep having this error once or twice per hour:
HttpClient: REQUEST https://mprez.furious-squad.com/api/v2/project/?query= failed with exception: java.net.NoRouteToHostException: No route to host
Exception in thread "Timer-4" java.net.NoRouteToHostException: No route to host
    at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
    at io.ktor.network.sockets.SocketImpl.connect$ktor_network(SocketImpl.kt:50)
    at io.ktor.network.sockets.SocketImpl$connect$1.invokeSuspend(SocketImpl.kt)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
    at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:115)
    at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:100)
    at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:793)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:697)
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:684)