Railway•6mo ago
anhedonia

dial tcp: connect: no route to host

I am seeing a lot of these errors pop up randomly. Sometimes my instance just can't reach a host for some reason. I tried rebuilding and disabling some proxies the app was using, but things are still breaking intermittently. Is there a way to check whether this is a platform issue or something else?
90 Replies
Percy•6mo ago
Project ID: 30eb2ca4-aa10-44f8-b252-96cf312185bd
anhedonia OP•6mo ago
30eb2ca4-aa10-44f8-b252-96cf312185bd
hosts the app is trying to reach:
api.wotblitz.eu
api.wotblitz.com
api.wotblitz.asia
Brody•6mo ago
do you happen to have static IPs enabled?
anhedonia OP•6mo ago
I am not sure, so I am assuming I do not. At least it's not something I ever looked into/needed
Brody•6mo ago
can you check in your service settings just to be sure please?
anhedonia OP•6mo ago
Ah it's a Pro feature - I do not
Brody•6mo ago
thanks, discord says you have a Pro badge so I thought I'd ask
raising your thread directly to the team now
an incident was just called #🚨|incidents
anhedonia OP•6mo ago
Here is a full log as well
get "https://api.wotblitz.com/wotb/account/list/?application_id=...": dial tcp 92.223.56.106:443: connect: no route to host
The IPs are different, but it has been randomly happening for around a week now
Brody•6mo ago
you may not have static IPs enabled but static IPs would still be shared, so you could have ended up with a service being given the same IP
anhedonia OP•6mo ago
I don't think I have enough knowledge on networking :D My setup is running 1 replica at a time that is trying to reach this, though some requests are made in parallel. The services I am trying to reach are from a 3rd party as well. Is there a way for static IPs to cause an issue here? Not sure how relevant this is, but the same errors occurred when I had an http proxy enabled for those requests
Brody•6mo ago
well it doesn't explain why you have had issues for the past week, the incident description says it started today but we can reassess this after the incident has been resolved
anhedonia OP•6mo ago
Here is the first log I am able to find on my end
05/29/2024 11:34 PM
get "https://api.wotblitz.com/wotb/account/info/?account_id=1058430648&application_id=...": proxyconnect tcp: dial tcp 172.96.83.75:4444: connect: no route to host
Brody•6mo ago
interesting, that's from quite a while ago
anhedonia OP•6mo ago
yeah, I thought this was an issue with the 3rd party service as their servers sometimes error out, so I did not report it and tried to debug on my own
Brody•6mo ago
gotcha, we will come back to this once the team has confirmed the incident has been resolved
the incident has been marked as resolved, let me know if you continue to see this issue
anhedonia OP•6mo ago
Today at 9:20 PM
dial tcp 92.223.17.55:443: connect: no route to host
Today at 9:38 PM
dial tcp 92.223.7.145:443: connect: no route to host
Today at 11:10 PM
dial tcp 92.223.17.55:443: connect: no route to host
Still happening for me :sadge:
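Because the failures above arrive in short bursts, one common mitigation (not a fix for the root cause) is to retry only this specific error with a short backoff. The following is a sketch assuming Go 1.13+ error wrapping, where `errors.Is` can see through `*net.OpError` and `*os.SyscallError` to the underlying `syscall.EHOSTUNREACH`; `isNoRoute`, `retryNoRoute`, and `fakeNoRoute` are hypothetical helpers introduced for illustration:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"os"
	"syscall"
	"time"
)

// isNoRoute reports whether err wraps EHOSTUNREACH ("no route to host").
func isNoRoute(err error) bool {
	return errors.Is(err, syscall.EHOSTUNREACH)
}

// retryNoRoute calls fn up to attempts times. It retries only on
// "no route to host" and doubles the wait after each failed attempt.
func retryNoRoute(ctx context.Context, attempts int, base time.Duration, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		err = fn()
		if err == nil || !isNoRoute(err) {
			return err
		}
		if i == attempts-1 {
			break // no point sleeping after the final attempt
		}
		select {
		case <-time.After(base << uint(i)):
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return err
}

// fakeNoRoute builds an error shaped like the one a failed dial returns.
func fakeNoRoute() error {
	return &net.OpError{
		Op:  "dial",
		Net: "tcp",
		Err: os.NewSyscallError("connect", syscall.EHOSTUNREACH),
	}
}

func main() {
	calls := 0
	err := retryNoRoute(context.Background(), 3, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return fakeNoRoute() // simulate two failed attempts
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

In a real service `fn` would wrap the HTTP call; errors returned by `net/http` are `*url.Error` values, which also unwrap, so the same check should apply.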
Brody•6mo ago
are these the only domains you're having issues calling?
anhedonia OP•6mo ago
yeah, I am not calling anything else
Other calls are quite rare, but I haven't seen them fail
a lot more requests failed overnight as well
Brody•6mo ago
what region are you deployed to?
anhedonia OP•6mo ago
US West, I downgraded from PRO, so I am not able to adjust that
just noticed I am on Legacy runtime as well, gonna swap to V2 just in case it matters
logs are not loading fully now, but it's not a big deal for me
Brody•6mo ago
you're the second person I've seen report an issue with the logs, can I ask what logger you are using?
anhedonia OP•6mo ago
https://github.com/rs/zerolog
default settings across the board
I also have multiple services using the same logger, some of them seem to be logging fine-ish; the message is not being shown, but logs are there
Brody•6mo ago
what do you see if you expand the context of a blank log?
anhedonia OP•6mo ago
expanded view is all good, but the logs are blank
[image]
Brody•6mo ago
I have an idea of what's happening, will test
anhedonia OP•6mo ago
but sometimes it's fine, this one message came through
[image]
Brody•6mo ago
logs that aren't json are fine, but your json logs are blank
anhedonia OP•6mo ago
yeah, but this log above should be around 5 lines and only 1 is visible
anhedonia OP•6mo ago
It would look like this on V1
[image]
Brody•6mo ago
okay i'll see if i can reproduce with fiber
anhedonia OP•6mo ago
it looks like there are now a lot of logs during startup, like "container started" etc. seems like this prevents some of the service logs from being delivered right after container start. at least I think I saw some new logs right as the service started, they are all gone now :D
Brody•6mo ago
interesting
can you share your logger middleware config?
anhedonia OP•6mo ago
import "github.com/gofiber/contrib/fiberzerolog"
...
app := fiber.New(fiber.Config{Network: "tcp"})
app.Use(fiberzerolog.New())
Brody•6mo ago
thanks!
Brody•6mo ago
can reproduce
[image]
Brody•6mo ago
i'll talk to the team about this monday
anhedonia OP•6mo ago
Just got logs working, at least text ones on one replica
e65c43cb-34b6-4519-8a01-6a13cdf03732
ah, this service loads a file right before start, so it might just be that it took a little longer to start before logging. it also has a volume attached
json logs also worked there tho
Brody•6mo ago
is it using the v2 runtime?
anhedonia OP•6mo ago
yes, just switched earlier as well
Brody•6mo ago
your logs are no longer blank?
anhedonia OP•6mo ago
just in this one service, the first start after Legacy > V2 switch also logged correctly it seems
0e7a89a8-f577-4121-9a22-bc187fb0eeef
Brody•6mo ago
you can get the logs from the middleware back by doing this -
// copy each event's message into a "msg" field as well
logger := zerolog.New(os.Stdout).Hook(zerolog.HookFunc(func(e *zerolog.Event, level zerolog.Level, message string) {
	e.Str("msg", message)
}))

app.Use(fiberzerolog.New(fiberzerolog.Config{
	Logger: &logger,
}))
zerolog uses the message attribute, but it looks like the runtime v2 is only picking up msg
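The same idea can also be sketched at the writer level with only the standard library: wrap the output writer and copy `message` into `msg` on each JSON line before it reaches stdout. This is an illustration rather than the workaround proposed in the thread, and it assumes every `Write` call carries exactly one complete JSON object; `msgMirror` and `mirror` are hypothetical names:

```go
package main

import (
	"bytes"
	"encoding/json"
	"io"
	"os"
)

// msgMirror wraps a writer and, for each JSON log line, copies the
// "message" field into "msg" when "msg" is not already set.
type msgMirror struct {
	out io.Writer
}

func (m msgMirror) Write(p []byte) (int, error) {
	var fields map[string]interface{}
	if err := json.Unmarshal(bytes.TrimSpace(p), &fields); err != nil {
		return m.out.Write(p) // not a JSON object: pass through untouched
	}
	if v, ok := fields["message"]; ok {
		if _, exists := fields["msg"]; !exists {
			fields["msg"] = v
		}
	}
	line, err := json.Marshal(fields)
	if err != nil {
		return m.out.Write(p)
	}
	if _, err := m.out.Write(append(line, '\n')); err != nil {
		return 0, err
	}
	return len(p), nil // report the caller's bytes as consumed
}

// mirror runs one line through msgMirror and returns what was written.
func mirror(line string) string {
	var buf bytes.Buffer
	w := msgMirror{out: &buf}
	w.Write([]byte(line))
	return buf.String()
}

func main() {
	w := msgMirror{out: os.Stdout}
	w.Write([]byte(`{"level":"info","message":"server started"}` + "\n"))
}
```

Since `msgMirror` satisfies `io.Writer`, it could be handed to any logger that accepts one. Note the trade-off: re-encoding through a map sorts the keys and routes numbers through `float64`, so the hook shown above is likely the lighter option when using zerolog directly.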
Brody•6mo ago
doesn't fix the fiber printout, but it's progress
[image]
angelo•6mo ago
We can do now :0
Brody•6mo ago
was the message attribute not being picked up a known issue prior to this?
anhedonia OP•6mo ago
but another service works with the same config :Peepo_Think:
Brody•6mo ago
the other service is likely still on the legacy runtime
anhedonia OP•6mo ago
level, _ := zerolog.ParseLevel(os.Getenv("LOG_LEVEL"))
zerolog.SetGlobalLevel(level)

app := fiber.New(fiber.Config{
	Network: os.Getenv("NETWORK"),
})
app.Use(fiberzerolog.New())
[image]
anhedonia OP•6mo ago
ah, makes sense
Brody•6mo ago
yeah this is a bug in how logs are picked up from the v2 runtime
anhedonia OP•6mo ago
got it. it's more or less a non-issue for me, so all good
Brody•6mo ago
you can switch back to the legacy runtime in the service settings, or use my proposed temporary solution above
anhedonia OP•6mo ago
the tcp errors have not yet popped up since V2 upgrade, but they happen in small bursts every few hours
Brody•6mo ago
tbh I'm thinking it may be an issue with the service you are calling, not railway, but we will wait and see what happens
anhedonia OP•6mo ago
i'll just stay on V2, it's gotta be better in some ways right? at least the number is higher than V1 😂
Brody•6mo ago
so true 🤣
anhedonia OP•6mo ago
yeah, I have a suspicion as well, but idk how to test that because it is so random
I'll also ask someone who uses this api on another project, maybe they have some similar issues
Brody•6mo ago
is theirs hosted on railway?
anhedonia OP•6mo ago
nope, on GitHub Actions 😂
Brody•6mo ago
ah gotcha
anhedonia OP•6mo ago
it's also TS, so idk if I can even compare. but i'd imagine network issues like that would pop up anywhere
they got back to me - no similar errors :sadge:
Brody•6mo ago
you would hope, wouldn't be too good if this was an "only happens on railway" type of issue
but you said you still get this error connecting through a proxy, thus taking railway's networking completely out of the question
anhedonia OP•6mo ago
so far it is, but I am also working on a refactor that will move to fly.io. I should start using that api heavily next week, so I will be able to compare better
yeah
Brody•6mo ago
oh that's not ideal, may I ask why you are moving to fly?
anhedonia OP•6mo ago
1. MongoDB is really hard to limit in memory when the container has a technical limit of 8GB, so it ends up using a lot of RAM, even when I set limits through command args
2. #1 led me to explore other options and I decided to try SQLite, but Railway has no way to make custom snapshots/recover the volume data. I think fly does snapshots automatically and I can restore a volume from it super quick
3. 5GB volume size limit
since my apps consume almost no CPU and only a couple hundred MB of RAM, Pro plan is just not worth it
but I gotta say, Railway CPUs are noticeably faster :D
Brody•6mo ago
1. How does it work on fly? I'm not too familiar, can you choose your instance size? and what if mongo does need more memory, wouldn't it just crash on fly?
2. that's fair
3. also fair
anhedonia OP•6mo ago
Mongo resizes its in-memory cache based on available system memory, so I can pick a 1GB instance on fly and it will just work with what it has, no matter how big the collections get. On Railway, I had to reduce the amount of data stored in order to keep Mongo around 1GB, cutting some features. It works for now, but I am just really worried that it will still grow as the dataset gets bigger since I really have no control over it.
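The sizing behaviour described above can be made concrete with a little arithmetic. As far as the MongoDB documentation states, the default WiredTiger internal cache is the larger of 50% of (RAM - 1 GB) and 256 MB, computed from the memory the process can see. The sketch below just evaluates that formula; `defaultWiredTigerCacheGB` is a hypothetical helper name, and the 8 GB figure is the container limit mentioned earlier in the thread:

```go
package main

import "fmt"

// defaultWiredTigerCacheGB returns the documented default cache size:
// the larger of 50% of (RAM - 1 GB) and 0.25 GB.
func defaultWiredTigerCacheGB(ramGB float64) float64 {
	cache := 0.5 * (ramGB - 1)
	if cache < 0.25 {
		return 0.25
	}
	return cache
}

func main() {
	// 1 GB: a small fixed-size instance; 8 GB: the container limit from the thread.
	for _, ram := range []float64{1, 2, 8} {
		fmt.Printf("%.0f GB visible RAM -> %.2f GB default cache\n",
			ram, defaultWiredTigerCacheGB(ram))
	}
}
```

Under this formula a container that reports 8 GB would default to a 3.5 GB cache, while a 1 GB instance would get 0.25 GB. The cache is only one part of mongod's total memory use, so these are rough figures. Assuming the "command args" mentioned earlier refer to a flag such as `--wiredTigerCacheSizeGB`, that flag caps this cache but not the process's other allocations.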
Brody•6mo ago
Okay gotcha, it works since mongo plays nicely with the available memory
all very good feedback, I will make sure the team sees this
anhedonia OP•6mo ago
yeah, I never had any issues with limiting mongo to some specific size. I believe you can also go below 1GB, it's just not recommended. So I could get an even smaller instance on fly if it's still expensive for what the project is
but yeah, it would be amazing if there was some way to get the files from the volume or at least make snapshots and copy/restore
Brody•6mo ago
I think the main reason railway hasn't allowed for custom lower instance sizing would be because the majority of services will crash
volume snapshots are something they would like to do and will do at some point
anhedonia OP•6mo ago
well, railway dynamic pricing based on usage is amazing for Go, it is practically free to run most of my projects :D while on other providers I would have to pay for some minimum sized instance, like 256mb.
but it is also quite scary to know that something can happen or I can get spammed and there is no way to limit the damage, like set a RAM/CPU limit
Brody•6mo ago
I've seen thousands of help threads for crashed services that people tried to run on the trial plan with 500mb of memory
anhedonia OP•6mo ago
yeah, cannot relate 😂
[image]
Brody•6mo ago
you can set a usage limit but it's not quite the same
haha I know I'm a go dev too
anhedonia OP•6mo ago
yeah, service/container limits would be awesome. and a way to run migrations :D
I had to bloat my final image size quite a bit to run migrations on SQLite, would be nice to have the ability to interact with the volume before the main container starts up
Brody•6mo ago
i wonder if the v2 builder (different from the v2 runtime) allows for interacting with the service's volume during build
side note, you could also use libsql in place of sqlite?
anhedonia OP•6mo ago
yeah, but that would not solve migrations :Peepo_Think:
plus I am kinda moving away from sharding everything into microservices 😂 so it's nice to just deploy a single binary
Brody•6mo ago
if you used libsql daemon you could run your migrations during build
that's also fair
anhedonia OP•6mo ago
ah that's a good point
but then it might break prod 😂
Brody•6mo ago
ah good point
anhedonia OP•6mo ago
well, I guess it's not too different from how I do it now. so it might be a solid option actually
I'll give it a try when some volume features get through the pipeline, rn I don't need anything more than sqlite, and I want to make the project super friendly to self host
Brody•6mo ago
sounds good!
anhedonia OP•6mo ago
So far not a single error since I switched to V2 around 7 hours ago, so it might be solved 🤞
Brody•6mo ago
how do you know if there are errors if you can't see the logs lol
anhedonia OP•6mo ago
it pings me on Discord :D
I can also see the logs if I expand them
Brody•6mo ago
well that's good news
you would have gotten errors within 7 hours on the legacy runtime?
anhedonia OP•6mo ago
yeah, they popped up every 2-5 hours it seems
Brody•6mo ago
awesome, the logs are easy to fix but the networking problem likely isn't so I'm happy runtime V2 fixed it
anhedonia OP•6mo ago
0 errors overnight, so this is definitely fixed now. weird that V1 had some obscure networking issue, time to move to V2 everywhere just in case :D
goodbye logs 🫡
Brody•5mo ago
I'm sure the team will fix the logs fast
the team is hands on keyboard to fix the logging issue
update: one half of this problem has been fixed, structured logs with a message attribute will not be blank anymore, the missing logs are still being worked on
update: the missing logs are fixed, but there's a new issue that arose with the possibility of them not being shown in the correct order
sorry for the late reply, but all known logging issues on the v2 runtime have been fixed