Railway5mo ago
pauldps

V2 + App Sleep = first response always empty

I recently deployed a new Bun API on V2 with App Sleep, and I've noticed that the first request to a sleeping app always returns an empty response. This hasn't happened on non-V2 Bun apps with App Sleep on. The following are tests with curl using the same URL/endpoint. Normal request (non-sleeping app, {"status": "OK"} is the response from my API):
HTTP/2 200
content-type: application/json;charset=utf-8
date: Thu, 04 Jul 2024 06:03:09 GMT
server: railway-edge
x-request-id: 56xHFuw3QlCDX-2Zclvhkw_3165824431
content-length: 15

{"status":"OK"}
real 0m0.269s
user 0m0.016s
sys 0m0.000s
First request on the same app but sleeping:
HTTP/2 200
server: railway-edge
x-request-id: Z4G6GgaAQziEbf20vOO_UQ_3165824431
content-length: 0
date: Thu, 04 Jul 2024 06:02:56 GMT


real 0m1.275s
user 0m0.000s
sys 0m0.000s
Project ID: 34304961-2ebf-4d0b-b2ae-3585cf6b9353
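The sleeping-app response above differs from the normal one in three visible ways: there is no content-type header, content-length is 0, and the body is empty, all under a 200 status. A client that does not want to treat that as success could check for the combination. The following is a minimal Python sketch; the function name is hypothetical and the rule is an assumption drawn only from the two curl outputs in this post, not from Railway documentation.

```python
from typing import Mapping


def looks_like_empty_wake_response(status: int, headers: Mapping[str, str], body: bytes) -> bool:
    """Flag a 2xx response that carries no body and no content-type.

    Assumption: this mirrors the sleeping-app curl output in this post
    (HTTP 200, content-length: 0, no content-type). It is a client-side
    heuristic, not documented platform behavior.
    """
    # Header names are case-insensitive, so normalize before looking them up.
    normalized = {name.lower(): value for name, value in headers.items()}
    declared_empty = normalized.get("content-length", "0").strip() == "0"
    is_success = 200 <= status < 300 and status != 204  # 204 is legitimately empty
    return (
        is_success
        and declared_empty
        and len(body) == 0
        and "content-type" not in normalized
    )
```

A caller could retry the request once when this returns True, since in the tests above the second request to the woken app succeeds.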
Percy
Percy5mo ago
Project ID: 34304961-2ebf-4d0b-b2ae-3585cf6b9353
Brody
Brody5mo ago
can you also provide the same data for the same app running on the legacy runtime
pauldps
pauldpsOP5mo ago
you mean change the runtime for that app, right? I tested a different app running on Legacy and the issue didn't happen but I'll change the runtime
Brody
Brody5mo ago
testing a different app is not conclusive, when testing you need to change only one variable at a time, a completely different app changes too many variables
pauldps
pauldpsOP5mo ago
it was another Bun app, but I can see the variables
I'm deploying the reported app on Legacy and have to wait for it to sleep 🙂
Brody
Brody5mo ago
I'm not talking about environment variables
pauldps
pauldpsOP5mo ago
yeah, I meant variables not as in environment variables but as in how the apps differ despite both being Bun APIs
deploy is done, will report back in about 10 mins
Test done, request on sleeping app worked fine:
HTTP/2 200
content-type: application/json;charset=utf-8
date: Thu, 04 Jul 2024 06:34:47 GMT
server: railway-edge
x-request-id: gLwN8r6fSpSLChJuzIS30g_3165824431
content-length: 15

{"status":"OK"}
real 0m1.795s
user 0m0.000s
sys 0m0.016s
switching back to V2 to repeat the test
btw that cold boot time = 🏆
Brody
Brody5mo ago
1.795s is good?
pauldps
pauldpsOP5mo ago
for a cold boot time? I'd say excellent
I have another Rails app running on Railway that cold-boots in about 10s, kinda bad but that's mostly Rails to blame
Brody
Brody5mo ago
similarly I have a feeling this is bun to blame
pauldps
pauldpsOP5mo ago
I'm running a compiled executable, so very likely
Brody
Brody5mo ago
isn't that the recommended way to run in production though
pauldps
pauldpsOP5mo ago
it is, I'm following their guide on that
for comparison, this is normal request time
HTTP/2 200
content-type: application/json;charset=utf-8
date: Thu, 04 Jul 2024 06:38:27 GMT
server: railway-edge
x-request-id: vAXlgIj9Rlq8binHs09mAQ_603524580
content-length: 15

{"status":"OK"}
real 0m0.221s
user 0m0.000s
sys 0m0.000s
the increased memory usage on V2 is a bit of a bummer though
Brody
Brody5mo ago
how much of an increase?
pauldps
pauldpsOP5mo ago
Legacy was running ~36MB
V2 is running 53~61MB
Brody
Brody5mo ago
is this the exact same app, or are you comparing different apps again lol
pauldps
pauldpsOP5mo ago
it's the same app
Brody
Brody5mo ago
someone else reported higher memory usage on the v2 runtime, but I can't reproduce it just by purely allocating bytes
pauldps
pauldpsOP5mo ago
just by looking at its memory metrics
looking further back (the app has been up only for a couple hours) the lowest it got on V2 was 42MB, but I only had one run of it on Legacy, so probably needs more data
No description
pauldps
pauldpsOP5mo ago
but the difference is quite noticeable, maybe not in the image because the chart ceiling is a bit too high
btw the app went to sleep again and curl returned an empty response
if it matters, the service has a volume
Brody
Brody5mo ago
remove the volume and try again?
pauldps
pauldpsOP5mo ago
I'll try, that might break the app though
app broke, trying to fix it
alright it's back up, now waiting for sleep
pauldps
pauldpsOP5mo ago
Network graph also wild on Legacy
No description
Brody
Brody5mo ago
then it's a good thing the legacy runtime will be phased out
pauldps
pauldpsOP5mo ago
got empty response on sleeping app, so volume is not it
Brody
Brody5mo ago
okay can you provide a minimal reproducible bun app that sends an empty response on the v2 runtime
pauldps
pauldpsOP5mo ago
I'll try
I deployed a smaller app, and could not replicate the issue
but here's the thing... the affected app also stops having the issue 👀
Brody
Brody5mo ago
uh.. task failed successfully?
pauldps
pauldpsOP5mo ago
ugh, technology these days 😛
Brody
Brody5mo ago
ugh, bun these days
pauldps
pauldpsOP5mo ago
now that I have both apps running, I'll try to replicate it again with the affected app
then try to replicate it with the smaller app
code btw: https://github.com/pauldps/bun-railway-v2-test
I'll try deploying to a separate project in case of same-project shenanigans
Brody
Brody5mo ago
that's definitely minimal
pauldps
pauldpsOP5mo ago
I was able to reproduce the issue with the minimal app in a separate project
and the original app also started showing blank responses after I removed the minimal app from that project 👀
Brody
Brody5mo ago
this is looking more like instabilities with bun
try the same code with node?
pauldps
pauldpsOP5mo ago
added a node branch to the minimal app and deployed it, now waiting for sleep
Brody
Brody5mo ago
just a question, why do you have the healthcheck timeout set to a low value like 30 seconds?
pauldps
pauldpsOP5mo ago
because I want it to fail fast
usually if the first request fails, the deploy is likely busted, and I don't want to wait 5 minutes for the deployment to fail
Brody
Brody5mo ago
makes sense
pauldps
pauldpsOP5mo ago
I think for Rails apps with slower boot times I set a higher value
got the empty response with the Node app as well
Brody
Brody5mo ago
interesting
can you link the applicable deployment
pauldps
pauldpsOP5mo ago
two requests
$ time curl -i https://bun-railway-v2-test-production.up.railway.app
HTTP/2 200
server: railway-edge
x-request-id: 3OBw1saOQ2WUAAJCR4VCGw_603524580
content-length: 0
date: Thu, 04 Jul 2024 08:17:05 GMT


real 0m1.322s
user 0m0.000s
sys 0m0.000s


$ time curl -i https://bun-railway-v2-test-production.up.railway.app
HTTP/2 200
content-type: application/json
date: Thu, 04 Jul 2024 08:17:29 GMT
server: railway-edge
x-request-id: 2SzMY7RyRVie9l27WpIy2Q_882434190
content-length: 19

{"status": "NODE"}

real 0m0.342s
user 0m0.000s
sys 0m0.000s
the deployment? or the project?
Brody
Brody5mo ago
the deployment
pauldps
pauldpsOP5mo ago
oh, got it
7368d15e-ed13-4684-aab3-72e2b3bdaa74
Brody
Brody5mo ago
full link please
Brody
Brody5mo ago
would it be too much to ask you to also do an express app?
pauldps
pauldpsOP5mo ago
lemme see if I can do it quickly, I never used express before lol
Brody
Brody5mo ago
that's a crazy sentence, I had never imagined someone who uses bun and Elysia to say they've never used express
pauldps
pauldpsOP5mo ago
when express was a thing I was mostly working with Rails
when I moved to Node it was during a time where express was considered too slow compared to other libs, so I never touched it
express app is up
$ time curl -i https://bun-railway-v2-test-production.up.railway.app
HTTP/2 200
content-type: application/json; charset=utf-8
date: Thu, 04 Jul 2024 08:33:43 GMT
etag: W/"14-kjLmVQInBma0jJMTEoZwvPwAyY4"
server: railway-edge
x-powered-by: Express
x-request-id: F32SRDmRQFSyDPyKYfb06w_603524580
content-length: 20

{"status":"EXPRESS"}
real 0m0.388s
user 0m0.000s
sys 0m0.000s
waiting for sleep
code in the express branch
the express app responds correctly
Brody
Brody5mo ago
you're still on the v2 runtime?
pauldps
pauldpsOP5mo ago
yes, I just changed the branch and nothing else
I wonder what's going on with Node's http server, which is probably what Bun servers are based off of
using express is not an option for me though
Brody
Brody5mo ago
well that seems like this isn't an issue with railway then
pauldps
pauldpsOP5mo ago
hold on
got an empty response with express
I think my first test was too fast
$ time curl -i https://bun-railway-v2-test-production.up.railway.app
HTTP/2 200
server: railway-edge
x-request-id: xSTDdCMbTrexjKz3i8FsOg_1654200396
content-length: 0
date: Thu, 04 Jul 2024 08:54:27 GMT


real 0m1.298s
user 0m0.000s
sys 0m0.000s


$ time curl -i https://bun-railway-v2-test-production.up.railway.app
HTTP/2 200
content-type: application/json; charset=utf-8
date: Thu, 04 Jul 2024 08:54:53 GMT
etag: W/"14-kjLmVQInBma0jJMTEoZwvPwAyY4"
server: railway-edge
x-powered-by: Express
x-request-id: 7r720LsRTEyZ8daihSwTQg_1654200396
content-length: 20

{"status":"EXPRESS"}
real 0m0.270s
user 0m0.000s
sys 0m0.000s
it seems like an issue with V2 to me
could potentially test with non-JavaScript frameworks but that would be a bit too much for me to do atm
Brody
Brody5mo ago
ill test with a go server
what happens if i dont experience the same issue?
pauldps
pauldpsOP5mo ago
I can test with Ruby later, but for now I need to go sleep myself lol
good question, I have a theory, but want to test a slow language first
remember to deploy in a new project since it seems multiple services in a project can affect the results, I'd like to test more about that part too
Brody
Brody5mo ago
i have indeed created a new project
pauldps
pauldpsOP5mo ago
I have deployed a Ruby/Sinatra app, and was not able to replicate the issue on the first cold boot. But I'm seeing a pattern in the logs that I want to investigate
pauldps
pauldpsOP5mo ago
these are the logs from the Express app. My first request did not trigger the problem, but my second did. The second request was after the "container event container died" log entry that was absent from the first request. So I'm trying to get that log entry to show on the Sinatra app
No description
pauldps
pauldpsOP5mo ago
the "Stopping Container" spam seems to indicate there's a problem somewhere with V2
was not able to replicate the issue with Ruby after 2 attempts. I'm going back to the main branch (Bun) to see if maybe the problem resolved itself
Brody
Brody5mo ago
"Stopping Container" is it being put to sleep
pauldps
pauldpsOP5mo ago
does it show even if the app is already sleeping?
"Stopping Container" logs did not show up for the Ruby app 🤔 but it did go to sleep (according to the dashboard)
Brody
Brody5mo ago
maybe the ruby app is on the legacy runtime
pauldps
pauldpsOP5mo ago
I will doublecheck after I test one more time with the Bun branch
pauldps
pauldpsOP5mo ago
the problem is still there with the Bun app. The logs:
No description
pauldps
pauldpsOP5mo ago
no "Stopping Container" tho
switching to the ruby branch for now to investigate more, made sure it's on V2
got to reproduce the issue with the Sinatra app. It was a little worse as two requests gave empty responses before the third one returned the correct response
$ time curl -i https://bun-railway-v2-test-production.up.railway.app
HTTP/2 200
server: railway-edge
x-request-id: OPED8JR1TW6SpSjny6blUg_882434190
content-length: 0
date: Thu, 04 Jul 2024 18:16:25 GMT


real 0m1.567s
user 0m0.016s
sys 0m0.000s



$ time curl -i https://bun-railway-v2-test-production.up.railway.app
HTTP/2 200
server: railway-edge
x-request-id: PbIYVlwJQ-qkE6uGJUtMsw_882434190
content-length: 0
date: Thu, 04 Jul 2024 18:16:29 GMT


real 0m0.211s
user 0m0.000s
sys 0m0.000s



$ time curl -i https://bun-railway-v2-test-production.up.railway.app
HTTP/2 200
content-type: application/json
date: Thu, 04 Jul 2024 18:16:35 GMT
server: railway-edge
server: WEBrick/1.8.1 (Ruby/3.2.4/2024-04-23)
x-content-type-options: nosniff
x-request-id: ZBY6eoJsS_qCym1oa23Yyg_3165824431
content-length: 20

{"status":"SINATRA"}
real 0m0.378s
user 0m0.000s
sys 0m0.000s
pauldps
pauldpsOP5mo ago
logs
No description
pauldps
pauldpsOP5mo ago
(those are not errors btw, wtf Sinatra)
Brody
Brody5mo ago
printed to stderr
I have requested my go app a few times when it has gone to sleep and was not able to get an empty response
pauldps
pauldpsOP5mo ago
were the logs like the above? I let my app sleep for about an hour or so before making requests
from observation the problem seems to be related to those "container died" and "stopping container" errors
Brody
Brody5mo ago
those are regular event logs, nothing to be concerned about
pauldps
pauldpsOP5mo ago
right, I meant logs, not errors 👍
Brody
Brody5mo ago
yeah the container log stuff is perfectly normal
pauldps
pauldpsOP5mo ago
I do think they seem to indicate the container is going into a state where it fails to render responses on wakeup
so far V2 is the common denominator; I've changed projects and languages, and the problem doesn't happen on Legacy. What else could we try?
Brody
Brody5mo ago
not sure, I'll report it to the team anyway
pauldps
pauldpsOP5mo ago
thanks, I'll keep the project up if the team wants to debug/investigate
JustJake
JustJake5mo ago
So you can repro this on both bun and sinatra?
pauldps
pauldpsOP5mo ago
correct, also Node-http and express
JustJake
JustJake5mo ago
Ack and escalated
It should be triaged on Monday
Brody
Brody5mo ago
fairly certain that the blank response should in fact be a 503 "application failed to respond" page, but railway is no longer sending that page at the moment due to what i believe to be a bug.
so let's assume your first response to a sleeping service is a 503 status code, meaning your app did not respond to the first request in time. that explains why a statically compiled go app did not exhibit this behavior.
when a request comes in for a slept app, the container is started and a tcp connection attempt is done on a loop every 30ms. once that succeeds the request is forwarded to your app, but if your app is not ready to handle http traffic just yet you will get the 503 "app failed to respond" page. the app's health check is not taken into account.
there's definitely some room for improvement here on the railway side of things for waking sleeping services, aside from fixing the blank page being sent instead of the 503.
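The wake sequence described above (start the container, retry a TCP connection every 30 ms, forward the request once a connection succeeds) can be sketched as a polling loop. This is only an illustration in Python, not Railway's code: the function name, the overall `timeout`, and the `connect_timeout` parameter are assumptions, and only the 30 ms interval comes from the message above.

```python
import socket
import time


def wait_for_tcp(
    host: str,
    port: int,
    interval: float = 0.03,       # 30 ms between attempts, as described in the thread
    timeout: float = 10.0,        # overall deadline (an assumption for this sketch)
    connect_timeout: float = 1.0  # per-attempt limit (an assumption for this sketch)
) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on deadline."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect only proves that something is listening on
            # the port; it says nothing about whether HTTP handling is ready.
            with socket.create_connection((host, port), timeout=connect_timeout):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

The comment in the loop is the point of the sketch: a probe like this can succeed before the app is able to answer HTTP, which matches the explanation that the health check is not consulted on wake.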
JustJake
JustJake5mo ago
Is this for the new proxy or?
Brody
Brody5mo ago
yep all testing done with only the new proxy enabled
JustJake
JustJake5mo ago
Great. Miguel merged a fix for this. Should be good to go
Brody
Brody5mo ago
for clarity, the fix was for the blank response instead of the 503 application failed to respond page that should have been shown
pauldps
pauldpsOP5mo ago
I can handle the 503 response better than a blank page, can set my client to retry or something (although it would be nice if the 503 didn't happen)
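A client-side retry of the kind mentioned here could look roughly like the sketch below. It is a Python illustration using only the standard library; the function name, the set of retryable status codes, and the default attempt count and delay are all assumptions rather than anything prescribed in this thread.

```python
import time
import urllib.error
import urllib.request

# Status codes treated as "app is still waking up" (an assumption for this sketch).
RETRYABLE_STATUSES = {502, 503}


def fetch_with_retry(url: str, attempts: int = 3, delay: float = 1.0):
    """GET url, retrying on 502/503. Returns (status, body) or raises the last error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=10) as response:
                return response.status, response.read()
        except urllib.error.HTTPError as error:
            # Any other HTTP error is treated as a real failure and re-raised.
            if error.code not in RETRYABLE_STATUSES:
                raise
            last_error = error
        if attempt < attempts - 1:
            time.sleep(delay)
    raise last_error
```

Retrying only makes sense for requests that are safe to repeat, such as the GET in the curl tests above.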
JustJake
JustJake5mo ago
We now no longer return a 200
We should return a 500 as was the previous behavior
pauldps
pauldpsOP5mo ago
just ran a test now and got a 502 with a long HTML error page
JustJake
JustJake5mo ago
That's correct ye?
pauldps
pauldpsOP5mo ago
$ time curl -i https://bun-railway-v2-test-production.up.railway.app
HTTP/2 502
content-type: text/html
server: railway-edge
x-railway-fallback: true
x-request-id: Y6CCMwhzRo-9NqOjSW-vSw_882434190
content-length: 4689
date: Mon, 08 Jul 2024 18:19:26 GMT

... HTML ...
that's better than a 200 for sure
JustJake
JustJake5mo ago
What's "Best"
pauldps
pauldpsOP5mo ago
Best would be 200 with my app returning the correct response
JustJake
JustJake5mo ago
Well yes
Is your app returning the correct response?
pauldps
pauldpsOP5mo ago
no, it's returning that huge HTML from Railway
JustJake
JustJake5mo ago
No I mean like
pauldps
pauldpsOP5mo ago
that's just the first request though
JustJake
JustJake5mo ago
What happens
is this intermittent
is it always
pauldps
pauldpsOP5mo ago
the next requests work fine
it's always when the app is sleeping
"first request when app is sleeping"
JustJake
JustJake5mo ago
"First request when app sleeping results in 503" Gotchu Esclating again This is only on the V2 runtime ye?
pauldps
pauldpsOP5mo ago
correct
does not happen on Legacy
I suspect it wasn't more widely noticed because it was returning a 200
JustJake
JustJake5mo ago
Yep I suspect so too.
Bubbled up
Mig
Mig5mo ago
@Brody , is this application one of those app sleep healthcheck candidates ? The fact the legacy proxy would work but new one does not makes me think the healthcheck feature we talked about wouldn't matter. The new proxy is using the same timeouts as legacy and these edge proxies are not aware of application sleep logic. There must be a timeout setting that is different.
Brody
Brody5mo ago
yep I feel like this could benefit from having the health check checked on waking up the service
Mig
Mig5mo ago
oh this is saying that this issue with app sleep waking only happens on v2 runtime but works on legacy runtime. Correct? We just happen to be on the new proxy and saw those 200s (sorry about that... will work on ensuring issues like that don't slip through)
I was able to reproduce and have a strong lead on the issue. I'm continuing to work on this and will share updates here when I have some.
Hello, I'm still looking into this.
heads up, I saw a few reports of 502s and the people's containers were stopped
if you don't exit with non-0 the restart policy won't start it again. unsure what stopped the people's applications but pointing out that checking the logs for container exits and restarting the container resolved the issue.
Brody
Brody5mo ago
noted, thanks for the heads up
Mig
Mig5mo ago
I'm not sure if I'll be digging into this more since I found that in 2 examples of this the container was stopped. Next time someone brings this up I'll just ask them to restart their container to see if it comes back
I'll have to look into it if more people report of course
Brody
Brody5mo ago
OP did say it was reproducible between multiple deployments
Mig
Mig5mo ago
ok I don't think it's stopped containers in that case.
!remind me to reproduce this in 1 hour
Duchess
Duchess5mo ago
Got it, I will remind you to reproduce this at Fri, 12 Jul 2024 19:39:21 GMT
Hey @Mig, remember to reproduce this - https://discord.com/channels/713503345364697088/1258304003930984531/1261391468778487909
JustJake
JustJake5mo ago
BTW I don't think we have the cycles ATM to repro stuff for people
They'll have to come to us with reproductions that we can have a look at
Mig
Mig5mo ago
In production I have an app with app sleep and v2 runtime. On my first request it says the app is unavailable. The container starts though. Second request it works. So easy to reproduce.
JustJake
JustJake5mo ago
Okay then givr
pauldps
pauldpsOP5mo ago
I don't understand this part: "the container was stopped". Do you mean the container was stopped manually? In my case the container stopped because it went to sleep. I didn't do anything to stop the container and my app is a long-running http server, so it doesn't stop on its own. The only reason it stops is the App Sleeping feature.
Brody
Brody5mo ago
I can assure you applications can stop on their own
JustJake
JustJake5mo ago
I think like, even if it does stop on its own, we should probably restart it?
Cause like if it crashes and another request or something comes in
IDK
Brody
Brody5mo ago
if it exits with an error code, yes, if it exits with a success code, maybe not? but yes if restart is set to always
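The restart rule described in this message can be written down as a small decision function. This is a sketch in Python; the policy names `"always"`, `"on_failure"`, and `"never"` are hypothetical labels chosen for illustration, and the "maybe not?" case for a clean exit is resolved here as "do not restart", which is an assumption.

```python
def should_restart(policy: str, exit_code: int) -> bool:
    """Decide whether a stopped container should be started again.

    Policy names are hypothetical labels for this sketch:
      "always"     -> restart regardless of exit code
      "on_failure" -> restart only when the exit code is non-zero
      "never"      -> never restart
    """
    if policy == "always":
        return True
    if policy == "on_failure":
        return exit_code != 0
    if policy == "never":
        return False
    raise ValueError(f"unknown restart policy: {policy!r}")
```

Under this reading, an app that exits with code 0 stays stopped unless the policy is "always", which matches the earlier note that a container that does not exit with non-0 will not be started again.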
Mig
Mig5mo ago
Brody got it for the restart behaviour
The fact I could reproduce this though means we can disregard what I said about the container being stopped. I think mentioning it might have been a mistake as I was jumping between a few threads debugging stuff.
I will dig into this more next week.
Duchess
Duchess5mo ago
New reply sent from Help Station thread:
I have the same problem... With the legacy runtime it works well, but with the new V2 it does not. The first http(s) request via browser is always a 502, the next ones work normally. Built with a custom Dockerfile based on alpine+nginx+phpfpm
You're seeing this because this thread has been automatically linked to the Help Station thread.
Mig
Mig5mo ago
hey folks, I spent some time on this and basically, the v2 runtime wakes and forwards http requests differently than v1 runtime. I have observed the success rate of starting and getting an HTTP response to be pretty flakey (sometimes it works, sometimes it does not). I believe this is something to do with how fast the container can start in v2 runtime before the request times out. I can't spend more time on this right now because the number of reports for this has been small and have to prioritize some other issues. If you need app sleep right now I advise just using the v1 (legacy) runtime.
Duchess
Duchess5mo ago
New reply sent from Help Station thread:
Thanks for the update, I just want to say that I also faced the same issue with my python instance. Mainly observing that the first request wakes up the instance (but the request does not go through) but any subsequent requests work. Will try v1 for the time being but would be nice if resolved.
You're seeing this because this thread has been automatically linked to the Help Station thread.
JustJake
JustJake4mo ago
We’re expressly not going to be able to prioritize this until the new proxy is out unfortunately
pauldps
pauldpsOP2mo ago
Has something changed with this issue? My Legacy apps are starting to show 502 errors after coming back from sleep
Most of my apps also no longer allow me to change between Legacy and V2
Brody
Brody2mo ago
We have indeed removed the ability for users to switch back to legacy
pauldps
pauldpsOP2mo ago
Is Legacy going to be removed soon?
Brody
Brody2mo ago
when we move to metal legacy will not be supported, thus in the interest of moving to metal faster all deploys for all plan tiers use runtime v2
pauldps
pauldpsOP2mo ago
I really need App Sleep to work reliably 😦
Brody
Brody2mo ago
well then you will be pleased to know that the new proxy is indeed fully rolled out and 100% of the nearly half a million domains used on our platform now have their traffic served via the new proxy, thus we should be able to take a look at picking back up the 502 app sleeping issue.
pauldps
pauldpsOP2mo ago
That would be great, V2 working with App Sleep would be ideal
Brody
Brody2mo ago
i've also bumped the linear ticket on your behalf
pauldps
pauldpsOP2mo ago
Appreciate that, thanks!
JustJake
JustJake4w ago
We will make sure this works reliably within the next 2 weeks
!remind me to circle back in 2 weeks
Duchess
Duchess4w ago
Got it, I will remind you to circle back at Mon, 11 Nov 2024 16:28:32 GMT
Brody
Brody3w ago
Just wanna say that we are actively working on a solution to this!
Brody
Brody2w ago
Circling has been done
Mig
Mig6d ago
We have a solution and hoping to have it out this week.
Solution is being tested and trying to get it out tomorrow. I'll be sure to comment here when it is.
A fix has been merged and in production now. @pauldps give it a try whenever you can and let me know. Current implementation allows your app to take up to 10 seconds to accept the incoming connection.
Brody
Brody6d ago
They do at least have to redeploy for the new changes to take effect right?
Mig
Mig6d ago
oh yes. Please trigger a redeploy. This action applies some settings so the network is aware that your application has app sleep set. Since this issue impacted applications that started slower than 100 ms, building something to backfill existing applications did not seem worth it given a redeploy would fix it. My go application for example never has this issue because it starts up fast enough to accept the connection before the host rejects it thinking there's no app listening.
started slower than 100 ms
this number is a guess. I think it's roughly correct. Might be 30-100ms
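The reasoning here can be captured in a toy model: the wake request is forwarded only if the app starts accepting connections within some window. The Python sketch below uses the numbers quoted in this thread (a guessed window of roughly 100 ms before the fix, up to 10 seconds after it); the function and constant names are hypothetical, and the model ignores everything except startup time.

```python
# Window sizes taken from the thread; the old value is explicitly a guess there.
OLD_ACCEPT_WINDOW_MS = 100      # "might be 30-100ms"
NEW_ACCEPT_WINDOW_MS = 10_000   # "up to 10 seconds to accept the incoming connection"


def first_request_outcome(startup_ms: float, accept_window_ms: float) -> str:
    """Toy model: forwarded if the app accepts connections inside the window, else a 502."""
    if startup_ms < 0 or accept_window_ms < 0:
        raise ValueError("durations must be non-negative")
    return "forwarded" if startup_ms <= accept_window_ms else "502"
```

For example, an app that takes about 1 second to boot would fall outside the old window but inside the new one. The model is deliberately simplistic, so it will not account for failures that have some cause other than startup time.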
pauldps
pauldpsOP5d ago
So this is what I did:
- Changed the app to V2
- Triggered a redeploy (also made some code changes etc, it's a GraphQL Yoga API running in Bun)
- Service starts fine (health check worked on first try) and runs fine in a browser
- Service goes to sleep
- I refresh the website
- Got the error in the first image
Did I miss anything?
No description
No description
Brody
Brody5d ago
request id please
pauldps
pauldpsOP5d ago
n6zQAxIuT5ysSFbJ-GY0nA_3118653284
Brody
Brody5d ago
ill let mig comment on this though, might be worth trying a newer version of bun, you're on 1.1.18
pauldps
pauldpsOP5d ago
I'll do that soon, but I don't think it will fix the issue
the app is booting in about ~1s
Brody
Brody5d ago
i was able to confirm app sleeping works with a node app that took 8 seconds to start, so this may just be bun being bun
pauldps
pauldpsOP5d ago
maybe my project is stuck in some old/cached workflow?
Brody
Brody5d ago
what region?
pauldps
pauldpsOP5d ago
us-west, the default one
Brody
Brody5d ago
same, we'll see if mig wants to work around bun's strange networking issues on monday
pauldps
pauldpsOP5d ago
I can test with a Ruby/Sinatra app later
I tested it with another Bun app but in a different project and it worked
I think my project/service is borked somehow
the Sinatra app also worked fine on first try
just re-tested the project that had the issue and it still showed the error
This project works: 46548220-e0ba-4a16-b80a-706a55133413
This one does not: 34304961-2ebf-4d0b-b2ae-3585cf6b9353 (service: e2a687a5-9ce2-4694-81ae-12c6756b0bce)
ThallesComH
ThallesComH5d ago
maybe try with the same code but in a new project?
pauldps
pauldpsOP5d ago
for the project that's not working it'll be a bit more difficult since it has other dependencies inside that project that I'd have to deploy too, but I will do it if time permits
ThallesComH
ThallesComH5d ago
you could create a template from the project and then create a new project from it
it's in project settings
pauldps
pauldpsOP5d ago
oh, didn't know that. I'll give it a try
will the new project use the same env variables and stuff?
Mig
Mig5d ago
If someone gives me the source code for reproducible bug I will check it out!
pauldps
pauldpsOP5d ago
I copied my services to another project and it seems to be working without issues, no 502s on wakeup
so it seems my old project is somehow bugged
I'll make one last test, as my old service had a volume attached that I wasn't using. I've deleted the volume, redeployed the service, and will wait for it to sleep
just did ☝️ and it errored the same, so it doesn't seem to be volume-related
Brody
Brody5d ago
@pauldps - as mig said, we would need a reproducible example in order to look into it
pauldps
pauldpsOP5d ago
I'm not sure that's reproducible
both projects are running the same code with the same env vars, one works, one does not
I have given the project IDs of both, feel free to look into them
Brody
Brody5d ago
unfortunately we wont be able to spend time doing that, we would need a reproducible example
pauldps
pauldpsOP5d ago
you're probably going to get other people with old projects facing the issue but some of them won't be able to do what I did (copy/move everything to a new project)
Brody
Brody5d ago
i had a service i deployed 8 months ago, with the changes done it can now properly wake up from sleep
pauldps
pauldpsOP5d ago
same-ish with my Sinatra project
that's why I think it's a problem with my project specifically
it's not something related to source code
something about my project, infrastructure/configuration-wise, not code-wise, might be causing the issue
but I can't look at infra/configs
Brody
Brody5d ago
if you think that is the case you are welcome to try duplicating the gesund-api service
pauldps
pauldpsOP5d ago
I already did that ... ah, the service, into the same project?
Brody
Brody5d ago
yes
pauldps
pauldpsOP5d ago
I duplicated the entire project
let me give that a try
done, will wait for it to sleep
also have an update: the second project's first request failed on wake up, just tried now
that said, how do I actually send the code of this repo for reproduction?
pauldps
pauldpsOP5d ago
Request ID: f5B_p5h0T5SopI4KU
No description
pauldps
pauldpsOP5d ago
the exact same code as gesund-api. Bun as well
Brody
Brody5d ago
just since its easy, I'd still recommend trying the latest bun version
pauldps
pauldpsOP5d ago
Bun upgraded to 1.1.34
Brody
Brody5d ago
and if this doesn't work, we would need that MRE
pauldps
pauldpsOP5d ago
it's going to be very hard for me to have a MRE if my other Bun example (let's call that Project 3) isn't failing
I'll upgrade Bun there too and test it again
Brody
Brody5d ago
project id?
pauldps
pauldpsOP5d ago
46548220-e0ba-4a16-b80a-706a55133413
this one is just a plain Bun server
it doesn't seem to be affected by the issue
I have other branches where I have other servers like Sinatra for testing purposes
Brody
Brody5d ago
also bun 1.1.18
pauldps
pauldpsOP5d ago
as for that MRE
I mean, how can I make it actually minimal? it's a graphql api with multiple endpoints, I don't know what is making it fail, if anything.
this testing takes time, I have to wait for the app to sleep and test the first request, it doesn't seem like I can make several code changes removing stuff until I find the culprit, if there's one
Brody
Brody5d ago
its up to you to remove as much code from the current app while still retaining the issue
pauldps
pauldpsOP5d ago
that might take me a very long time
Brody
Brody5d ago
there is no rush
pauldps
pauldpsOP5d ago
I'll see what I can do but I'm not happy about having to do this when the server is returning a 502 which seems to be out of my control
Brody
Brody5d ago
what your app is doing is out of our control too, and thus we need that MRE to reproduce and patch around it
pauldps
pauldpsOP5d ago
don't you think if it was an error on my app we'd at least see it in the logs? there are no logs for the failed request
Brody
Brody5d ago
no i dont think we would see logs for this
pauldps
pauldpsOP5d ago
does Railway use the health check path during wakeup?
Brody
Brody5d ago
no
pauldps
pauldpsOP5d ago
how does it know it is up and ready to serve requests?
Brody
Brody5d ago
we replay the incoming connection for up to 10 seconds
pauldps
pauldpsOP5d ago
I have a MRE. Instead of changing my project, I deployed a minimal Graphql-yoga+Bun server to Project 3. Just tried the first request on sleep and it failed with a 502. Here's the code: https://github.com/pauldps/bun-railway-v2-test/tree/graphql-yoga
Brody
Brody5d ago
so the newer bun version didnt help it seems
pauldps
pauldpsOP5d ago
yup, the previous version (no graphql) running Bun 1.1.18 also didn't have the issue
Brody
Brody5d ago
alright, thank you
pauldps
pauldpsOP5d ago
Strangely: I deployed the same app on Project 2 (a0aefb5f-15c4-49c6-a7ec-020b58d0cfc5) same branch and everything. It's working fine! I've checked it twice now and both requests worked fine. So I don't know what's going on
Brody
Brody5d ago
well i have your code deployed so ill let you know if i can reproduce it
pauldps
pauldpsOP5d ago
I got an error on the app running on Project 2. The fact that it occasionally works seems to be a bit random.
Brody
Brody5d ago
I got your bun MRE to cause a 502 on wake
so yeah, some strange issue with bun
pauldps
pauldpsOP5d ago
is Bun networking known to cause issues?
Brody
Brody5d ago
yeah, it's not the first time
Mig
Mig4d ago
since we got a MRE I could try it out tomorrow. Thanks for the persistence on wanting to get this fixed. Sometimes it's a 50/50 effort for us to help when the issue is rare.
Brody
Brody4d ago
here is their MRE repo in template form - https://railway.com/template/lGBlqd
pauldps
pauldpsOP4d ago
is it deploying the graphql-yoga branch? (can't tell)
Brody
Brody4d ago
yes