Railway•6mo ago
rob

Unresponsive deployment after some hours

Hello, I've been using Railway to host my Telegram bot for more than a year and I never experienced this sort of issue until I enabled app sleeping a few days ago. With that option enabled, the bot would just go to sleep without ever being able to wake (but that's fine, given I didn't take into account that a Telegram bot only polls for updates). When I noticed that, I disabled app sleeping and let the bot redeploy with the new configuration, but in the following days the bot kept becoming unresponsive after a few hours. I tried to fix it by redeploying, but that was always a temporary fix, and now it seems to be stuck again (with no issues logged).

Can you please check if there is anything wrong with the configuration or the deployment of my project? I'd be happy to share any other information needed. Also, I'd like to have the project working again as soon as possible, but I can keep it broken until the issue is triaged, if needed.

The project is f6d25e17-9bb2-457f-8ce2-e55b5ce1dcd8
The service is 8c4e4e87-7cf3-4ab2-9ec2-b6cc41db7b5b
The deployment is 81cc519b-a7a3-41db-8266-53e08952e935

EDIT: the deploy was triggered yesterday at 2:16 PM CET (GMT+2) and it was working properly at least until today at 1:15 AM CET (GMT+2)
28 Replies
Percy•6mo ago
Project ID: f6d25e17-9bb2-457f-8ce2-e55b5ce1dcd8,8c4e4e87-7cf3-4ab2-9ec2-b6cc41db7b5b,81cc519b-a7a3-41db-8266-53e08952e935
Brody•6mo ago
So it sounds like it's still going to sleep? does that sound right to you?
the immediate solution would be to deploy your bot into another service and leave the bugged service alone for now
side note, I'd be curious to see how a telegram bot that uses webhooks instead of polling would work with app sleeping
rob (OP)•6mo ago
So it sounds like it's still going to sleep?
It feels like it, but I'm having a hard time figuring out what could be causing it (especially given there were little to no recent changes to the bot's logic)
the immediate solution would be to deploy your bot into another service and leave the bugged service alone for now
Thanks, I didn't think about that! I've now deployed the service on my test environment while leaving production alone. As I suspected, I've got no errors from Telegram (which should complain when 2 instances of the same bot are running concurrently), so the production instance really feels dead. Thanks as always @Brody, should I ping again in a few days to see if the issue can be looked into thoroughly?
Brody•6mo ago
if you dont hear back from me by Tuesday please ping, as i plan on bringing this up to the team, and in that case it would be helpful to leave the suspected bugged service untouched if possible
hey rob, the applicable person would be off until tmr
rob (OP)•6mo ago
@Brody I've got bad news (totally on me): I left automatic deploys enabled and a pull request was automerged during the night, so the faulty deployment is now gone. For the time being you can ignore this issue; I will ping you if it happens again
Brody•6mo ago
okay sounds good!
rob (OP)•5mo ago
Hi @Brody, sorry to bother you once more but it finally happened again. The stuck deployment is e90da47 (service 8c4e4e87-7cf3-4ab2-9ec2-b6cc41db7b5b). Unfortunately I can't disable the connection with my main branch because that would require a redeploy, but I will do my best to stop merging code until the issue is looked at (the only risk would be an automerge by Renovate during the night)
Brody•5mo ago
so the bot is unresponsive?
rob (OP)•5mo ago
It wasn't until I deployed it on my test environment (with prod's token). It has been stuck for about 3 and a half hours now
Brody•5mo ago
what makes you think your deployment is being put to sleep instead of soft locking or something similar?
rob (OP)•5mo ago
Speaking for this deployment only, as I can't really recall the older ones: it was a freshly deployed instance (5 hours old), it wasn't consuming that many resources, and lately I limited the concurrency to max 10 requests at a time.
The bot itself never suffered issues causing it to stop working without any sign; I would expect at least a stack trace but I had none.
You can see in the image the point in time when it stopped working, while processing 10 JPEG -> PNG conversions.
No description
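For anyone following along, here is a minimal sketch of a concurrency limit like the one described above, with a per-job timeout added so that a hung conversion is logged instead of silently holding one of the 10 slots. It assumes an asyncio-based bot, which the thread does not confirm; `run_limited`, `JOB_TIMEOUT_SECONDS`, and the logger name are hypothetical.

```python
import asyncio
import logging

log = logging.getLogger("bot.jobs")

MAX_CONCURRENCY = 10        # the limit mentioned in the thread
JOB_TIMEOUT_SECONDS = 120   # hypothetical value, tune for real workloads


async def run_limited(semaphore, job_id, make_coro, timeout=JOB_TIMEOUT_SECONDS):
    """Run one job under the shared semaphore, logging start, finish and timeout."""
    async with semaphore:
        log.info("job %s started", job_id)
        try:
            # wait_for cancels the job and raises if it exceeds the timeout.
            result = await asyncio.wait_for(make_coro(), timeout)
        except asyncio.TimeoutError:
            log.error("job %s timed out after %ss", job_id, timeout)
            return None
        log.info("job %s finished", job_id)
        return result
```

A shared `asyncio.Semaphore(MAX_CONCURRENCY)` would be created once and passed to every call. Note that the timeout only fires if the job yields to the event loop; a blocking, CPU-bound conversion running directly on the loop would not be interrupted this way.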
Brody•5mo ago
you suspect it got slept at 3:12 pm?
rob (OP)•5mo ago
Somewhere after that, I can only say that's the latest log I had proving the bot was online
Brody•5mo ago
and is it currently "sleeping" or have you since done a redeploy
rob (OP)•5mo ago
It is still sleeping at this moment. I left the prod deployment there and enabled the test one with the same token, and it doesn't result in Telegram's error saying that only one instance of a bot can run concurrently.
Also, until now I had never seen its memory decrease in such a "slow" curve as it did after 3:12 PM (CET).
Brody•5mo ago
so heres the thing, if it was sleeping the memory reporting would have frozen, yet memory reporting continued and did differ
now i know i said i planned on bringing this up to the team, but im going to hold off on that for now since this is not looking like a platform issue
rob (OP)•5mo ago
That would be fine and I get your point of view. I need to find a way to figure it out, because as of now I wouldn't know how to triage it
Brody•5mo ago
is that service on the v2 runtime at least?
rob (OP)•5mo ago
New builder and runtime v2
Brody•5mo ago
then i think the next course of action would be to add some very verbose logging so you can try to determine when and where your app is softlocking, and then hopefully why
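One possible way to get that kind of verbose signal, sketched in Python under the assumption that the bot has a main loop that can call a function each time it makes progress. `Heartbeat`, `watch`, and the thresholds are hypothetical names and values; `faulthandler.dump_traceback` is from the standard library and writes every thread's stack to stderr, which can show where the process is stuck.

```python
import faulthandler
import logging
import sys
import threading
import time

log = logging.getLogger("bot.watchdog")


class Heartbeat:
    """Tracks when the bot last reported progress."""

    def __init__(self, stall_after=60.0):
        self.stall_after = stall_after
        self._last_beat = time.monotonic()
        self._lock = threading.Lock()

    def beat(self):
        # Call this from the main loop, e.g. after each batch of updates.
        with self._lock:
            self._last_beat = time.monotonic()

    def seconds_since_beat(self):
        with self._lock:
            return time.monotonic() - self._last_beat

    def is_stalled(self):
        return self.seconds_since_beat() > self.stall_after


def watch(heartbeat, interval=10.0):
    """Start a daemon thread that logs liveness and dumps stacks on a stall."""
    stop = threading.Event()

    def run():
        while not stop.wait(interval):
            if heartbeat.is_stalled():
                log.error("no progress for %.0fs, dumping thread stacks",
                          heartbeat.seconds_since_beat())
                faulthandler.dump_traceback(file=sys.stderr, all_threads=True)
            else:
                log.debug("alive, last progress %.1fs ago",
                          heartbeat.seconds_since_beat())

    threading.Thread(target=run, name="watchdog", daemon=True).start()
    return stop  # call stop.set() to end the watchdog
```

Because the watchdog runs in its own thread, it keeps logging even if the main loop is blocked, which helps distinguish a softlocked process from one that was stopped from outside.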
rob (OP)•5mo ago
I will see what I can come up with, thanks as always 😄
May I delete the stuck instance? I'm in no rush in that regard
Brody•5mo ago
yep! and if you still think this is railway sleeping your service, catch and log sigterm as thats the signal sent when your container is stopped by railway for any reason
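A minimal sketch of catching and logging SIGTERM in Python, assuming the bot is a plain Python process (the thread does not say which language or framework it uses). `handle_sigterm` and `install_signal_logging` are hypothetical names; `signal.signal` and `signal.Signals` are standard library.

```python
import logging
import signal
import sys

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("bot")


def handle_sigterm(signum, frame):
    # Log the signal so an external stop shows up in the deployment logs,
    # then exit cleanly.
    log.warning("received %s, shutting down", signal.Signals(signum).name)
    sys.exit(0)


def install_signal_logging():
    # Must be called from the main thread, ideally at startup.
    signal.signal(signal.SIGTERM, handle_sigterm)
```

Calling `install_signal_logging()` once at startup is enough. If the log line never appears around the time the bot goes quiet, that suggests the process was not stopped with SIGTERM and is more likely stuck internally.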
rob (OP)•5mo ago
How confident do you feel saying that the memory of a sleeping service will be frozen until awoken?
Brody•5mo ago
unless they have since fixed that (doubt it) then that would still be the case
Brody•5mo ago
yep, metrics are not seeded with zeros during sleep so they would appear as frozen if there was metrics to begin with, but this service is slept for a very long time so there is not enough awake metrics to show either
No description
rob (OP)•5mo ago
How long would it take for those services to go to sleep? I don't think that's my case because I guess I would notice it in the UI
Brody•5mo ago
10 - 15 minutes maybe
but yeah i think its very safe to say that your deployment was not slept
rob (OP)•5mo ago
Perfect, that doesn't match how long my deployment stayed alive