Node graceful shutdowns

I have a Node task-manager process deployed to Railway that I'm trying to get to shut down gracefully (finish current tasks before the service is killed). Is this possible on Railway? I've added logic to catch the SIGTERM and begin shutdown, and this works in my local environment when I send it a SIGTERM, but on Railway it exits immediately with the following:
npm ERR! path /app
npm ERR! command failed
npm ERR! signal SIGTERM
npm ERR! command sh -c -- SKIP_ENV_VALIDATION=true node --es-module-specifier-resolution=node .wss-dist/src/server/scheduler/scheduler-entrypoint.js | npm run log-agent -- --service=scheduler | pino-pretty
npm ERR! A complete log of this run can be found in:
npm ERR! /root/.npm/_logs/2023-06-05T15_35_34_524Z-debug-0.log
It doesn't even seem to hit my SIGTERM handler, defined below. I can't find any docs about this kind of thing; is this kind of delayed shutdown supported?
const gracefulShutdown = (code: string) => {
  logger.info(`Worker received ${code}. Shutting down...`)
  // Ask the worker pool to finish in-flight tasks, then exit cleanly.
  multiWorker.end().then(() => {
    process.exit(0)
  })
}

const signals: NodeJS.Signals[] = ["SIGTERM", "SIGINT", "SIGUSR2"]

// Register the same handler for every signal we care about.
signals.forEach((signal) => {
  process.on(signal, () => gracefulShutdown(signal))
})
Solution:
You can catch the kill signal, but you only get ~3 seconds of grace time before the container is force-killed, indicated by the three dots and the missing exit message that never got printed.
Percy (2y ago)
Project ID: 95c304a2-fe2b-4ad1-bc3f-5296fd26f36c
Ian Woodfill (OP, 2y ago)
95c304a2-fe2b-4ad1-bc3f-5296fd26f36c
Brody (2y ago)
Send your package.json, please.
Ian Woodfill (OP, 2y ago)
{
  "name": "nocode",
  "version": "0.1.0",
  "private": true,
  "type": "module",
  "engines": {
    "node": "^18.0.0"
  },
  "scripts": {
    "create-script": "node scripts/create-script.cjs",
    "prebuild": "prisma generate && prisma migrate deploy",
    "build:1-next": "next build",
    "build:2-server": "tsc --project tsconfig.wss.json && tsc-alias -p tsconfig.wss.json",
    "build": "run-s build:*",
    "check-types": "npx tsc --noEmit --project tsconfig.json",
    "dev:wss": "SKIP_ENV_VALIDATION=true npx tsx watch --tsconfig tsconfig.wss.json src/server/wss-server/dev.ts --clear-screen=false | npm run log-agent -- --service=wss | pino-pretty --colorize",
    "dev:scheduler": "SKIP_ENV_VALIDATION=true npx tsx watch --tsconfig tsconfig.wss.json src/server/scheduler/scheduler-entrypoint.ts --clear-screen=false | npm run log-agent -- --service=scheduler | pino-pretty --colorize",
    "dev:next": "next dev | npm run log-agent -- --service=next | pino-pretty --colorize",
    "dev": "prisma generate && run-p dev:* -l",
    "lint": "next lint",
    "lint-staged": "npx lint-staged",
    "postinstall": "prisma generate",
    "start": "SKIP_ENV_VALIDATION=true node --es-module-specifier-resolution=node .wss-dist/src/server/wss-server/prod.js | npm run log-agent -- --service=server | pino-pretty",
    "start:scheduler": "SKIP_ENV_VALIDATION=true node --es-module-specifier-resolution=node .wss-dist/src/server/scheduler/scheduler-entrypoint.js | npm run log-agent -- --service=scheduler | pino-pretty",
    "log-agent": "tsx --tsconfig tsconfig.log-agent.json src/log-agent.ts"
  },
start:scheduler is the process I've tested this with: "start:scheduler": "SKIP_ENV_VALIDATION=true node --es-module-specifier-resolution=node .wss-dist/src/server/scheduler/scheduler-entrypoint.js | npm run log-agent -- --service=scheduler | pino-pretty"
Ian Woodfill (OP, 2y ago)
Hahaha, that bad?
Brody (2y ago)
That's a lot of text, but I'm struggling to understand why you'd want to outright stop your app, gracefully or not.
Ian Woodfill (OP, 2y ago)
Ah yeah, I mean on deployment I want previous deployments to shut down gracefully.
Brody (2y ago)
But why? There's a new deployment to handle requests, so why do you care about the old deployment at that point?
Ian Woodfill (OP, 2y ago)
The service runs tasks that often take more than 5 minutes and aren't resumable. The current configuration stops them mid-run, which breaks certain guarantees in our app. So the ideal would be to catch the SIGTERM, stop handling new tasks on that specific instance, finish any ongoing tasks, and then call process.exit() (sketched below). Just curious how you handle exits on your end, and whether this kind of workflow is even possible. Does your removal logic allow my deployments to define when they exit?
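A sketch of that drain pattern (hypothetical names throughout; the task manager's internals aren't shown in the thread):

// Hypothetical drain pattern: on SIGTERM, stop accepting new tasks,
// wait for in-flight tasks to settle, then exit.
let draining = false
const inFlight = new Set<Promise<void>>()

// `task` stands in for whatever unit of work the scheduler runs.
const runTask = (task: () => Promise<void>) => {
  if (draining) return // refuse new work once shutdown has begun
  const p = task().finally(() => inFlight.delete(p))
  inFlight.add(p)
}

process.on("SIGTERM", async () => {
  draining = true
  await Promise.allSettled(inFlight) // let ongoing tasks finish
  process.exit(0)
})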
Brody (2y ago)
(I don't work for Railway.) I'm guessing whatever kills the Docker containers force-kills them once the new deployment is live, not respecting your graceful shutdown handler. I'll do some testing and get back to you on this.
Ian Woodfill (OP, 2y ago)
Oh wow, didn't realize that. Respect! Let me know what you learn 🙂
Brody (2y ago)
I have returned with information.
Brody (2y ago)
Solution:
You can catch the kill signal, but you only get ~3 seconds of grace time before the container is force-killed, indicated by the three dots and the missing exit message that never got printed.
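Given that budget, about the best a Node handler can do is race its cleanup against a hard deadline. A minimal sketch along the lines of the OP's handler (the 2500 ms budget is an assumption, chosen to stay under the observed ~3 s force-kill):

process.on("SIGTERM", () => {
  logger.info("Worker received SIGTERM. Shutting down...")
  // Force-exit just before the platform's ~3 s window closes.
  const deadline = setTimeout(() => process.exit(1), 2500)
  multiWorker.end().then(() => {
    clearTimeout(deadline)
    process.exit(0)
  })
})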
Ian Woodfill (OP, 2y ago)
Oh interesting... 3 seconds is an unfortunate amount for my situation, but good to know. What is the logic you used to catch the signal? I'm still not able to even catch it.
Brody (2y ago)
I used Go.
Ian Woodfill (OP, 2y ago)
console.log('begin')

process.on('SIGTERM', () => {
  console.log('Shutting down')
  process.exit(0)
})

const nextTick = () => {
  console.log('tick')
  setTimeout(nextTick, 5000)
}

nextTick()
tick
tick
tick
tick
tick
npm ERR! path /app
npm ERR! command failed
npm ERR! signal SIGTERM
npm ERR! command sh -c -- node index.js
npm ERR! A complete log of this run can be found in:
npm ERR! /root/.npm/_logs/2023-06-05T16_52_01_148Z-debug-0.log
tick
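Note the npm ERR! signal SIGTERM lines: the signal is reaching the npm/sh wrapper, but with npm run plus a shell pipeline there are several processes between the container's PID 1 and node, so the node process may never see the SIGTERM at all (an assumption consistent with the output above, not something confirmed in the thread). A variant of the test script that logs the process ID makes this easy to check:

// Hypothetical probe: if pid is not 1, a wrapper (npm, sh, ...) is PID 1
// and may be swallowing the SIGTERM before node ever sees it.
console.log(`begin: pid=${process.pid}`)

process.on('SIGTERM', () => {
  console.log('Shutting down')
  process.exit(0)
})

setInterval(() => console.log('tick'), 5000)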
Brody (2y ago)
package main

import (
    "fmt"
    "os"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    fmt.Println("Hello World!")

    fmt.Println("Waiting for kill signal...")

    var sig = make(chan os.Signal, 1)
    signal.Notify(sig, os.Kill, os.Interrupt, syscall.SIGTERM, syscall.SIGINT)

    var receivedSignal = <-sig
    fmt.Fprintln(os.Stderr, "Received signal:", receivedSignal)

    fmt.Println("App killed, cleaning up..")

    var sleepTime = 30 * time.Second

    fmt.Println("Artificially pausing for", sleepTime)

    go func() {
        for range time.Tick(1 * time.Second) {
            fmt.Print(". ")
        }
    }()

    time.Sleep(sleepTime)
    fmt.Println()

    fmt.Println("Times up, exiting with status 0")
    os.Exit(0)
}
Ian Woodfill (OP, 2y ago)
And you're catching the SIGTERM?
Brody (2y ago)
Yes. Obviously this works locally, but on Railway the containers get force-killed. My recommendation would be to have a separate worker service; that way, when you deploy a change to your main app, the worker service is unaffected. I know that's not a perfect solution, since eventually you'd have to deploy changes to the worker service too, so you might then also want to employ a third-party work-queuing framework (a sketch of that pattern follows below). With all that said, Railway is great, but it can't cover every single user's specific use case perfectly, so there may be other PaaS platforms that will wait for your app to exit on its own, or maybe your workload would even work better on a VPS.
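For the work-queue route, a minimal sketch of the idea (BullMQ is just one example of such a framework; the thread doesn't name one, and the queue name and Redis connection here are hypothetical):

import { Worker } from "bullmq"

// Jobs live in Redis, so work a killed container never finished can be
// picked up again by the next deployment (assuming jobs are safe to retry).
const worker = new Worker(
  "tasks",
  async (job) => {
    // long-running task logic goes here
  },
  { connection: { host: "localhost", port: 6379 } }
)

process.on("SIGTERM", async () => {
  // close() stops fetching new jobs and waits for active ones to finish,
  // though the platform's ~3 s force-kill still caps how long this can run.
  await worker.close()
  process.exit(0)
})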
Ian Woodfill (OP, 2y ago)
Yeah, those are good suggestions. We're definitely starting to hit the edges of what Railway is capable of at this point. Not looking forward to the 10x increase in complexity from most other PaaS just for a couple of small additional capabilities, though :/
Ian Woodfill (OP, 2y ago)
Appreciate the help, Brody
Brody (2y ago)
If you have any more questions I'd be happy to answer them (within reason) 🙂
Ian Woodfill (OP, 2y ago)
Thanks! Have you found any good PaaS services that fit the niche of being incrementally more configurable than Railway without going all in on a barebones cloud? What do most people "graduate" to once they start hitting the limits of the platform?
Brody (2y ago)
Hey now, I can't just go spouting out competitors lol, nice try though 🤣
Ian Woodfill (OP, 2y ago)
Fair enough lol. Would love to be able to configure that force-kill timeout!
Brody (2y ago)
I suspect this is done to combat the creation of phantom containers, so I don't know how much luck you'll have with this on other platforms, because no provider wants containers running unchecked. But never say never, and screw it, give fly.io a go.
Ian Woodfill (OP, 2y ago)
Funny you should say that, I just spent the last couple of hours toying around with Fly.io. While it seems good, I may just defer to batching and scheduling deploys during off-peak hours for now, to avoid the headache of moving over for this specific issue.
Brody (2y ago)
Sounds good.
Ian Woodfill (OP, 2y ago)
Out of curiosity, why are you so active on the forum? A Railway superfan?
Brody (2y ago)
It's true, I do like Railway a lot, but I also like helping people. Everyone comes onto this platform with a different level of knowledge, so I try to help out where I can.
Ian Woodfill (OP, 2y ago)
That's dope. Appreciate you taking the time.
Brody (2y ago)
Thank you 🙂