Node graceful shutdowns
I have a Node task manager process deployed to Railway that I'm trying to get to shut down gracefully (finish current tasks before the service is killed). Is this possible on Railway? I've added logic to catch SIGTERM and begin shutdown, and this works in my local environment when I send it a SIGTERM, but on Railway it exits immediately with the following:
It doesn't even seem to hit my SIGTERM handler, defined below. I can't find any docs about this kind of thing; is this kind of delayed shutdown supported?
Project ID:
95c304a2-fe2b-4ad1-bc3f-5296fd26f36c
send your package.json please
start:scheduler is the process that I've tested this with
"start:scheduler": "SKIP_ENV_VALIDATION=true node --es-module-specifier-resolution=node .wss-dist/src/server/scheduler/scheduler-entrypoint.js | npm run log-agent -- --service=scheduler | pino-pretty",
hahaha, that bad?
that's a lot of text
but I'm struggling to understand why you'd want to outright stop your app, gracefully or not
Ah yeah I mean on deployment
I want previous deployments to shutdown gracefully
but why? there's a new deployment to handle requests, why do you care about the old deployment at that point
The service runs tasks that often take > 5 minutes and aren't resumable. The current configuration stops them mid-run, which breaks certain guarantees in our app.
So the ideal would be to catch the SIGTERM, stop handling new tasks on that specific instance, finish any ongoing tasks, and then call process.exit().
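A minimal sketch of what that flow could look like in plain Node; the names here (acceptingTasks, inFlight, runTask) are hypothetical stand-ins for however the real scheduler tracks its work, not the actual codebase:

```js
// Hypothetical drain-then-exit sketch: stop taking new tasks on SIGTERM,
// wait for in-flight tasks to settle, then exit cleanly.
let acceptingTasks = true;   // the scheduler loop would check this before picking up new work
const inFlight = new Set();  // promises for tasks currently running

function runTask(taskFn) {
  const p = Promise.resolve()
    .then(taskFn)
    .finally(() => inFlight.delete(p));
  inFlight.add(p);
  return p;
}

process.on('SIGTERM', async () => {
  console.log('SIGTERM received, draining in-flight tasks');
  acceptingTasks = false;                   // stop handling new tasks
  await Promise.allSettled([...inFlight]);  // finish any ongoing tasks
  console.log('drain complete, exiting');
  process.exit(0);
});
```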
Just curious how you guys handle exits on your end, and if this kind of workflow is even possible. Does your removal logic allow for my deployments to define when they exit?
(I don't work for Railway)
I'm guessing whatever runs to kill the Docker containers force kills them once the new deployment is live, not respecting your graceful shutdown handler. I'll do some testing and get back to you on this
Oh wow, didn't realize that. Respect! Let me know what you learn 🙂
I have returned
with information
Solution
you can catch the kill signal, but you only get ~3 seconds of grace time before the container is force killed, indicated by the 3 dots and an exit message that never got printed
oh interesting...
3 seconds is an unfortunate amount for my situation, but good to know.
what is the logic you used to catch the signal? I'm still not able to even catch it.
I used golang
and you're catching the SIGTERM?
yes
obviously this works locally, but on Railway the containers get force killed.
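For reference, a rough way to reproduce that measurement from Node rather than Go (this is an assumed equivalent of the test described above, not the original): log once per second after SIGTERM and see how far the counter gets before the container disappears.

```js
// Hypothetical grace-window probe: after SIGTERM, count seconds until the
// platform force kills the container.
process.on('SIGTERM', () => {
  let seconds = 0;
  setInterval(() => {
    seconds += 1;
    console.log(`still alive ${seconds}s after SIGTERM`);
  }, 1000);
  // deliberately never call process.exit(); the last line that makes it into
  // the logs approximates the grace period
});

// keep the process alive so there is something to deploy over and SIGTERM
setInterval(() => {}, 60_000);
```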
my recommendation would be to have a separate worker service; that way, when you deploy a change to your main app the worker service is unaffected.
I know that's not a perfect solution, since eventually you'd have to deploy some changes to the worker service too; you might want to then also employ a third-party work queuing framework
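A sketch of that worker-plus-queue shape, using BullMQ purely as one example of such a framework (the queue name, Redis connection details, and doLongRunningTask are placeholders, not anything settled on in this thread):

```js
// worker.js -- a separate service that only consumes long-running jobs,
// so deploys of the main app never interrupt work in progress
import { Worker } from 'bullmq';

// placeholder for the real > 5 minute, non-resumable work
async function doLongRunningTask(data) {
  console.log('processing', data);
}

const worker = new Worker(
  'long-tasks',                                    // queue name (placeholder)
  async (job) => {
    await doLongRunningTask(job.data);
  },
  { connection: { host: process.env.REDIS_HOST, port: 6379 } }
);

// close() waits for the active job to finish before resolving, which gives the
// graceful-drain behavior; the platform's grace window still applies whenever
// this worker service itself is redeployed
process.on('SIGTERM', async () => {
  await worker.close();
  process.exit(0);
});
```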
and with all that said, Railway is great but it can't cover every single user's specific use case perfectly, so there may be other PaaS platforms that will wait for your app to exit on its own, or maybe your workload would even work better on a VPS
Yeah, those are good suggestions. We're definitely starting to hit the edges of what Railway is capable of at this point. Not looking forward to the 10x increase in complexity of most other PaaS just for a couple of small additional capabilities, though :/
Appreciate the help, Brody
if you have any more questions I'd be happy to answer them (within reason) 🙂
Thanks! Have you found any good PaaS services that fit the niche of being incrementally more configurable than Railway without going all in on a barebones cloud? What do most people "graduate" to once they start hitting the limits of the platform?
hey now, I can't just go spouting out competitors lol, nice try though 🤣
fair enough lol. Would love to be able to configure that force kill timeout!
I suspect this is done to combat the creation of phantom containers, so I don't know how much luck you'll have with this on other platforms, because no provider wants containers running unchecked
but never say never, and screw it, give fly.io a go
Funny you should say that, I just spent the last couple of hours toying around with Fly.io. While it seems good, I may just defer to batching and scheduling deploys during off-peak hours for now, to avoid the headache of moving over for this specific issue.
sounds good
Out of curiosity, why are you so active on the forum? A Railway super fan?
it's true, I do like Railway a lot, but I also like helping people; everyone comes onto this platform with a different level of knowledge, so I try to help out where I can
that's dope, appreciate you taking the time
thank you 🙂