nootrality
Application Failed to Respond (Deployment Overlap)
I'm getting "Application failed to respond" errors during deployments. I have a healthcheck in place, and I'm using RAILWAY_DEPLOYMENT_OVERLAP_SECONDS. What's happening is that the load balancer fails to transition over in time. I saw this same issue a few months ago. Can we fix this durably?
15 replies
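One pattern that can help with overlap-based deploys is a health check that only reports healthy once the new instance has finished warming up, so traffic isn't switched to it too early. Below is a minimal sketch using only Python's standard library. It assumes the platform treats any non-200 response as "not ready yet". The /health path, the READY flag, the warm-up step, and the 8080 fallback port are all hypothetical. The port is otherwise read from the PORT environment variable.

```python
import os
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical readiness flag: set only after warm-up (caches, DB pools, models) has finished.
READY = threading.Event()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/health":
            # 503 means "not ready yet"; 200 means it is safe to switch traffic here.
            ready = READY.is_set()
            status, body = (200, b"ok") if ready else (503, b"warming up")
        else:
            status, body = 200, b"hello"
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging for the example


def make_server(port: int) -> ThreadingHTTPServer:
    # Listen on all interfaces so the platform's proxy can reach the container.
    return ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)


if __name__ == "__main__":
    server = make_server(int(os.environ.get("PORT", "8080")))

    def warm_up() -> None:
        # ... hypothetical warm-up work goes here ...
        READY.set()

    threading.Thread(target=warm_up, daemon=True).start()
    server.serve_forever()
```

The server starts answering immediately, but the health check stays at 503 until warm-up completes. Without that gating, a check that succeeds as soon as the port opens could let the switch happen before the new instance can actually serve requests.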
Draining During Deployments
How long does Railway wait to drain old instances before tearing them down during deployments? We have requests that can take up to 60s to serve and want to make sure we're not dropping users during deployments. Is there any way to configure this?
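Whatever grace period the platform gives, the application side of draining usually looks the same. On SIGTERM, the process stops admitting new requests and waits for in-flight ones to finish, up to a deadline. Below is a minimal sketch. It assumes the old instance receives SIGTERM and some time before it is killed. InFlightTracker is a hypothetical helper, and the 60-second timeout mirrors the slowest requests described above.

```python
import signal
import threading


class InFlightTracker:
    """Hypothetical helper: counts in-flight requests so shutdown can wait for them."""

    def __init__(self) -> None:
        self._count = 0
        self._draining = False
        self._cond = threading.Condition()

    def begin(self) -> bool:
        """Register a request; returns False once draining has started (respond 503)."""
        with self._cond:
            if self._draining:
                return False
            self._count += 1
            return True

    def end(self) -> None:
        """Mark a request as finished and wake up anyone waiting in drain()."""
        with self._cond:
            self._count -= 1
            self._cond.notify_all()

    def drain(self, timeout: float) -> bool:
        """Stop admitting requests, then wait up to `timeout` seconds for in-flight ones.

        Returns True if everything finished in time, False if requests were still running.
        """
        with self._cond:
            self._draining = True
            return self._cond.wait_for(lambda: self._count == 0, timeout=timeout)


tracker = InFlightTracker()

if __name__ == "__main__":
    stop = threading.Event()
    # The signal handler only flips a flag; the actual draining happens on the main thread.
    signal.signal(signal.SIGTERM, lambda signum, frame: stop.set())
    # ... start the HTTP server on worker threads here; each request handler
    # calls tracker.begin() before its work and tracker.end() in a finally block ...
    stop.wait()
    drained = tracker.drain(timeout=60.0)  # matches the ~60s worst-case requests
    raise SystemExit(0 if drained else 1)
```

This only helps if the platform's own wait between SIGTERM and a hard kill is at least as long as the drain timeout. That wait is the number the question is asking about.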
27 replies
Deployments Failing on Health Check Timeout
All deployments are failing after 5 minutes. The number of healthcheck attempts never rises above 2. It's not clear whether this means the service never comes up (which would be surprising, as the code hasn't really changed) or whether the healthcheck daemon process is dying. In either case, we're blocked from all deployments and need help.
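When attempts stall at 2, it can help to run the same kind of check against the same image outside the platform. This separates "the service never comes up" from "the checker is dying". Below is a minimal sketch. It assumes the health check is a plain HTTP GET that must return 200. probe_http and wait_healthy are hypothetical helpers, and the URL, interval, and 300-second deadline in the main block are placeholders. One thing worth ruling out first is an app that listens on 127.0.0.1 or a hard-coded port instead of 0.0.0.0 and the PORT variable the platform injects.

```python
import http.client
import time
import urllib.request
from typing import Callable, Tuple


def probe_http(url: str, timeout: float = 5.0) -> bool:
    """One health-check attempt: True only if the endpoint answers HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # HTTPError (non-2xx), refused connections, timeouts, and malformed responses.
        return False


def wait_healthy(probe: Callable[[], bool], deadline_s: float, interval_s: float) -> Tuple[bool, int]:
    """Retry `probe` until it succeeds or the deadline passes; returns (healthy, attempts)."""
    start = time.monotonic()
    attempts = 0
    while True:
        attempts += 1
        if probe():
            return True, attempts
        # Give up if waiting another interval would overshoot the deadline.
        if time.monotonic() - start + interval_s > deadline_s:
            return False, attempts
        time.sleep(interval_s)


if __name__ == "__main__":
    ok, n = wait_healthy(lambda: probe_http("http://127.0.0.1:8080/health"), deadline_s=300, interval_s=5)
    print(f"healthy={ok} after {n} attempts")
```

Suppose this reaches 200 locally but the attempt count on the platform still stops at 2. That points away from the application and toward the deploy environment or the checker itself.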
20 replies
Wildcard DNS
I'm trying to set up wildcard DNS (e.g. *.example.com) so that user1.example.com, user2.example.com, etc. all go to the same endpoint. This seems to work except for the SSL certs, which aren't set at the root domain level. How can I set these up?
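One likely reason the root is left out is that a wildcard certificate name covers exactly one label, in the leftmost position. So *.example.com matches user1.example.com but matches neither the bare example.com nor a.b.example.com. The root domain therefore needs its own entry, either as another name on the same certificate or as a separate certificate. Below is a simplified sketch of that matching rule. wildcard_matches is a hypothetical helper, not a platform API, and partial-label wildcards such as f*.example.com are not handled.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Simplified certificate-name check: a leading '*' label matches exactly one label."""
    p_labels = pattern.lower().rstrip(".").split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    # A wildcard never spans multiple labels, so the label counts must agree.
    if len(p_labels) != len(h_labels):
        return False
    first, *rest = p_labels
    if first == "*":
        # '*' stands in for one non-empty leftmost label; the rest must match exactly.
        return h_labels[0] != "" and rest == h_labels[1:]
    return p_labels == h_labels


if __name__ == "__main__":
    for host in ("user1.example.com", "example.com", "a.b.example.com"):
        print(host, wildcard_matches("*.example.com", host))
```

Running this shows that only the single-label subdomain matches. Covering the root therefore takes a second name, not a different wildcard.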
11 replies
CPU-Bound Performance
We have a (mostly) read-only vector search service currently running on AWS Fargate with 4 ARM instances, each with 4 vCPUs and 8 GB of RAM (about $460/mo). This service serves 200 tps at peak.
When I ported it to Railway, I noticed a few things:
1) The container only registers 16 logical CPUs, but I'm on the Team plan and was expecting 32. Is it possible something is configured incorrectly?
2) In benchmarks I'm only getting 30 tps vs. the 200 I get on AWS. I'm maxing out at around 1500% CPU, which seems comparable given that I'm not saturating the 4 instances on AWS, though it's still underutilized per (1). Is there anything else I can configure to get comparable performance? This particular code is sensitive to instruction set extensions like SSE or AVX. Is it possible that they're not enabled on the build machine but are on the prod hypervisors?
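Normalizing the numbers above per CPU makes the gap explicit. This treats 1500% CPU as roughly 15 busy cores and counts the AWS side as 4 instances × 4 vCPUs:

```latex
\[
\frac{200\ \text{tps}}{4 \times 4\ \text{vCPU}} = 12.5\ \text{tps/vCPU (AWS)},
\qquad
\frac{30\ \text{tps}}{\approx 15\ \text{CPU}} \approx 2\ \text{tps/CPU (Railway)}
\]
```

Because the AWS instances aren't saturated, 12.5 tps per vCPU is a floor for their throughput per busy core. Per-core throughput on Railway is therefore at most about a sixth of AWS in this benchmark. At that rate, even 32 fully used CPUs would give roughly 64 tps, which suggests the missing 16 CPUs alone wouldn't close the gap.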
It's entirely possible that this workload is just more performant on AWS, but I figured it's worth reaching out to see if there's anything that can be adjusted.
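To narrow down both questions, it may help to run the same diagnostic on the build machine and inside the deployed container, then compare the results. It reports how many CPUs the process can use, whether a cgroup CPU quota applies, and which SIMD flags the CPU advertises. Below is a minimal Linux-only sketch, assuming cgroup v2 (/sys/fs/cgroup/cpu.max). cpu_flags and cgroup_cpu_limit are hypothetical helpers. Note that os.cpu_count() typically reports the CPUs visible to the kernel, not the quota. Also, SSE and AVX are x86 extensions. On ARM hosts, /proc/cpuinfo has a Features line instead (for example, asimd for NEON).

```python
import os
from typing import Optional, Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Collect feature flags from /proc/cpuinfo text (x86 'flags' lines, ARM 'Features' lines)."""
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("flags", "Features"):
            flags.update(value.split())
    return flags


def cgroup_cpu_limit(cpu_max_text: str) -> Optional[float]:
    """Parse cgroup v2 cpu.max ('<quota> <period>' or 'max <period>') into a CPU count."""
    quota, period = cpu_max_text.split()
    if quota == "max":
        return None  # no quota: limited only by the CPUs the container can see
    return int(quota) / int(period)


def read(path: str) -> Optional[str]:
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return None


if __name__ == "__main__":
    print("os.cpu_count():", os.cpu_count())
    if hasattr(os, "sched_getaffinity"):
        print("usable CPUs (affinity):", len(os.sched_getaffinity(0)))
    cpu_max = read("/sys/fs/cgroup/cpu.max")
    if cpu_max:
        print("cgroup v2 CPU limit:", cgroup_cpu_limit(cpu_max))
    info = read("/proc/cpuinfo")
    if info:
        flags = cpu_flags(info)
        for name in ("sse4_2", "avx", "avx2", "avx512f", "asimd"):
            print(name, "yes" if name in flags else "no")
```

Suppose the build machine lacks a flag that the production host has. Any build step that detects CPU features at compile time would then produce a binary that doesn't use those features in production.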
4 replies