Railway•14mo ago

service redeploy causes problems to private network

Hey there, I am using an nginx reverse proxy to proxy external requests to internal (non-public) services over the Railway private network. Everything works fine until a service that is proxied to is redeployed. After the redeploy, the proxy responds with a 502 Bad Gateway. I have already tried some suggestions that were mentioned in this channel but can't seem to get it working. Any help would be appreciated 🙂 This is my simplified nginx.conf:

server {
    listen 8080 default backlog=16384;
    listen [::]:8080 default backlog=16384;
    resolver [fd12::10] valid=10s;

    underscores_in_headers on;
    server_tokens off;
    absolute_redirect off;

    location /my-service/ {
        proxy_pass http://my-service.railway.internal:8080/;
    }

    location = / {
        default_type text/plain;
        add_header content-length "0";
        return 200 '';
    }
}


Logs before redeploy:

192.168.16.8 - - [18/Oct/2023:14:20:20 +0000] "GET /my-service/ HTTP/1.1" 200 46 "-" "PostmanRuntime/7.33.0" "<my_ip>"


Logs after redeploy:

2023/10/18 15:21:26 [error] 35#35: *1 connect() failed (101: Network is unreachable) while connecting to upstream, client: 192.168.16.8, server: , request: "GET /my-service/ HTTP/1.1", upstream: "http://[fd12:a877:fdc6:0:4000:1:79cf:2751]:8080/", host: "<host_name>"

192.168.16.8 - - [18/Oct/2023:15:21:26 +0000] "GET /my-service/ HTTP/1.1" 502 150 "-" "PostmanRuntime/7.33.0" "<my_ip>"



43 Replies

Percy•14mo ago

Project ID: 1eff5dc1-3495-488e-91ae-4becbfd85e1d

VisionAIOP•14mo ago

1eff5dc1-3495-488e-91ae-4becbfd85e1d

Brody•14mo ago

do you have a healthcheck setup on this "my-service" service?

VisionAIOP•14mo ago

currently not

Brody•14mo ago

without a health check railway won't know when the new deployment is ready to accept connections and ready to be swapped in. it would likely end up swapping in the service too early, and that's where you would get the 502 errors https://docs.railway.app/deploy/healthchecks

VisionAIOP•14mo ago

that makes sense thank you for the super quick reply 🙂 i will try it out right away

Brody•14mo ago

does this "my-service" service have a volume?

VisionAIOP•14mo ago

no but i have a few services that do

Brody•14mo ago

okay, because even with a healthcheck, services with volumes will always have a deadtime to prevent two services reading/writing from the same volume (this is to prevent data corruption). not applicable in this case since you said the service you are proxying to does not have a volume, just thought it would be good to mention

VisionAIOP•14mo ago

seems reasonable, thanks for the info 🙂

Brody•14mo ago

no problem, let me know if setting up a healthcheck on the my-service helps!

VisionAIOP•14mo ago

unfortunately this didn't fix my problem, i added a health check and i saw the health check succeed in the service deploy logs for "my-service".

Brody•14mo ago

for how long after are you seeing this 502?

VisionAIOP•14mo ago

the proxy got http 200 responses from the old instance for about 20-30 seconds, after that the 502 was returned (i guess when the new instance took over). The 502 takes about 1000ms.

Brody•14mo ago

try removing the valid=10s from the resolver directive, we want nginx to resolve a new ipv6 ip on every incoming request since the internal services are likely using dynamic ips. I have a good feeling this wouldn't even be an issue if you used caddy as your proxy

VisionAIOP•14mo ago

i removed the valid=10s from the resolver, the problem is still there :/
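One likely reason the resolver change made no difference: when proxy_pass is given a literal hostname, nginx resolves it once when the configuration is loaded and keeps that address until a reload, so the resolver directive is not consulted for it. Putting the hostname in a variable makes nginx look it up at request time through the resolver. A minimal sketch under that assumption, reusing the names from the config above (the variable name $my_service_host is made up, and this was not tested in the thread):

```nginx
server {
    listen 8080 default backlog=16384;
    listen [::]:8080 default backlog=16384;

    # Resolver address taken from the original config; a short TTL lets
    # a redeployed service's new address be picked up quickly.
    resolver [fd12::10] valid=10s;

    location /my-service/ {
        # Hypothetical variable name. Using a variable in proxy_pass
        # forces a runtime DNS lookup via the resolver above.
        set $my_service_host my-service.railway.internal;

        # With a variable in proxy_pass, a URI part in the directive would
        # replace the request URI as-is, so the prefix is stripped here
        # and proxy_pass is written without a URI.
        rewrite ^/my-service/(.*)$ /$1 break;

        proxy_pass http://$my_service_host:8080;
    }
}
```

The rewrite line keeps the original behaviour of mapping /my-service/foo to /foo on the upstream.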

Brody•14mo ago

okay im going to try with my caddy proxy setup

VisionAIOP•14mo ago

I might try caddy if it supports auth subrequests (that is one of the reasons i use nginx currently)

Brody•14mo ago

just tested, refreshed at half second intervals through a caddy proxy while the upstream proxy endpoint was deploying and it was a perfectly seamless switchover. i think this is what you want? https://caddyserver.com/docs/caddyfile/directives/forward_auth

VisionAIOP•14mo ago

yes this seems to be it, i will try it out but it might take me a while (maybe until tomorrow)
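For anyone porting an nginx auth subrequest setup, the forward_auth directive linked above takes an upstream, a uri to send the check to, and optionally copy_headers to pass headers from the auth response on to the proxied request. A rough sketch, assuming a hypothetical auth-service.railway.internal service that exposes /verify and returns an X-User-Id header (none of these names appear in this thread):

```caddyfile
:{$PORT} {
    handle_path /my-service/* {
        # Hypothetical auth service: a 2xx response lets the request
        # continue, any other response is returned to the client.
        forward_auth auth-service.railway.internal:8080 {
            uri /verify
            copy_headers X-User-Id
        }

        # Proxied only after the auth check has passed.
        reverse_proxy my-service.railway.internal:8080
    }
}
```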

Solution

Brody•14mo ago

ill be around if you need any help, though i havent used forward auth myself, this should be a good starting point for you https://github.com/brody192/reverse-proxy

Brody•14mo ago

this is the same thing i just used in my testing

VisionAIOP•14mo ago

thank you 😊

Brody•14mo ago

no problem!

acron•14mo ago

@Yanis

Brody•14mo ago

acron, do you know them?

acron•14mo ago

Err yeah was just tagging him because we're struggling with nginx reverse proxy as well :p

Brody•14mo ago

use caddy 🙂

acron•14mo ago

just checking out your project now 😉

VisionAIOP•14mo ago

caddy works indeed fine 👏 damn nginx 😂

Brody•14mo ago

nginx 👎

VisionAIOP•14mo ago

i used the same config you provided, i still need to solve the auth subrequest but this wasn't used in my nginx example anyways. should i mark this as done?

Brody•14mo ago

acron, yanis, if you need any help please open your own help thread 🙂

VisionAIOP•14mo ago

thanks for the help Brody 🙂

Brody•14mo ago

no problem!

VisionAIOP•14mo ago

Sorry to bother again and sorry that this is not directly related to Railway. Moving my existing nginx proxy over to Caddy created a small issue for me: I need to proxy to a non-Railway service (Google Cloud Run in my case). This is only temporary and we plan to migrate it to Railway shortly. The nginx proxy works on the Railway platform, the Caddy one unfortunately does not, not even on my local machine. Proxying to the Google Cloud Run service produces the same error page that a direct proxy to google.com produces (that's why I included it in the config example). Were you facing the same or a similar issue before?

{
    admin off
    persist_config off
    auto_https off

    log {
        format console
    }
    servers {
        trusted_proxies static private_ranges
    }
}

:{$PORT} {
    log {
        format console
    }

    handle_path /test/* {
        reverse_proxy https://google.com
    }
}


Brody•14mo ago

what errors are you facing? @VisionAI 🙂 this may help, since you are proxying http to https

reverse_proxy https://example.com {
    header_up Host {upstream_hostport}
}


VisionAIOP•14mo ago

This is an example log for a request proxied to another external Railway service I use. Even though the status indicates 200, no content is returned:

2023/10/19 15:30:31.573    INFO    http.log.access.log0    handled request    {"request": {"remote_ip": "192.168.16.6", "remote_port": "33152", "client_ip": "my-ip", "proto": "HTTP/1.1", "method": "GET", "host": "<exposed_service>.up.railway.app", "uri": "/test/", "headers": {"X-Forwarded-For": ["my-ip"], "X-Forwarded-Proto": ["https"], "X-Envoy-External-Address": ["my-ip"], "User-Agent": ["PostmanRuntime/7.33.0"], "Postman-Token": ["0352246b-7b5f-4744-8fe8-999679d67d67"], "Accept": ["*/*"], "Accept-Encoding": ["gzip, deflate, br"], "X-Request-Id": ["e36b35ce-d6cf-41f5-90ac-0cf6b3f24e39"]}}, "bytes_read": 0, "user_id": "", "duration": 0.003001039, "size": 0, "status": 200, "resp_headers": {"Server": ["Caddy", "railway"], "Date": ["Thu, 19 Oct 2023 15:30:31 GMT"], "Content-Length": ["0"]}}


I will try this right now

Brody•14mo ago

hmmm I've seen that before, unfortunately I forgot the fix, so let me know if that caddyfile snippet i gave you does anything

VisionAIOP•14mo ago

Thanks this fixed my problem, I didn't even consider this could be the problem 🤦‍♂️
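For context, Caddy's reverse_proxy passes the incoming Host header through to the upstream unchanged by default, so an HTTPS upstream that routes by hostname can receive a Host it does not recognise. Combining the snippet with the earlier test site block would look roughly like this (a sketch only; the google.com upstream and /test/* path come from the example config above):

```caddyfile
:{$PORT} {
    log {
        format console
    }

    handle_path /test/* {
        reverse_proxy https://google.com {
            # Send the upstream's own host:port as the Host header
            # instead of the hostname the client used for the proxy.
            header_up Host {upstream_hostport}
        }
    }
}
```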

Brody•14mo ago

awesome (reading the caddy docs for the proxy directive goes a long way)

VisionAIOP•14mo ago

yep indeed a case of RTFM for me 😅

Brody•14mo ago

gotta love that acronym
