Cloudflare tunnels being unstable
So I don't know if I'm the only one experiencing this, but for the last week, tunnels have been REALLY unstable.
They randomly stop serving requests, and all domains routed through them end up timing out. For some users everything still works fine, but for others it's just timeouts.
Restarting the tunnel fixes the issue again. The only thing I noticed is that the tunnel logs are filled with "Forcefully closed request" errors. I am experiencing this across two different Dedicated servers, so it's not an isolated issue.
(please ping me when replying, otherwise I'll miss it)
> Restarting the tunnel fixes the issue again. The only thing I noticed is that the tunnel logs are filled with "Forcefully closed request" errors.
What's the exact error? Something like
error="Incoming request ended abruptly: context canceled"
or something else? Does every one of the outbound tunnel connections fail at the same time?
> I am experiencing this across two different Dedicated servers, so it's not an isolated issue.
Anything else common between them? Same network? Which cloudflared version? (You can find it in the Zero Trust portal by clicking on the tunnel name as well.) You could try the
--protocol http2
argument before tunnel run; if running as a service, modify the unit file (rough sketch below). The default is QUIC, which can sometimes have issues on some networks. Worth noting that http2 doesn't support some advanced WARP proxying features like UDP/ICMP proxying, if you depend on that.
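For a service install, a systemd drop-in along these lines would switch the protocol. This is only a sketch: the binary path and token are placeholders, and exact flag placement can differ between cloudflared versions, so check cloudflared tunnel run --help first.
# /etc/systemd/system/cloudflared.service.d/override.conf -- systemd drop-in
[Service]
ExecStart=
# copy your existing ExecStart and add the flag; path and token here are placeholders
ExecStart=/usr/bin/cloudflared --no-autoupdate tunnel --protocol http2 run --token <TUNNEL_TOKEN>
# then: systemctl daemon-reload && systemctl restart cloudflared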
Yes, that is the error. Both servers are in the same country, but in different parts of the datacenter.
I've been using tunnels for a long time now without issues. This started last week, and continued again today.
Both tunnels run in docker containers. The second tunnel on one of the servers remains unaffected; it's just these two high-traffic ones.
I'll confirm the versions when I'm back at my pc
error="Incoming request ended abruptly: context canceled"is from clients cancelling requests or going over the cf timeout of 100s If it's happening a ton and with random spikes it almost sounds like your origin service is being overloaded, could be something else though. If it was happening to two tunnels servicing unrelated services with no obvious spikes that'd be weirder
Yeah this whole thing is weird, because in every case, the origin is still completely accessible and working when accessed via the direct IP, and this issue with the tunnel doesn't go away without a restart
Just confirmed, all my tunnels are still running 2023.4.1.
It's only the two tunnels that I know of that have these issues, since they're the two handling the most traffic. They're on different servers, but the other tunnels running on the same servers are unaffected, and while this is happening, all the services served by the tunnels are still accessible with the direct IP and port.
The very first time this happened was on Friday, the 12th of July. During that time, I saw there was maintenance running, so I ignored it, thinking it was part of the maintenance.
On that day, our uptime monitor did detect the services served by those tunnels as "timing out". I confirmed that when accessing the domains, the page would eventually time out or load a blank white page. Using the direct IP worked.
I then asked several people in my Discord to test the various endpoints to see if they could access them, and 2 out of 5 were able to use the services without issue. Another user from South Africa (same country as me, but a different part of the country) could still access the services when I couldn't.
Later that afternoon I rebooted the docker containers, and everything worked again, until this morning. This time the uptime monitors didn't detect it, but everything was timing out again. Restarting the tunnel containers once again fixed the issue.
> 2023.4.1
That's like ~20 versions, or over a year, behind the current release. Still weird for it to happen suddenly, but being super outdated might complicate things as well. Otherwise I would try to grab more logs around when it happens, and any logs from your origin about response times during those windows.
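If the tunnels run in docker, something along these lines would pull the cloudflared output from around an incident; the container name and time window are placeholders.
# show the last hour of logs from the cloudflared container, filtered to the interesting lines
docker logs --since 1h cloudflared-tunnel 2>&1 | grep -iE "error|failed|register"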
I found a tunnel that is currently "offline", that is running a 2024 version.
Here is the output from the logs.
As in the other cases, the service is online, is responding, and is accessible via IP:PORT, just not via the tunnel
that's helpful, if all of the connections drop at once like that to more than one datacenter I would assume a network issue or something with QUIC. That explains your issue though: all of the outbound connections are dropping, and that would certainly do it.
What could cause something like this? An issue on my side, an issue with my hosting provider, an issue with their backbone, a routing issue at CF?
Just weird that it started happening now.
Another thing I just noticed: when I got the timeout page for this tunnel just now, it's running through London, even though your Cape Town and Johannesburg datacenters are the closest to me, so why is it trying to route so far from home?
> What could cause something like this? An issue on my side, an issue with my hosting provider, an issue with their backbone, a routing issue at CF?
Any of those. Routing issues would be more widely shared, though; I would guess a more local issue or something with QUIC. I would try --protocol http2 like I said above, QUIC is a bit quirky.
> Another thing I just noticed: when I got the timeout page for this tunnel just now, it's running through London, even though your Cape Town and Johannesburg datacenters are the closest to me, so why is it trying to route so far from home?
Routing is decided by plan (capacity/bandwidth cost) and by what your ISP wants. You could look at https://cloudflare.manfredi.io/en/tools/datacenters/ and https://debug.chaika.me/?findColo=true and see where you get routed depending on plan, and whether any go local.
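Another quick check, assuming the domain is proxied through Cloudflare: the /cdn-cgi/trace endpoint reports which colo served the request. The hostname below is a placeholder.
# "colo" in the output is the airport code of the datacenter that handled this request
curl -s https://your-domain.example/cdn-cgi/trace | grep colo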
Looks like I get LHR x3 and CPT x1 on the first link for my plan, and only LHR for my plan on the second link
what's your plan?
Free plan. We haven't made enough through donations yet to upgrade to pro
ah ok, yeah your tunnel should be connecting locally if you have a datacenter there, but the other side (visitors reaching your domains) is Free plan inbound routing
Makes sense.
I'll keep an eye on the tunnels to see if it happens again; if it does, I'll switch over to http2 and test it again. If it keeps happening after that, I would need to reach out to Hetzner.
falkenstein vm/dedi?
(asking which hetzner location/product, I have a few hetzner stuff using tunnels, was curious if I could see the same)
The two affected ones are dedis in falkenstein. 1 is in DC6 and the other in DC8
weird, this was from the one in falkenstein?
Yeah, the one in DC6
That's the one that has been hit the hardest with the issues
DME is Moscow; my cloud Falkenstein instance goes to FRA
DC8 was a first today
well being routed to DME isn't great, that's a fair bit away and in a different country
you could try
--ha-connections 10
or any number, to increase the edge connections and hopefully connect to something closer (example below)
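For example (a sketch only; the tunnel name is a placeholder, and exact flag placement can differ between cloudflared versions):
# raise the number of outbound edge connections from the default of 4
cloudflared tunnel --ha-connections 10 run my-tunnel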
my hetzner cloud vm is in DC12 it looks like, and connects to FRA
Is there some way I can test to see where it's connecting to?
yea it shows you
2024-07-15T05:02:52Z INF Connection 60e1c4f6-2c46-4102-9ae4-493743eda7b8 registered connIndex=1 ip=x.x.x.x location=dme05
2024-07-15T05:02:52Z INF Connection 31a07bb2-8732-4c62-b8d1-b2eb0c743f85 registered connIndex=3 ip=x.x.x.x location=dme05
2024-07-15T05:02:53Z INF Connection c344d5e3-3e00-4042-a703-b6031d3bb1a6 registered connIndex=2 ip=x.x.x.x location=dme06
See the location there? It's the airport code followed by a number.
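If the tunnel runs in docker, you can pull just those lines out; the container name here is a placeholder.
# each "registered" line shows which edge datacenter that connection landed on
docker logs cloudflared-tunnel 2>&1 | grep "registered connIndex"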
Hmmm, so one of the tunnels I restarted earlier today reconnected to 1x hel and 2x dme
Interesting. The other one connected to 2x hel and 1x dme