Websockets not working with proxied DNS or tunnels
Does anybody else have the current issue where websockets don't always work when using proxied DNS or tunnels?
https://github.com/Ylianst/MeshCentral/issues/5302
We have been at this issue for 3 weeks now. We haven't changed any code in our software; the only common factor is Cloudflare. Does it just not like websockets being upgraded?
GitHub
Remote websocket connections take multiple attempts to connect when...
Describe the bug Clicking on the "Connect" button under "Desktop" or "Terminal" results in "Disconnected" approximately 9/10 times. Other times it will conne...
Up. We've been experiencing the same issue since 9th of January. DNS only also works for us, but not with proxy. We opened a support ticket, but no answer yet.
Same software? I poked into this way back then but was unable to repro, and of course websockets overall are ok, you're talking over one going through Cloudflare right now. Seems apparent though if it's just that one software they're doing something wrong with websockets
Not same software, we have a Heroku application with uvicorn / websockets. CNAME with proxy on Cloudflare. DNS only works fine if the request goes immediately to the heroku router. We started the investigation 3 weeks ago, from the frontend to the backend. I think we excluded everything, package upgrades, configs, etc. The only thing left is cloudfront. Today we were able to confirm that it works with DNS only.
That's interesting, you get consistent disconnects that happen constantly? If you had a minimal reproducible example that would be helpful. I assume you mean Cloudflare and not cloudfront, or are you using two CDNs back to back?
Yes, we use Cloudflare for DNS records and as a proxy server. The problem is that it is random. The only thing we see is the browser exception:
Websocket connection to XYZ failed
Nothing in our logs, nothing in Cloudflare. It is random which browser it happens in and when. We use CloudFront as well, but for totally different purposes; the websocket traffic does not go through it.
Other than the issues mentioned in the GitHub thread, the only thing I can think of is that we have just a Pro plan, not a Business plan, and it does not guarantee 100% uptime.
Sure it doesn't, but that should be unrelated. There's never a guarantee of 100% uptime; there are SLAs (so you get something when it fails to hit that). Sometimes they run experiments on free sites and such, but it's all largely the same infrastructure between free and enterprise. The only difference would be those experiments and enterprise having more points of presence it can hit.
the meshcentral issue was pretty consistent from my understanding. CF does restart machines every so often so you may lose connection and have to reconnect every so often, but shouldn't fail on initial connect. Can you reproduce it minimally or only with your full app?
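For anyone trying to build the minimal repro asked about above, here is a rough sketch that only performs the websocket opening handshake (the HTTP upgrade) repeatedly and counts how often it fails. It uses only the Python standard library. The function names, attempt count, and timeout are my own assumptions and not anything from MeshCentral or Cloudflare, and some origins may also require an Origin header, which this does not send.

```python
import base64
import hashlib
import os
import socket
import ssl

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(key: str) -> str:
    """Return the Sec-WebSocket-Accept value a server must send for this key."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_request(host: str, path: str, key: str) -> bytes:
    """Build a bare HTTP/1.1 websocket upgrade request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def handshake_ok(raw: bytes, key: str) -> bool:
    """True if the response is a 101 with the matching accept header."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    status, *header_lines = head.split("\r\n")
    parts = status.split()
    if len(parts) < 2 or parts[1] != "101":
        return False
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers.get("sec-websocket-accept") == expected_accept(key)


def probe(host: str, path: str = "/", port: int = 443,
          attempts: int = 20, timeout: float = 10.0) -> int:
    """Attempt the handshake `attempts` times over TLS; return the failure count."""
    failures = 0
    ctx = ssl.create_default_context()
    for _ in range(attempts):
        key = base64.b64encode(os.urandom(16)).decode("ascii")
        try:
            with socket.create_connection((host, port), timeout=timeout) as tcp:
                with ctx.wrap_socket(tcp, server_hostname=host) as tls:
                    tls.sendall(build_request(host, path, key))
                    raw = b""
                    while b"\r\n\r\n" not in raw:
                        chunk = tls.recv(4096)
                        if not chunk:
                            break
                        raw += chunk
            if not handshake_ok(raw, key):
                failures += 1
        except OSError:  # covers timeouts, resets, and TLS errors
            failures += 1
    return failures
```

Running `probe()` once against the proxied hostname and once against a DNS-only hostname pointing at the same origin would give two failure counts to compare. If the proxied one fails noticeably more often, that is something concrete to attach to a support ticket.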
Yes, sure, I quoted the "100% uptime" from the official site :D (https://www.cloudflare.com/en-gb/plans/)
In production we reverted back to DNS only with Heroku let's encrypt integration. Now it works for everyone. In staging we do not use proxy server, basically we have the same setup as now in production. It was one of the mistakes, the difference between staging and production hid the issue during testing.
Locally we were never able to reproduce it even though we use the same command / environment variables / configs to start the server locally.
While we had the proxy setup, we also tried these things on cloudflare:
- custom rule which uses strict SSL (it was recommended that full may be not enough)
- extended custom response headers with Connection: upgrade
and several things others recommended
No success. It was so random, we were unable to reproduce it for 2 weeks, it appeared only in the case of customers. No pattern, random browsers, devices. Also we tried to reproduce the issue with same os/browser + VPN. No success.
The comments in the GitHub thread closely match our problem. Nothing is visible, but in X% of cases the websocket connection cannot even be initialised.
Now that the production is fixed, probably we will set up a test service where we can do experiments safely. And maybe together with the cloudflare support we can figure out something. The features of the proxy are necessary for us, so we do not want to let it go.
Just one addition: when I was able to reproduce it in production with this great tool (https://websocketking.com/) from Microsoft Edge, it was consistent. At the same time the connection worked perfectly in Chrome / Firefox / DuckDuckGo / Safari. Same tool, same time, same parameters. A few hours later, the same happened with Chrome, and in Edge it was working. And we repeated the same test several times. This is why I do not see any patterns.
Was there any update on this? I am one of the developers and I'm happy to work with Cloudflare to find out why this happens. We now see the websocket connect OK, and we can even send mouse movements OK! But when we try sending the JPEGs over the websocket, our MeshCentral servers just never get them. It's as if Cloudflare looks at the websocket data, goes "what's that?", and just discards it...
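One way to narrow down the "small messages arrive, JPEGs don't" symptom might be to send binary frames of increasing size to an echo endpoint and note the first size that goes missing. The sketch below only builds the masked client-to-server binary frames as RFC 6455 describes them. The function name and the list of sizes are assumptions of mine, and whether payload size is actually the trigger here is unconfirmed.

```python
import os
import struct
from typing import Optional

OPCODE_BINARY = 0x2  # RFC 6455 opcode for a binary data frame


def encode_client_frame(payload: bytes, mask: Optional[bytes] = None) -> bytes:
    """Encode one unfragmented, masked binary frame (client to server)."""
    if mask is None:
        mask = os.urandom(4)
    if len(mask) != 4:
        raise ValueError("mask must be exactly 4 bytes")

    header = bytes([0x80 | OPCODE_BINARY])  # FIN bit set, binary opcode
    n = len(payload)
    if n < 126:
        header += bytes([0x80 | n])  # mask bit + 7-bit length
    elif n < 65536:
        header += bytes([0x80 | 126]) + struct.pack("!H", n)  # 16-bit length
    else:
        header += bytes([0x80 | 127]) + struct.pack("!Q", n)  # 64-bit length

    # Client frames must be XOR-masked with the 4-byte masking key.
    masked = bytes(b ^ mask[i % 4] for i, b in enumerate(payload))
    return header + mask + masked


# Hypothetical test sizes: they straddle the 7-bit/16-bit/64-bit length
# boundaries and go up to roughly the size of a compressed screen tile.
TEST_SIZES = (125, 126, 1400, 65535, 65536, 200_000)


def build_test_frames(sizes=TEST_SIZES):
    """Return {size: encoded frame} with random payloads of each size."""
    return {size: encode_client_frame(os.urandom(size)) for size in sizes}
```

After a successful upgrade, each frame would be written to the socket in order, and the server side would log which sizes it actually receives. If mouse-sized frames arrive but the larger ones never do, that points at something size-related on the path rather than at the handshake.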
I am still waiting for them to respond to my support ticket.
Hello