
So my Hyperdrive connection to MySQL just started throwing errors, or rather it stopped responding: https://screen.bouma.link/fGflgtK5X2vH5wrX98nZ From the Cloudflare Dashboard everything is "Active" (Hyperdrive) and "Healthy" (the Tunnel), and cloudflared is also running without any log output. But workers connecting through it throw:
Connection lost: The server closed the connection.
Any clue where to start debugging this? The MySQL server is doing fine and nothing has changed on my end (I was asleep when the incident started), but I expected there to be some visible fault somewhere 😅 Cloudflare-side issue?
[Screenshot: 2025-04-14 at 03.56.10]
AJR
AJRβ€’6d ago
Need your Hyperdrive ID, and I'll take a look in the morning. I assume direct connections with a different mysql client work correctly?
Alex
AlexOPβ€’6d ago
Yes, in fact I removed the worker route to let it fall back to my origin and everything is working again (my origin talks to the same database). Luckily the worker is basically just an "optimization", so removing it is fine. Thanks for looking into this. I'll be trying to catch some winks too, so I'll be back in a couple of hours 🙂 The ID: e091675e42a94e789ab05718442dce6a
AJR
AJRβ€’6d ago
I also see all your metrics fall to 0 at that time. I see you're going across a tunnel. My first thing to check would be to restart the tunnel with loglevel=debug, to see if you're still successfully authing through there.
Alex
AlexOPβ€’6d ago
So restarting cloudflared did seem to help, which is pretty unfortunate. The problem did come back pretty quickly though, within a few minutes. Now running under debug log level, but so far no output other than the startup.
cloudflared --no-autoupdate tunnel --loglevel debug --log-directory /var/log/cloudflared run --token <REDACTED
is running at the moment. I also opened the connector diagnostics page in the Zero Trust dashboard; no errors that I can find there. Access analytics show no failed logins for the Hyperdrive application, only many successful authentication attempts. The database is still running perfectly and has ~45 connections left, but no Hyperdrive connections are being made to it and the worker is still throwing errors. Restarting cloudflared and re-deploying the worker seem to have no effect. Hopefully y'all will be able to tell me where the pain lies and what broke, because I am at a loss 😅 The worker had been running without issues since Saturday morning and stopped working Sunday evening. I have not deployed the worker or made any changes to the server config for the whole of Sunday.
AJR
AJRβ€’6d ago
Ok, for now this is going into the category of "beta bug that we're still working to RCA". If it ends up being tunnel weirdness we'll figure that out, but I'm working from the assumption that this is a gap somewhere in how we're handling the wire protocol. With that said (feel free to answer in DMs if you're more comfortable with that):
* Can you share as much as possible about your hosting, including the specific MySQL version, on-prem vs. PaaS, etc.?
* Can you share as much as possible about your queries/access patterns: query examples, whether you're using transactions, etc.?
Thank you!
Alex
AlexOPβ€’6d ago
Interesting, okay cool. Let me try to answer as much as possible; luckily it's IMHO a very simple setup, which might help 🙂
- I am using mysqld Ver 8.0.41-0ubuntu0.20.04.1 for Linux on x86_64 on a self-managed VPS, not with a public cloud provider. It has both v4 and v6 internet connectivity. The tunnel is configured to talk to 127.0.0.1:3306. I have configured it with a user that has only SELECT and SHOW VIEW privileges on a single database.
- I run 2 queries in my worker. As far as I can tell the first query already fails (which means it never executes the second). I am using Drizzle with the MySQL2 connector. According to Drizzle's logs it executes:
Query: select `id`, `uuid`, `team_id`, `domain`, `with_links`, `is_default`, `target`, `include_path`, `include_query`, `redirect_default_not_found` from `custom_domains` where `custom_domains`.`domain` = ? limit ? -- params: ["example.com", 1]
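To make that concrete, the Worker side looks roughly like the sketch below (simplified; the HYPERDRIVE binding name and the ./schema import are placeholders, and only the first, failing query is shown):
```ts
// worker.ts – simplified sketch of what the Worker does per request.
// The HYPERDRIVE binding name and the ./schema import are placeholders.
import { drizzle } from "drizzle-orm/mysql2";
import { eq } from "drizzle-orm";
import { createConnection } from "mysql2/promise";
import { customDomains } from "./schema"; // Drizzle table definition for `custom_domains`

export interface Env {
  HYPERDRIVE: Hyperdrive;
}

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    // Connect inside the fetch handler (not in global scope), via the Hyperdrive binding.
    const connection = await createConnection({
      host: env.HYPERDRIVE.host,
      user: env.HYPERDRIVE.user,
      password: env.HYPERDRIVE.password,
      database: env.HYPERDRIVE.database,
      port: env.HYPERDRIVE.port,
      disableEval: true, // eval() is not available in Workers
    });
    const db = drizzle(connection);

    // First of the two queries; this is the one that already fails.
    const host = new URL(request.url).hostname;
    const domain = await db
      .select()
      .from(customDomains)
      .where(eq(customDomains.domain, host))
      .limit(1);

    // ...second query and response handling omitted...
    return Response.json(domain);
  },
};
```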
I am happy to answer any specific questions and/or run some non-destructive commands, or even give you access to the Hyperdrive if needed (since it's read-only, that's no problem). But of course, if we arrange that, we should move to DMs 😄
Alex
AlexOPβ€’5d ago
[Screenshot: 2025-04-15 at 13.18.08]
Alex
AlexOPβ€’5d ago
"Magically" started working again
AJR
AJRβ€’5d ago
Man. I'm gonna have your ID memorized by May. I can tell.
Alex
AlexOPβ€’5d ago
Not sure that's a good thing... 🫣
AJR
AJRβ€’5d ago
We haven't released any changes since yesterday, to be clear.
Alex
AlexOPβ€’5d ago
Oh... I did do a deployment this morning
AJR
AJRβ€’5d ago
Worker or Hyperdrive?
Alex
AlexOPβ€’5d ago
Worker
AJR
AJRβ€’5d ago
Okay. That shouldn't interact with your Hyperdrive config at all, really. Just for context. I'm going to start with another run-through of the logs for you when I get to my desk this morning. I want to see how that all looks.
Alex
AlexOPβ€’5d ago
I ran yarn upgrade (from a quick glance, mysql2 and other related libs weren't in there) and I also lowered the compat date to 2025-04-02. No actual code changes, in case it matters. Let's see how long it keeps working this time then! I also haven't touched the MySQL server at all. So not even 3 hours, from the looks of it. Looking at my MySQL server's process list, when the first errors started rolling in there were 2 connections, then 1, and now 0. It also took a minute for requests to start consistently failing; I'm guessing some of that was the query cache. But now it's a 100% failure rate again. In the cloudflared logs I see:
{"level":"debug","event":1,"connIndex":0,"originService":"tcp://127.0.0.1:3306","ingressRule":0,"destAddr":"tcp://127.0.0.1:3306","time":"2025-04-15T11:39:48Z","message":"upstream->downstream copy: read tcp 127.0.0.1:42476->127.0.0.1:3306: use of closed network connection"}
{"level":"debug","event":1,"connIndex":0,"originService":"tcp://127.0.0.1:3306","ingressRule":0,"destAddr":"tcp://127.0.0.1:3306","time":"2025-04-15T11:39:48Z","message":"upstream->downstream copy: read tcp 127.0.0.1:42476->127.0.0.1:3306: use of closed network connection"}
I checked the MySQL values and wait_timeout is 8 hours. Not sure if other timeouts could be in play here, which is where my first thought went on seeing this behaviour. I would still expect Hyperdrive to handle this and create a new connection, but maybe it's wrongly detecting a max_connections situation here. But now I'm assuming based on nothing... I'll let you do the actual root-causing!
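For what it's worth, this is the kind of check I mean, run directly against the server rather than through Hyperdrive (rough sketch; the credentials are placeholders):
```ts
// timeout-check.ts – dump timeout-related variables and connection counts,
// connecting directly to MySQL (bypassing Hyperdrive and the tunnel).
import { createConnection } from "mysql2/promise";

const conn = await createConnection({
  host: "127.0.0.1",
  port: 3306,
  user: "readonly_user", // placeholder credentials
  password: "...",
});

const [timeouts] = await conn.query("SHOW GLOBAL VARIABLES LIKE '%timeout%'");
const [limits] = await conn.query("SHOW GLOBAL VARIABLES LIKE 'max_connections'");
const [threads] = await conn.query("SHOW GLOBAL STATUS LIKE 'Threads_connected'");
console.log({ timeouts, limits, threads });

await conn.end();
```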
AJR
AJRβ€’4d ago
Agreed, at least, that Hyperdrive is designed to drop bad connections and spin up new ones. That's a good angle to pursue as well: independent of why things somehow fell out of sync, why isn't it detecting that and doing the obvious thing? I'll keep you posted.
Quick follow-up here: we're adding some additional robustness to the health checks and auto-refresh behavior for MySQL connections. That'll go out in our next release, starting either today or tomorrow and done by Friday/Monday.
Alex
AlexOPβ€’4d ago
Hope to see stable service after that 🀘 Thanks for the update!
AJR
AJRβ€’3d ago
@Alex The release is out, we should be in a better spot for dropping/replacing bad connections for MySQL configs. Please let me know how it goes for you.
Alex
AlexOPβ€’3d ago
Very much going in the right direction! https://screen.bouma.link/TmpHFwkgHQv2KM6CDTsK
[Screenshot: 2025-04-17 at 22.11.25]
Alex
AlexOPβ€’3d ago
Let's see how it holds up over the weekend! Currently seeing ~22 connections to MySQL, which is way more than before (I don't think I've seen more than 2 before). So something is definitely better!
Alex
AlexOPβ€’2d ago
So now we are moving the other way 🤣 I have a 51-connection limit on my database, which should be plenty, but Hyperdrive is keeping 30+ connections idle for long stretches: https://screen.bouma.link/V2zQ00XFj5yCB9j0jYnH
[Screenshot: 2025-04-18 at 10.36.03]
Alex
AlexOPβ€’2d ago
Some connections have been idle for 4+ hours. In addition, it also had ~14 connections active within the last 60s. That broke my app 🙈 And this time not just the worker.
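For reference, this is roughly how I'm looking at idle times, again directly on the server (sketch; credentials are placeholders):
```ts
// idle-connections.ts – list sleeping connections and how long they've been idle,
// connecting directly to MySQL (not through Hyperdrive).
import { createConnection } from "mysql2/promise";

const conn = await createConnection({
  host: "127.0.0.1",
  port: 3306,
  user: "root", // placeholder credentials
  password: "...",
});

// For COMMAND = 'Sleep', TIME is the number of seconds the connection has been idle.
const [rows] = await conn.query(
  `SELECT ID, USER, HOST, COMMAND, TIME
     FROM information_schema.PROCESSLIST
    WHERE COMMAND = 'Sleep'
    ORDER BY TIME DESC`
);
console.log(rows);

await conn.end();
```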
AJR
AJRβ€’2d ago
Well that's not supposed to happen. We drop idle connections after 15 minutes. Generally the way this should work is that it will aggressively open connections whenever all available ones are in use, up to 60. Anything that hasn't had traffic in 15 minutes should be disconnected, though I'm assuming you don't have any middleware in your stack that'll hold things open until it gets an explicit close message?
Alex
AlexOPβ€’2d ago
Since I am using Drizzle, I am not 100% sure what exactly it is doing, of course. And I am not explicitly closing the connection to Hyperdrive either. But I also wouldn't expect a single isolate instance to live 4+ hours without any requests. At the least, I am not doing anything with the connection explicitly; I am even connecting in the fetch handler as opposed to in the global scope.
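To illustrate: there is nothing like the snippet below at the end of my handler (sketch only; it assumes the handler receives the ExecutionContext as ctx and the mysql2 connection is called connection):
```ts
// Sketch of what an explicit close would look like (I'm not currently doing this).
// end() returns a promise, so hand it to waitUntil so it can finish after the response is sent.
ctx.waitUntil(connection.end());
```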
AJR
AJRβ€’2d ago
Hyperdrive exists separately from the isolate; couldn't have warm connections otherwise. But no, it should only live for 15 minutes without traffic. I'm planning to bring this to the team, and we'll dig in starting today.
Alex
AlexOPβ€’2d ago
Happy to provide any details, and I can also share the worker code if that helps.
knickish
knickishβ€’2d ago
I think we've found the root cause of this issue, will let you know here once we've confirmed that and released a fix for it. Thanks for your patience
Alex
AlexOPβ€’2d ago
No worries. Happy to β€œhelp” nail this down by breaking it.
AJR
AJRβ€’2d ago
No quotes needed, every problem you find is one less that everyone has to deal with. We very much appreciate it.
