Our Cloudflare Worker (backed by Hyperdrive) had a big spike in Errors and Wall Time today, starting at around 10:30am PT. On Hyperdrive, I don't see any spikes in latency, but I did see a couple of errors on each of our Hyperdrive instances at ~10:30am PT. I'm struggling a little with how to debug or fix this: most traffic is fine, but our P999 wall time jumped to 70k ms. All of our backing databases look completely normal, and I'm able to query Hyperdrive normally locally
Hey there, do you still see this big spike in Hyperdrive errors, or has it resolved? Also, what errors do you see?
I just saw 1-3 errors in each of our Hyperdrive instances (and I'm not sure how to see what the errors were). On our Worker overall, we see these ongoing spikes in Wall Time that we can't really debug
Client disconnected errors seem to be trending down but not 0


In case it's useful:
- Account ID: cf4bd8e45a557fecf50a1b2af74b8453
- Worker name: spindl-adserver
- Hyperdrives: 5dbce450a32a4279a8cdf3a8596ee308, 4d4fe23ab437464f86f59bb8ed897e88, dc0758eacd8945a4b2bea40cb1654223
Thank you! I'll look into this and let you know what I find
One other thing I just enabled, to try to get out of these infinite timeouts, is a Postgres statement timeout:
ALTER ROLE <role> SET statement_timeout='10s';
Though I think it will only kick in on new connections, and I'm not sure how often the connections are refreshed
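Since the ALTER ROLE setting only applies to sessions opened after the change, a complementary option (assuming the node-postgres 'pg' driver, which this stack uses) is to put the timeouts in the pool config so every new connection gets them at startup. A sketch with illustrative values:

```javascript
// Sketch, assuming node-postgres ('pg'). These options apply to each new
// connection the pool opens, so they don't depend on the ALTER ROLE setting
// reaching recycled connections. The values and env var are placeholders.
const poolConfig = {
  connectionString: process.env.DATABASE_URL, // e.g. the Hyperdrive connection string
  statement_timeout: 10_000,      // server-side: Postgres cancels statements after 10s
  query_timeout: 15_000,          // client-side: 'pg' stops waiting after 15s
  connectionTimeoutMillis: 5_000, // fail fast if a connection can't be established
};
// const pool = new (require("pg").Pool)(poolConfig);
```

Keeping the client-side `query_timeout` slightly above the server-side `statement_timeout` lets the server cancel first, so the client usually sees a clean Postgres error rather than a dangling socket.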
So far all I see is that some connection disconnects occurred, which is nothing too out of the ordinary. I'll keep looking to see if I notice anything else from the Hyperdrive side
Two observations from our side:
1. The high wall time spikes seem to correlate with Hyperdrive issues. We saw the same spike on Saturday when us-east-1 databases were broken through Hyperdrive. During that incident, we didn't see any changes in the Hyperdrive metrics (latency or error rate), but of course saw a lot of errors and wall time spikes
2. The client disconnect errors that we see when Hyperdrive fails don't trigger our Sentry alerts. I'm not sure why - it could be that, given how that error presents itself, our in-Worker Sentry integration doesn't fire or doesn't get a chance to drain
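On the drain issue, one pattern that can keep reports from being dropped when the invocation is torn down is to catch at the top of the fetch handler and hand the reporting promise to `ctx.waitUntil`, which keeps the Worker alive until it settles. A sketch, not our actual integration - `report` stands in for whatever Sentry flush call is in use:

```javascript
// Sketch: wrap a Workers fetch handler so thrown errors are reported before
// teardown. `handler` is the app logic; `report` is a hypothetical async
// reporter (e.g. a Sentry capture-and-flush). ctx.waitUntil keeps the
// invocation alive until the reporting promise resolves.
function withErrorReporting(handler, report) {
  return async (request, env, ctx) => {
    try {
      return await handler(request, env, ctx);
    } catch (err) {
      ctx.waitUntil(report(err)); // don't await: respond now, deliver in background
      return new Response("Internal error", { status: 500 });
    }
  };
}
```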
do you have a connection timeout configured for your database driver?
Got it, thanks. Excited for the visibility work! On the connection timeout - is this at the level of Postgres (like statement_timeout), or of the client-side library (we use the Node library 'pg')?
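Both levels exist, and independent of either, a driver-agnostic guard in the Worker itself is to race each query promise against a timer so a hung query can't pin wall time. A sketch - `withTimeout` is a hypothetical helper, not part of 'pg':

```javascript
// Sketch: reject if `promise` (e.g. pool.query(...)) doesn't settle within
// `ms` milliseconds. Note this only stops the Worker from waiting; it does
// not cancel the query on the server - statement_timeout is still needed
// for that.
function withTimeout(promise, ms, label = "query") {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms
    );
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```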