Deadlock in cache.put() - platform issue or am I doing something wrong?
The following worker code results in a deadlock (no response is ever sent to the client), both in a local Wrangler instance and in the Playground:
This is a minimized version of an issue we have with our worker in production. Presumably the same issue is causing some of our requests to never be answered, which frustrates visitors.
The issue seems to be triggered by a cache put() call happening while another put() call is waiting for the provided response to complete. As the cache keys are different, I can't see a good reason why the two calls should block each other, though.
My question is: does this code try to do something unsupported, or is this supposed to work? It feels like a bug in the worker implementation, but I may be missing something.
6 Replies
This variant is also interesting:
Here, I'm not waiting for the put() call to finish, but its presence still interferes with consuming the response. The message "about to call text()" appears on the console, but nothing else is logged, so the code is waiting forever for the body of the dummy response. (This is despite the response body being already in memory, and trivially short.)
It's definitely weird that even after cloning the response, the two copies can block each other.
Hm, as far as I'm aware, you can't really use async functions inside HTMLRewriter handlers
You could still do something like
You can use async functions with HTMLRewriter as of https://blog.cloudflare.com/asynchronous-htmlrewriter-for-cloudflare-workers, but it seems this hits a runtime bug with something to do with cache: https://github.com/cloudflare/workerd/issues/2498
ctx.waitUntil is probably the best approach for writing to cache and not hitting a deadlock inside of these handlers right now.
fwiw just waitUntil isn't enough, if you take his second variant and ctx.waitUntil the cache.put it just gets stuck in the await dummyResponse.text(). If you clone it and then ctx.waitUntil at the end, it seems to work regardless of the number of times it runs, weird stuff
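For anyone skimming, here is one possible reading of the "clone it and then ctx.waitUntil at the end" ordering, as a rough sketch rather than a confirmed fix. The helper name readThenCache and the three small interfaces are made up for illustration; they only mirror the parts of Response, Cache and the execution context that the sketch touches, so it can be type-checked outside a worker.

```typescript
// Structural stand-ins (assumptions) for the pieces of the Workers API used here.
interface ResponseLike {
  clone(): ResponseLike;
  text(): Promise<string>;
}
interface CacheLike {
  put(key: string, response: ResponseLike): Promise<void>;
}
interface ContextLike {
  waitUntil(promise: Promise<unknown>): void;
}

// Hypothetical helper: read the body first, schedule the cache write last.
async function readThenCache(
  ctx: ContextLike,
  cache: CacheLike,
  key: string,
  response: ResponseLike,
): Promise<string> {
  // Clone before the body is consumed; a used body cannot be cloned.
  const copy = response.clone();
  // Consume the original body while no put() is in flight yet.
  const body = await response.text();
  // Register the write at the very end, without awaiting it here.
  ctx.waitUntil(cache.put(key, copy));
  return body;
}
```

In a real worker you would pass the handler's ctx and caches.default. Whether this ordering reliably avoids the hang is only what was observed in this thread, not something the platform documents.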
oh wow, interesting. Good info, thank you
Hi! Thank you for taking a look at this
Yes, in our production code, I worked around the issue by collecting responses in an array and calling cache.put on them after the rewriter was done, but I assume it should work in parallel as well.
Looks like that linked issue is exactly the same thing I ran into, so I'm watching it
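The "collect now, put later" workaround mentioned above could look roughly like the sketch below. It is not the production code from this thread: DeferredCacheWrites and PutOnlyCache are hypothetical names, and the cache type is reduced to the single put() method the sketch needs.

```typescript
// Minimal stand-in (assumption) for the cache; only put() is required.
interface PutOnlyCache<R> {
  put(key: string, response: R): Promise<void>;
}

// Hypothetical helper: queue cache writes while the rewriter is running
// and perform them only after it has finished.
class DeferredCacheWrites<R> {
  private pending: Array<{ key: string; response: R }> = [];

  // Call this inside a handler instead of calling cache.put() directly.
  defer(key: string, response: R): void {
    this.pending.push({ key, response });
  }

  // Number of writes currently queued.
  get size(): number {
    return this.pending.length;
  }

  // Call this once the rewritten response has been fully produced.
  // Resolves with the number of entries written.
  async flush(cache: PutOnlyCache<R>): Promise<number> {
    const batch = this.pending;
    this.pending = [];
    await Promise.all(batch.map(({ key, response }) => cache.put(key, response)));
    return batch.length;
  }
}
```

The handlers only ever call defer(), so no put() is in flight while the rewriter is still consuming bodies. The flush() call can then be awaited after the rewriter is done or handed to ctx.waitUntil.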