Is sending to WebSockets from a DO fully blocking?

We have a simple DO that clients connect to via WS, and we use hibernation. The DO has an endpoint that our backend can hit to send a broadcast to all connected clients. Sometimes the trigger HTTP call times out, even after 5 retries with backoff. My first attempt at fixing this was to change the trigger endpoint to just schedule an alarm, which then handles the sending, but that didn't help. I assume that is because the DO is single-threaded, so while the alarm is running, all incoming requests are blocked. Today I noticed that this happened even though only ~10 connections were open, so the only explanation I have is that one of them is semi-broken, or just very slow, and that the send() call is fully blocking. Is that the case? If so, why? This seems unnecessarily strict, since sending a WS message could just put it into an outbound queue.
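The backend caller isn't shown in the thread. As a hypothetical sketch of the "5 retries with backoff" described above, assuming the backend is TypeScript with a fetch-like function and AbortSignal.timeout available, each attempt gets its own timeout and the delay doubles between attempts. triggerBroadcast and FetchLike are illustrative names, not taken from the original code. One caveat: if an attempt times out after the DO has already sent the broadcast, a retry may deliver the message twice.

```typescript
// Minimal shape of the fetch function this helper depends on (hypothetical);
// the global fetch is compatible with it.
type FetchLike = (
  url: string,
  init: { method: string; body: string; signal: AbortSignal },
) => Promise<{ ok: boolean; status: number }>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: POST `body` to the DO's broadcast endpoint, giving each
// attempt its own timeout and doubling the delay between attempts.
// Returns the 1-based number of the attempt that succeeded.
async function triggerBroadcast(
  fetchFn: FetchLike,
  url: string,
  body: string,
  attempts = 5,
  timeoutMs = 5_000,
  baseDelayMs = 200,
): Promise<number> {
  let lastError: unknown = new Error("no attempts made");
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      // AbortSignal.timeout aborts only this attempt's request after timeoutMs.
      const res = await fetchFn(url, {
        method: "POST",
        body,
        signal: AbortSignal.timeout(timeoutMs),
      });
      if (res.ok) return attempt;
      lastError = new Error(`HTTP ${res.status}`);
    } catch (err) {
      // Timeout (abort) or network failure: remember it and retry.
      lastError = err;
    }
    // Exponential backoff: baseDelayMs, 2x, 4x, ... (no sleep after the last attempt).
    if (attempt < attempts) await sleep(baseDelayMs * 2 ** (attempt - 1));
  }
  throw lastError;
}
```

Injecting fetchFn instead of calling the global fetch directly keeps the retry logic testable without a network.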
9 Replies
Martin 0x522E (OP) · 14mo ago
@Helpflare, who can help with this?
Milan · 13mo ago
Sending a websocket message does put it into a queue; it doesn't block. Your backend sends a broadcast request to your DO (the broadcast endpoint), and then what exactly do you do? Get back all your websockets and send each of them a message? What do you mean by "http call times out"? Are you able to confirm that any of the websockets receive the broadcast message?
Martin 0x522E (OP) · 13mo ago
Yes, exactly: the endpoint just forwards the fetch to the DO, and the DO then iterates over all the websockets and sends each one the same message. That fetch then times out while waiting for the response, which would just be an empty 200. All of this should be basically instant, so I don't understand why we sometimes get timeouts. As I can't reproduce it, I also don't know if some messages get out. The only time I could reproduce it, I had just closed the websocket client, so I don't know if it WOULD have received the message. But since that case triggered the issue, I assumed that send() might be blocking, or at least that the fetch only returns once the DO is done with all the WS messages. Also, send() sometimes throws an exception saying the WS is already closed. How does that work if the message only gets queued? @milan
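A minimal sketch of the broadcast loop Martin describes, assuming the sockets come from this.ctx.getWebSockets() inside the Durable Object. SendableSocket is a hypothetical stand-in interface so the loop can run outside the Workers runtime, and broadcast is an illustrative name. Per Milan's description, send() only enqueues the frame, so the loop returns once every message is queued; counting failures instead of silently ignoring them gives something concrete to log when a timeout happens.

```typescript
// Hypothetical stand-in for the hibernatable WebSocket objects returned by
// this.ctx.getWebSockets(); only send() is needed here.
interface SendableSocket {
  send(message: string): void;
}

// Hypothetical helper: queue `message` on every socket and report how many
// sends succeeded and how many threw (e.g. a socket the DO already closed).
function broadcast(
  sockets: Iterable<SendableSocket>,
  message: string,
): { sent: number; failed: number } {
  let sent = 0;
  let failed = 0;
  for (const ws of sockets) {
    try {
      ws.send(message); // enqueues the frame; does not wait for the client
      sent++;
    } catch {
      failed++; // one bad socket must not stop the rest of the broadcast
    }
  }
  return { sent, failed };
}
```

Inside the DO's fetch handler this would be called as broadcast(this.ctx.getWebSockets(), body) before returning the empty 200 response.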
Milan · 13mo ago
I'll check with the team, but I'm fairly certain the response wouldn't be blocked by outbound messages waiting in the queue; those should just prevent the DO from being evicted. If send() throws because the WS was already closed, that means ws.close() was called previously. IIRC it's possible that you called ws.close() and the client hasn't responded with its own close message, so we wouldn't remove it from the list of hibernatable websockets. In that case, you could get it back via getWebSockets() even though you can't send outbound messages on it anymore.
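One way to act on this explanation is to skip sockets that the DO has already started closing before calling send(). This sketch assumes the hibernatable WebSocket's readyState behaves like a standard WebSocket (1 = OPEN, 2 = CLOSING once close() has been called, 3 = CLOSED); openSockets and StatefulSocket are hypothetical names. It avoids the exception for sockets the DO closed earlier, but a try...catch around send() is still a reasonable safety net.

```typescript
// WebSocket readyState value for OPEN, as in the WHATWG WebSocket standard.
const OPEN = 1;

// Hypothetical minimal view of a socket: only its readyState matters here.
interface StatefulSocket {
  readonly readyState: number;
}

// Hypothetical helper: keep only sockets that are still OPEN. Sockets in
// CLOSING (2) or CLOSED (3) may still be returned by getWebSockets() while
// the client's close frame is outstanding, but sending to them throws.
function openSockets<T extends StatefulSocket>(sockets: Iterable<T>): T[] {
  return Array.from(sockets).filter((ws) => ws.readyState === OPEN);
}
```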
Martin 0x522E (OP) · 13mo ago
Thanks, yeah, the latter makes sense. The timeouts are still a mystery to me, though, and I don't know how else to debug them.
Milan · 13mo ago
I assume you're awaiting the fetch to the DO and that is what's timing out? Maybe wrap your ws.send() in a try...catch? Do you have a webSocketClose() handler that closes the DO's websocket?
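Milan's last question is about replying to the client's close frame. A hypothetical sketch of what the body of the DO's webSocketClose(ws, code, reason, wasClean) handler could do: call close() back on the socket so the close handshake completes. Close codes 1005, 1006 and 1015 are reserved by RFC 6455 and must not be sent in a Close frame, so they are mapped to 1000 (normal closure). ClosableSocket and reciprocateClose are illustrative names, and the try...catch is defensive, assuming close() might throw if the socket is already closed.

```typescript
// Hypothetical minimal view of a socket: only close() is needed here.
interface ClosableSocket {
  close(code?: number, reason?: string): void;
}

// RFC 6455, section 7.4.1: these status codes must never be sent in a Close frame.
const RESERVED_CLOSE_CODES = new Set([1005, 1006, 1015]);

// Hypothetical body for webSocketClose(ws, code, reason, wasClean):
// echo the client's close code back (or 1000 for reserved codes).
// Returns the code that was used for the reply.
function reciprocateClose(ws: ClosableSocket, code: number, reason: string): number {
  const replyCode = RESERVED_CLOSE_CODES.has(code) ? 1000 : code;
  try {
    ws.close(replyCode, reason);
  } catch {
    // Assumed: the socket was already closed from this side; nothing to do.
  }
  return replyCode;
}
```

In the Durable Object class, the handler would then just be webSocketClose(ws, code, reason, wasClean) { reciprocateClose(ws, code, reason); }.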
Martin 0x522E (OP) · 13mo ago
We wrap send() in a try...catch and just ignore the error. We do have that handler, but it doesn't do anything.
Martin 0x522E (OP) · 13mo ago
This is the code. Had to rename from .ts
Milan · 13mo ago
You're not timing out when you call broadcast via alarm()? I don't know that we can do much here without a repro. You should be able to return a response before all messages are sent, so send() wouldn't block. Something else must be happening; maybe it was caused by additional work running in the constructor?