Would it be possible to have the runtime store only the latest N WebSockets, and then GC the rest?
This might be possible, though I don't know enough about whether we can force JS to GC an object that still has references (I suspect not, and we shouldn't do that anyway). Ex. Req1 broadcasts to all currently connected websockets (got them via getWebSockets()), then does an await and yields the event loop. Req2...ReqN create new websockets and those get used a bit, so we evict all WS prior to Req2. When Req1 is done with its IO and is scheduled to run again, and it tries to refer to the array of websockets it got from getWebSockets(), something weird would happen.
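To make the Req1 scenario concrete, here is a minimal sketch in TypeScript. It is only a toy model: FakeSocket, FakeRuntime, and the "keep the latest N" rule are hypothetical names invented for illustration, not workerd internals or the real Durable Objects API.

```typescript
// Toy model only: FakeSocket and FakeRuntime are hypothetical, not workerd APIs.
class FakeSocket {
  id: number;
  evicted = false;
  constructor(id: number) {
    this.id = id;
  }
  send(_msg: string): void {
    if (this.evicted) {
      throw new Error("socket " + this.id + " was evicted");
    }
  }
}

class FakeRuntime {
  private sockets: FakeSocket[] = [];
  private nextId = 1;
  private maxLive: number;
  constructor(maxLive: number) {
    this.maxLive = maxLive;
  }

  accept(): FakeSocket {
    const ws = new FakeSocket(this.nextId++);
    this.sockets.push(ws);
    // Assumed policy: keep only the latest N sockets, even if user code
    // still holds references to the older ones.
    while (this.sockets.length > this.maxLive) {
      this.sockets.shift()!.evicted = true;
    }
    return ws;
  }

  getWebSockets(): FakeSocket[] {
    return this.sockets.slice();
  }
}

// "Req1": snapshot the sockets, wait on IO, then broadcast to the snapshot.
async function req1(rt: FakeRuntime, io: Promise<void>) {
  const snapshot = rt.getWebSockets();
  await io; // yields the event loop here
  let delivered = 0;
  let failed = 0;
  for (const ws of snapshot) {
    try {
      ws.send("hello");
      delivered++;
    } catch (e) {
      failed++;
    }
  }
  return { delivered, failed };
}

function demoStaleSnapshot() {
  const rt = new FakeRuntime(2);
  rt.accept();
  rt.accept();
  let finishIo: () => void = () => {};
  const io = new Promise<void>((resolve) => {
    finishIo = resolve;
  });
  const result = req1(rt, io); // runs up to the await
  rt.accept(); // "Req2" arrives, evicting socket 1
  rt.accept(); // "Req3" arrives, evicting socket 2
  finishIo(); // Req1's IO completes
  return result;
}
```

In this model both sockets in Req1's snapshot have been evicted by the time it resumes, so every send fails. A real runtime could fail differently, but the stale-array problem has the same shape.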
Would also have to consider different eviction policies (and whether that should be configurable by developers).
Wait, so if I call getWebSockets(), use the WebSocket once, then move out of scope, is the WebSocket object then garbage collected, or is it stored somewhere else too?
We have another reference to the JS WebSocket elsewhere
Can you track the active references within user code, then evict if they are only referenced in the backend?
I assume the backend can recover more easily from no longer having an active object reference
It's a good question, I was thinking the same (a lot of the C++ we wrote for hibernation does ref counting -- specifically knowing if we are holding the only remaining strong ref to some object). I'm not sure we have a way to know that for our JS types today, briefly skimming it looks like we don't. That might be something we could add, but between modifying refcounting + having some type of eviction promise/loop it would be a pretty significant project and would make the websocket/hibernation code (which is already complicated) hard to maintain.
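As a rough illustration of the "evict only when the backend holds the last reference" idea, here is a sketch in TypeScript with explicit reference counting. Everything in it is hypothetical (TrackedSocket, SocketRegistry, borrow, evictUnreferenced). JS gives user code no way to observe reference counts, so the sketch assumes user code releases its handle explicitly, which is exactly the part that would be hard to do for real.

```typescript
// Hypothetical sketch: explicit user-reference counting for sockets.
class TrackedSocket {
  id: number;
  userRefs = 0; // handles given to user code that have not been released yet
  constructor(id: number) {
    this.id = id;
  }
}

class SocketRegistry {
  private live: TrackedSocket[] = [];

  add(id: number): void {
    this.live.push(new TrackedSocket(id));
  }

  // User code borrows a socket and gets back a release function.
  borrow(id: number): () => void {
    const socket = this.live.filter((s) => s.id === id)[0];
    if (!socket) {
      throw new Error("socket " + id + " is not live");
    }
    socket.userRefs++;
    let released = false;
    return () => {
      // Releasing twice must not decrement twice.
      if (!released) {
        released = true;
        socket.userRefs--;
      }
    };
  }

  // Evict every socket that only the registry (the "backend") still references.
  evictUnreferenced(): number[] {
    const evicted = this.live.filter((s) => s.userRefs === 0).map((s) => s.id);
    this.live = this.live.filter((s) => s.userRefs > 0);
    return evicted;
  }

  liveIds(): number[] {
    return this.live.map((s) => s.id);
  }
}
```

With sockets 1, 2, and 3 registered and only socket 2 borrowed, evictUnreferenced() removes 1 and 3 and leaves 2 alone until its handle is released.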
We probably need to refactor the code + finish outgoing hibernation before considering something like this. Tinkering with object lifetimes is one of the riskier things we do in the runtime.
Inversely, what about a function on state that signals to the runtime that it is ok for the DO to be restarted? It would at least solve the growing number of objects problem, and might be useful for user code to be able to "clean up" if it generates a lot of in-memory state.
Or even delete them when they are added, and only recreate them for the time they need to be processed?
Typically folks will want to do some stuff with the object immediately after accepting it, so it's not clear at what point it would make sense to do this. I suspect if anything what you suggested before (evicting on the backend if there are no user references) makes the most sense.
Yeah sorry, I was kind of thinking about it the wrong way. There, I meant delete them after they are added (but no longer have references in user space).
It would also allow update-related restarts to occur at a time most useful for the program. It wouldn't entirely prevent forced-evictions, but it could help some, no?
"signals to the runtime that it is ok for the DO to be restarted"
As in, signals that after this current request is finished you can immediately evict the instance for hibernation?
That's come up before but my understanding is there are enough pitfalls that we have decided to punt it indefinitely
Just curious, if you are able to speak about what issues might occur?
Mostly just issues around deterministic behavior. It would be misleading to suggest that this runs when your DO is evicted, because eviction could mean anything from "you breached a runtime limit" to "the machine no longer exists". It would really only be useful in the context of a shutdown due to inactivity. Not saying this isn't useful (I like destructors!) but it's fairly limited relative to what people would expect, and even in the clean shutdown case we have to consider things like:
- how long it can run for
- can it do IO
- should we cancel shutdowns if new requests come in (does that mean your shutdown procedure has to be a transaction)
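To show why these questions interact, here is a sketch in TypeScript of a clean-shutdown hook with a time budget that is cancelled if a new request arrives. Nothing like this exists in the runtime; runShutdownHook, the budget, and the newRequest promise are all hypothetical, and the sketch leaves the "can it do IO" question open.

```typescript
// Hypothetical sketch: no such hook exists in the runtime today.
type ShutdownOutcome = "completed" | "timed_out" | "cancelled";

async function runShutdownHook(
  hook: (signal: AbortSignal) => Promise<void>,
  budgetMs: number, // assumed answer to "how long can it run for"
  newRequest: Promise<void>, // resolves if a request arrives mid-shutdown
): Promise<ShutdownOutcome> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<ShutdownOutcome>((resolve) => {
    timer = setTimeout(() => resolve("timed_out"), budgetMs);
  });
  const cancelled = newRequest.then((): ShutdownOutcome => "cancelled");
  const done = hook(controller.signal).then((): ShutdownOutcome => "completed");
  try {
    const outcome = await Promise.race([done, timeout, cancelled]);
    if (outcome !== "completed") {
      // The hook is told to stop, but any work it already did is not undone.
      // That is why a cancellable shutdown starts to look like a transaction.
      controller.abort();
    }
    return outcome;
  } finally {
    clearTimeout(timer);
  }
}
```

Even in this small model the hook can be interrupted at any await, so it would have to tolerate running only partway.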
That all makes sense
To summarize, I'll open a ticket internally for evicting inactive hibernatable websockets and write some thoughts so it doesn't get lost here. If you're interested, feel free to open a discussion on the Workerd repo too
GitHub
Hibernation API Memory Management · cloudflare workerd · Discussion...
It was pointed out to me today by @MellowYarker that using the Hibernatable WebSocket API does not preclude running out of memory due to WebSocket objects not being automatically deleted when no lo...