Looking for thoughts on the following thought process I'm having with DOs:
I have a number of users who will open a browser or similar client. They will pull a DO stub for their own user_id, connect via WebSockets, and listen for notifications
An admin user posts a new message, which pulls a list of all subscribed user_ids, generates a stub for each user, pulls all open WebSockets, and sends the message (a single user may have multiple clients listening for notifications at once)
What's the best way to handle this pattern in theory? Each message could be going out to any number of users (tens of thousands under reasonable scale assumptions). Is there a way to use subrequests or similar to instantiate many stubs and send a message to many users? I don't want to spend too long going down a particular experiment only to find there's an existing solution, or an architectural reason why this pattern shouldn't be used
14 Replies
Or if I use Service Bindings am I all good?
The main constraint you will be working around is the 1000 subrequest limit. A system that works for up to ~1M users is to “nominate” some of the DOs to relay messages (you could use another class but this is slightly more efficient, although more complicated to reason about)
In practice:
- wherever your message comes in from: find the list of users that should be notified. Let’s assume you use ids or usernames, the important thing is that they are well below 1KB in length
- split them up into batches of up to 1000, depending on other subrequests like logs or maximum connections per user
- for each batch, find the first user and send the message and the id list to the DO
- in a DO, send the message to the user and send every other DO the same message with an empty list
To make it simpler, you can split the broadcast system into a separate DO. You can also make the fanout unlimited: instead of sending the full user id list, send only, say, the first and last id a DO should fan out to. Then, in the DO, query the list; if its length is over 1000, split it up further, otherwise send the message
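The batched fan-out above can be sketched roughly like this. Everything here is hypothetical (the `/relay` route, the `broadcast` and `chunk` names, and the stand-in namespace types are not from the Workers API); it just illustrates sending each batch to the first user's DO and letting that DO relay to the rest:

```typescript
const MAX_FANOUT = 1000; // stay under the 1000-subrequest limit

// Split a list of user ids into contiguous batches of at most `size`.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Minimal stand-ins for the Durable Object binding types (hypothetical).
interface Stub {
  fetch(url: string, init?: { method?: string; body?: string }): Promise<unknown>;
}
interface Namespace {
  idFromName(name: string): unknown;
  get(id: unknown): Stub;
}

// Entry point: send each batch to the first user's DO, which is expected to
// deliver locally and then relay the message to the remaining ids.
async function broadcast(ns: Namespace, message: string, userIds: string[]): Promise<void> {
  for (const batch of chunk(userIds, MAX_FANOUT)) {
    const stub = ns.get(ns.idFromName(batch[0]));
    await stub.fetch("https://do/relay", {
      method: "POST",
      body: JSON.stringify({ message, relayTo: batch.slice(1) }),
    });
  }
}
```

The relaying DO would apply the same split-or-deliver logic to `relayTo`, which is what keeps each hop under the subrequest limit.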
Unknown User•3w ago
Message Not Public
I will have a read, thank you
My current approach is this:
- Worker A, invoked by fetch(), will pull a list of IDs from a 3rd party (let's assume list is in thousands)
- the list of IDs is sliced into batches, and then pushed to a queue
- the consumer worker (which happens to be bound within this same worker) iterates over the list of IDs, and invokes a Service Binding RPC method (which is also bound to this same worker) called stubTest(id)
stubTest() does an idFromName() and get() for a stub, then calls wsTest() on the stub
- wsTest() does this.state.getWebSockets() and iterates over each possible connection
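The DO-side delivery step (the `wsTest()` call above) could look something like this. The `Socket`/`DOState` interfaces are minimal stand-ins for the real runtime types, and the class name is made up; only the `getWebSockets()` iteration mirrors the approach described:

```typescript
// Minimal stand-ins for the runtime types (hypothetical).
interface Socket {
  send(data: string): void;
}
interface DOState {
  getWebSockets(): Socket[];
}

// Sketch of the wsTest() step: deliver one message to every client this
// user currently has connected.
class UserNotifier {
  constructor(private state: DOState) {}

  wsTest(message: string): number {
    const sockets = this.state.getWebSockets();
    for (const ws of sockets) ws.send(message);
    return sockets.length; // how many clients were notified
  }
}
```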
Using Queues for this has seemed to work okay so far. With a list of 2000 IDs, my DO metrics show 4K DO requests and around 40 GB-sec of usage for those 2K IDs. The Workers metrics show 2K requests, though I'm unsure whether those count as billed requests
While I understand the use case for using sharded DOs, is there anything about my process that seems to stand out? If I already had the calculated ID from idFromName(), instead of deriving it each time, would that halve the number of DO requests?
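One way to reuse the calculated ID is to cache its hex form and rebuild it with idFromString() on later sends; idFromName() and idFromString() are real namespace methods, but the cache wrapper and its names here are made up, and whether this actually changes the billed request count is worth verifying against the dashboard:

```typescript
// Minimal stand-ins for the namespace/id types (hypothetical).
interface DOId {
  toString(): string;
}
interface DONamespace {
  idFromName(name: string): DOId;
  idFromString(hex: string): DOId;
}

// Cache the hex form of each name-derived id so repeat sends can rebuild
// the id with idFromString() instead of re-deriving it with idFromName().
const idCache = new Map<string, string>();

function stubIdFor(ns: DONamespace, userId: string): DOId {
  const cached = idCache.get(userId);
  if (cached !== undefined) return ns.idFromString(cached);
  const id = ns.idFromName(userId);
  idCache.set(userId, id.toString());
  return id;
}
```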
Unknown User•3w ago
Message Not Public
Just not sure if that produces a request to the DO, or if it's the call to getWebSockets() that produces the second request per stub invocation
IDs are from elsewhere, not necessarily in DO UUID format
Like in my testing I called a list of 2K IDs, and produced 4K DO requests in my dash. I assume the first request per stub is the creation of the stub, and the second is the call to getWebSockets()
Unknown User•3w ago
Message Not Public
I only call 1 function on each stub, I will need to investigate further
Unknown User•3w ago
Message Not Public
Admin-user was probably the wrong analogy
The main reason for using a DO per user is that a user can be subscribed to an arbitrary number of channels/admins at any one time, which can and will be entirely different from any other user's. Following this flow allows a user to receive a message from these admins whenever they are sent out, on whatever clients they have open and connected to their DO instance. Kinda similar to a PubSub setup I guess
Notifications are also sent out very irregularly, which is why I wanted to simplify the architecture where possible
Unknown User•3w ago
Message Not Public
To simplify further, I'm essentially creating a push notification system where a channel will create a new notification that is sent out to all subscribed users. Users can be subscribed to any number of channels. The way my process is laid out means that scale is handled, no worries. A channel might only send 1 notification every couple of days, but a user might be subscribed to 200 channels, so they could expect a semi-steady stream if they stayed connected. I will investigate the sharding solution, though I'm wondering if I would be better served by an existing solution from elsewhere. Pity PubSub is not taking anyone new for the foreseeable future
Unknown User•3w ago
Message Not Public
I like the idea of hashing the userbase across DOs. A user might have multiple clients open at once, but usually a single user would be receiving messages every few minutes, not every few seconds, so a single DO might be able to handle thousands of users at once, since only certain users will need to receive a message at any given time. Appreciate your input, great food for thought
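A minimal sketch of hashing users onto a fixed set of shard DOs; the shard count, naming scheme, and choice of FNV-1a are all assumptions made purely for illustration:

```typescript
const SHARD_COUNT = 128; // assumed number of shard DOs

// Deterministically map a user id to a shard DO name using FNV-1a
// (any stable string hash would do; FNV-1a is just short to write).
function shardFor(userId: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < userId.length; i++) {
    h ^= userId.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return `shard-${h % SHARD_COUNT}`;
}
```

Each client would then connect to the DO for `shardFor(userId)` rather than its own id, so a broadcast only has to reach at most SHARD_COUNT stubs instead of one per user, and the shard DO fans out to the sockets of whichever of its users are subscribed.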