Socket response time >= 150ms?
I was under the impression that with WebSockets and Durable Objects, we could get short response times, like <50ms. I'm just sending JSON objects. But I'm getting more like 500ms to 1.5s.
I'm operating a live Q&A service.
Maybe I'm being dumb by sending the latest status of all of the questions whenever somebody upvotes a question or asks a new one (so I can make sure we don't have weird consistency issues between clients), which means I'm sending around 1.7MB when there are ~400 questions. However, I deleted all the questions and tried it again, and I'm still getting around the same timing.
I'm measuring the timing using k6's WebSocket testing.
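For context, the k6 script looks roughly like this (a sketch; the URL and message shape are placeholders, not the real service):

```ts
import ws from 'k6/ws';
import { check } from 'k6';

export default function () {
  let sent = 0;
  const res = ws.connect('wss://example.workers.dev/project/demo', {}, (socket) => {
    socket.on('open', () => {
      sent = Date.now();
      socket.send(JSON.stringify({ type: 'upvote', questionId: 'q1' })); // placeholder payload
    });
    socket.on('message', () => {
      console.log(`round trip: ${Date.now() - sent} ms`); // what we're measuring
      socket.close();
    });
  });
  check(res, { 'upgraded to ws (101)': (r) => r && r.status === 101 });
}
```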
What could I be doing wrong? How can I make this more performant (<50ms or at least <100ms)?
@eeswar
22 Replies
Is there a way to see the max size of a WebSocket message? I'm getting an error that says
RangeError: Values cannot be larger than 131072 bytes
which is 2^17 bytes (what a weird number - why 17 and not 16, 15, 31, or 32?). Are we sending a WebSocket message that's too big? The error went away when we deleted all the data in the Durable Object (which means the server is sending less info per message).
WS message limits are here: https://developers.cloudflare.com/durable-objects/platform/limits/
Limits · Cloudflare Durable Objects docs
Durable Objects are only available on the Workers Paid plan.
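If it helps, a quick way to see how big a payload is before it hits that limit (a sketch; `questions` stands in for whatever object you're serializing):

```ts
declare const questions: unknown; // placeholder for the object being serialized

// Serialized size in bytes; the per-value storage limit is 131072 (128 KiB)
const bytes = new TextEncoder().encode(JSON.stringify(questions)).length;
console.log(`${bytes} bytes`);
```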
Where are you based? Did you provide a location hint for the DO you created? https://developers.cloudflare.com/durable-objects/platform/data-location/
Further: how are you modeling your data here? What does each DO represent? You should be scaling horizontally (one DO per game/room/document/etc) where possible.
Data location · Cloudflare Durable Objects docs
You can restrict a Durable Object to a jurisdiction, or provide a location hint.
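For the one-DO-per-project point, a minimal sketch of the routing in the Worker (the PROJECT binding name and URL shape are assumptions, not your config):

```ts
interface Env {
  PROJECT: DurableObjectNamespace;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);
    const projectId = url.pathname.split('/')[2]; // assuming /project/<id>/ws
    const id = env.PROJECT.idFromName(projectId); // same name -> same DO instance
    return env.PROJECT.get(id).fetch(request); // everyone in a project lands on its DO
  },
};
```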
@eeswar can answer configuration questions
Based in the US
We have students across the country. He's currently in Cali, I'm currently in Texas. When we were testing yesterday, both of us were getting pretty much the same results despite being in different places and on different WiFi (Starbucks vs. home).
An end user should only be connected to one Durable Object, right?
This is a storage error, you're going above the limit for value size in a (key, value) pair.
I thought a key can only be 512 bytes, not 131072. And a value is supposed to hold 25MB, which is also more than 2^17.
Oh what. It's way smaller than for KV?
Wait so what does this actually apply to?
Storage is writing to a KV, right?
Workers KV & DO's Transactional Storage are two separate things. You can write to Workers KV from a DO, but this.state.storage is the DO's Transactional Storage.
https://developers.cloudflare.com/workers/platform/storage-options/
The 2^17 is because it's KiB, rather than KB.
128 KB is 128000 bytes, 128 KiB is 131072 bytes.
A key can be 2 KiB (2048 bytes)
We basically host live office hours chats for various coding projects for our students across the country.
Each project has, let's say, 20 chapters.
A teacher needs to see all the questions for all the chapters so that they can answer.
A student may only need to see one chapter at a time I suppose.
Right now, we're creating a separate Durable Object for different projects (essentially a course).
Does it even make sense to section further?
I know what a Kibibyte is. Just didn't know where the # was coming from.
Relatedly, does the DO store need to be flat keys? Or can I have nested values and access just one of them?
It's hard to say if you should make it more granular, but generally the less you make individual DOs do the better. If you have people accessing 1 DO all across the US then some of them will have good latency and some won't. That said, 500ms+ sounds surprising to me...
If your values being written to storage are too large, you either need to split the value into multiple (key, value) pairs, or decrease the size of the value (maybe with different encoding/compression)?
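One hedged way to do the split, assuming the questions live in one big array (`state` and `allQuestions` here are placeholders):

```ts
declare const state: DurableObjectState;
declare const allQuestions: { id: string }[];

const CHUNK = 100; // questions per value; tune so each chunk stays under 128 KiB

// Spread the array across several (key, value) pairs instead of one huge value
for (let i = 0; i < allQuestions.length; i += CHUNK) {
  await state.storage.put(
    `questions:chunk:${i / CHUNK}`,
    allQuestions.slice(i, i + CHUNK),
  );
}
```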
Is that latency including reading everything from storage related to that project?
Since you mentioned you want to send the status of every question
Ya we were putting all the questions into one key called questions.
Must be contributing to latency too
Cause we have to JSON.parse the stringified value
Sounds like we should instead have something like
questions:<projectID>:<chapterID>:<questionID>
as the keys, and each stores the data for only one question? Then the only slow operation would be the first load for a user, where we need to fetch all the relevant questions.
Yeah, or if you are certain you will never go over the limit you could do
questions:<chapter ID>
and the value is all the questions for the chapter.
If you want performance you should make your storage more granular and only read what you need, otherwise you're going to be doing a ton of unnecessary IO.
You shouldn't need <projectID>
if you're doing 1 DO per project, since DO storage is unique to each DO instance.
I think ideally the vast majority of operations should only write, and we'd read only when somebody joins a session. And then I guess we'll also have to read if somebody upvotes a question, so that we can fetch the current upvoters list and append the user to it?
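Putting the pieces together, a sketch of that key scheme inside the project DO (the Question shape and method names are made up, not your actual code):

```ts
interface Question {
  id: string;
  chapterId: string;
  text: string;
  upvoters: string[]; // users who upvoted, per the append-on-upvote idea above
}

export class ProjectDO {
  constructor(private state: DurableObjectState) {}

  // Read-modify-write touches a single small value, not the whole question list
  async upvote(chapterId: string, questionId: string, userId: string) {
    const key = `questions:${chapterId}:${questionId}`;
    const q = await this.state.storage.get<Question>(key);
    if (q && !q.upvoters.includes(userId)) {
      q.upvoters.push(userId);
      this.state.storage.put(key, q);
    }
  }

  // First load for a chapter: prefix scan reads only that chapter's questions
  async loadChapter(chapterId: string): Promise<Map<string, Question>> {
    return this.state.storage.list<Question>({ prefix: `questions:${chapterId}:` });
  }
}
```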
Is the DO storing this stuff on disk or in memory? The docs say unlimited, so I'm assuming on disk?
I'm not familiar with the architecture of your system or what needs to get displayed on the frontend for different users so I can't really help there, sorry 😅. If you want lower latency, just be sure to send only what each client needs to see, i.e. if I upvote a question and you want everyone looking at the questions of a certain chapter to see the update, then only those students looking at that chapter need to receive the update. If I'm looking at chapter 5 I don't need to get updates for chapter 2. If later I open chapter 2, then I can read the current state from storage, which reflects all updates on the chapter anyways (and would be less than reading all questions from all chapters as is done now)
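A sketch of that fan-out idea, tracking which chapter each socket is viewing (the names here are assumptions):

```ts
const viewers = new Map<WebSocket, string>(); // socket -> chapterId currently open

// Call when a client opens a chapter
function setChapter(socket: WebSocket, chapterId: string) {
  viewers.set(socket, chapterId);
}

// Only sockets viewing this chapter receive the update
function broadcastToChapter(chapterId: string, update: unknown) {
  const msg = JSON.stringify(update);
  for (const [socket, chapter] of viewers) {
    if (chapter !== chapterId) continue; // chapter 5 viewers skip chapter 2 updates
    try {
      socket.send(msg);
    } catch {
      viewers.delete(socket); // drop sockets that have gone away
    }
  }
}
```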
Yes, DO storage is durable
https://blog.cloudflare.com/durable-objects-easy-fast-correct-choose-three
This may be helpful for storage fyi
The Cloudflare Blog
Durable Objects: Easy, Fast, Correct — Choose three
When multiple clients access the same storage concurrently, race conditions abound. Durable Objects can make it easier. We recently rolled out improvements to Durable Objects that automatically correct many common race conditions while actually making your code faster.
Ah good point that whenever someone goes to a new chapter they'll need to read all the questions
This mostly made sense, but I'm confused about response gating. Is it only fast because we're writing to memory first before sending to the user, rather than to disk? Otherwise, how's that any faster than awaiting the write?
Actually, pretty much everything else made sense there. I come from an embedded-systems background, so similar single-threaded concurrency handling.
That's a good question, my interpretation is that there are 2 benefits:
1. You don't have to await explicitly, so subtle code mistakes on the application developer's part won't break things
2. We will coalesce your writes into a single batch before your DO returns a response, which improves write performance
So if you do 5 writes in your DO, and you don't await any of them, then before you can return a response or open another outgoing connection, we'll flush it all to disk in one go. That should have better performance than 5 individual writes where you await each one.
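In code, that might look like this (a sketch; the handler and keys are made up):

```ts
declare const state: DurableObjectState;

async function handleNewQuestion(
  q: { id: string; chapterId: string; text: string },
): Promise<Response> {
  // Neither put() is awaited: both get coalesced into one batch,
  // and the Response below is gated until that batch is durable.
  state.storage.put(`questions:${q.chapterId}:${q.id}`, q);
  state.storage.put('meta:lastActivity', Date.now());
  return new Response('ok'); // held until the coalesced write completes
}
```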
Also realized you probably won't get a notification unless I directly reply to one of your messages, my bad
Yea, that sounds like more of a batching advantage than a gating advantage. I think this is a comment more on the order they're introduced in the article.
What you described definitely sounds like a performance benefit
Performance from batching, correctness from gating 😉
Hey do you know if there's a limit on outgoing message size? That platform limits table only lists a limit for incoming messages (1MiB)
There isn't a limit on outgoing message size afaik