Socket response time >=150ms ?

I was under the impression that with WebSockets and Durable Objects we could get short response times, like <50ms. I'm just sending JSON objects, but I'm getting around 500ms to 1.5s. I'm operating a live Q&A service. Maybe I'm being dumb by sending the latest status of all of the questions whenever somebody upvotes a question or asks a new one, so I can make sure we don't have weird consistency issues between clients; that means I'm sending around 1.7MB when there are ~400 questions. However, I deleted all the questions and tried again, and I'm still getting around the same timing. I'm measuring the timing using k6 WebSocket testing. What could I be doing wrong? How can I make this more performant (<50ms or at least <100ms)? @eeswar
22 Replies
NikilOP•13mo ago
Is there a way to see the max size of a WebSocket message? I'm getting an error that says "RangeError: Values cannot be larger than 131072 bytes", which is 2^17 bytes (what a weird number: why 17 and not 16, 15, 31, or 32?). Are we sending a WebSocket message that's too big? This error went away when we deleted all the data in the Durable Object (which means that the server is sending less info per message).
elithrar•13mo ago
Limits · Cloudflare Durable Objects docs
Durable Objects are only available on the Workers Paid plan.
elithrar•13mo ago
Where are you based? Did you provide a location hint for the DO you created? https://developers.cloudflare.com/durable-objects/platform/data-location/ Further: how are you modeling your data here? What does each DO represent? You should be scaling horizontally (one DO per game/room/document/etc) where possible.
Data location · Cloudflare Durable Objects docs
You can restrict a Durable Object to a jurisdiction, or provide a location hint.
NikilOP•13mo ago
@eeswar can answer configuration questions.
We're based in the US and have students across the country. He's currently in Cali, I'm currently in Texas. When we were testing y'day, both of us were getting pretty much the same results despite being in different places and on different WiFi networks (Starbucks vs home).
An end user should only be connected to one Durable Object, right?
Milan•13mo ago
This is a storage error, you're going above the limit for value size in a (key, value) pair.
NikilOP•13mo ago
I thought a key can only be 512 bytes, not 131072. And a value is supposed to hold 25MB, which is also more than 2^17.
Milan•13mo ago
https://developers.cloudflare.com/durable-objects/platform/limits/
Key size: 2 KiB (2048 bytes)
Value size: 128 KiB (131072 bytes)
NikilOP•13mo ago
Oh what. It's way smaller than for KV? Wait so what does this actually apply to? Storage is writing to a KV, right?
kian•13mo ago
Workers KV & DO's Transactional Storage are two separate things. You can write to Workers KV from a DO, but this.state.storage is the DO's Transactional Storage: https://developers.cloudflare.com/workers/platform/storage-options/
The 2^17 is because it's KiB rather than KB: 128 KB is 128000 bytes, 128 KiB is 131072 bytes. A key can be 2 KiB (2048 bytes).
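The unit arithmetic kian describes can be checked in a few lines. This is only a worked example of the numbers quoted in the thread; the constant names are made up for illustration.

```typescript
// Binary (KiB) vs decimal (KB) units, expressed in bytes.
const KIB = 1024;
const KB = 1000;

const valueLimitBytes = 128 * KIB; // the number quoted in the RangeError
const decimalBytes = 128 * KB;     // what "128 KB" would be instead
const keyLimitBytes = 2 * KIB;     // the key size limit from the limits table

// 128 = 2^7 and 1024 = 2^10, so 128 KiB = 2^(7+10) = 2^17 bytes.
const isTwoToThe17 = valueLimitBytes === 2 ** 17;

console.log(valueLimitBytes, decimalBytes, keyLimitBytes, isTwoToThe17);
```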
NikilOP•13mo ago
We basically host live office hours chats for various coding projects for our students across the country. Each project has, let's say, 20 chapters. A teacher needs to see all the questions for all the chapters so that they can answer. A student may only need to see one chapter at a time, I suppose. Right now, we're creating a separate Durable Object for each project (essentially a course). Does it even make sense to section further?
I know what a kibibyte is. Just didn't know where the # was coming from.
Relatedly, does the DO store need to use flat keys? Or can I have nested values and access just one of the nested values?
Milan•13mo ago
It's hard to say if you should make it more granular, but generally the less you make individual DOs do, the better. If you have people accessing 1 DO from all across the US, then some of them will have good latency and some won't. That said, 500ms+ sounds surprising to me...
If the values being written to storage are too large, you either need to split the value into multiple (key, value) pairs or decrease the size of the value (maybe a different encoding/compression?).
Is that latency including reading everything from storage related to that project? Since you mentioned you want to send the status of every question.
NikilOP•13mo ago
Ya, we were putting all the questions into one key called questions. That must be contributing to latency too, 'cause we have to JSON.parse the stringified value.
Sounds like we should instead have something like questions:<projectID>:<chapterID>:<questionID> as the keys, with each one storing the data for only 1 question? Then the only slow operation would be the first load for a user, where we need to fetch all the relevant questions.
Milan•13mo ago
Yeah, or if you are certain you will never go over the limit, you could do questions:<chapterID> with the value being all the questions for the chapter. If you want performance, you should make your storage more granular and only read what you need; otherwise you're going to be doing a ton of unnecessary IO.
You shouldn't need <projectID> if you're doing 1 DO per project, since DO storage is unique to each DO instance.
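The key layout discussed above could look roughly like this. It is a minimal sketch, not the thread authors' code: the Question shape and the helper names are hypothetical, and a plain Map stands in for the DO's storage so the example runs anywhere. With the real Transactional Storage, put(key, value) and list({ prefix }) would play the roles of the Map write and the prefix scan.

```typescript
// Hypothetical question record; the field names are illustrative only.
interface Question {
  id: string;
  chapterId: string;
  text: string;
  upvotes: string[]; // IDs of users who upvoted
}

// One key per question. No <projectID> segment, since each project has its own DO.
// The trailing ":" keeps chapter "1" from also matching chapter "10".
function chapterPrefix(chapterId: string): string {
  return `questions:${chapterId}:`;
}

function questionKey(chapterId: string, questionId: string): string {
  return chapterPrefix(chapterId) + questionId;
}

// In-memory stand-in for the DO's storage (an assumption for this sketch).
const store = new Map<string, Question>();

function saveQuestion(q: Question): void {
  store.set(questionKey(q.chapterId, q.id), q);
}

// Read only the questions of one chapter instead of the whole project.
function loadChapter(chapterId: string): Question[] {
  const prefix = chapterPrefix(chapterId);
  const result: Question[] = [];
  store.forEach((value, key) => {
    if (key.startsWith(prefix)) result.push(value);
  });
  return result;
}

// An upvote reads and rewrites a single small value.
function upvote(chapterId: string, questionId: string, userId: string): Question | undefined {
  const q = store.get(questionKey(chapterId, questionId));
  if (q === undefined) return undefined;
  if (!q.upvotes.includes(userId)) q.upvotes.push(userId);
  saveQuestion(q);
  return q;
}

saveQuestion({ id: "q1", chapterId: "1", text: "How do loops work?", upvotes: [] });
saveQuestion({ id: "q2", chapterId: "1", text: "What is a closure?", upvotes: [] });
saveQuestion({ id: "q1", chapterId: "10", text: "Unrelated chapter", upvotes: [] });

upvote("1", "q2", "student-a");
upvote("1", "q2", "student-a"); // a repeat upvote from the same user is ignored

const chapterOne = loadChapter("1");
console.log(chapterOne.length, chapterOne.map((q) => q.upvotes.length));
```

Each value here is a single question, so it stays far below the 128 KiB value limit, and an upvote no longer requires parsing every question in the project.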
NikilOP•13mo ago
I think ideally the vast majority of operations should only write, and we'd read only when somebody joins a session. Then I guess we'll also have to read when somebody upvotes a question, so that we can read the current upvotes list and append the user to that list?
Is the DO storing this stuff on disk or in memory? The docs say unlimited, so I'm assuming on disk?
Milan•13mo ago
I'm not familiar with the architecture of your system or what needs to get displayed on the frontend for different users, so I can't really help there, sorry 😅.
If you want lower latency, just be sure to send only what each client needs to see. I.e. if I upvote a question and you want everyone looking at the questions of a certain chapter to see the update, then only those students looking at that chapter need to receive the update. If I'm looking at chapter 5, I don't need to get updates for chapter 2. If I later open chapter 2, then I can read the current state from storage, which reflects all updates on the chapter anyway (and would be less than reading all questions from all chapters, as is done now).
Yes, DO storage is durable.
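One way to picture "send only what each client needs" is a per-chapter subscriber set that receives a small update message rather than the full question list. The sketch below is an assumption-laden illustration: ChapterRooms, QuestionUpdate, and the Client interface are hypothetical names, and Client is just anything with a send() method (a server-side WebSocket would fit).

```typescript
// Anything with a send() method, e.g. a server-side WebSocket.
interface Client {
  send(message: string): void;
}

// Hypothetical update message: one changed question, not the whole list.
interface QuestionUpdate {
  type: "upvote";
  chapterId: string;
  questionId: string;
  upvoteCount: number;
}

class ChapterRooms {
  private rooms = new Map<string, Set<Client>>();

  // Register the chapter a client is currently viewing (one chapter at a time here).
  view(client: Client, chapterId: string): void {
    this.leave(client);
    let room = this.rooms.get(chapterId);
    if (room === undefined) {
      room = new Set<Client>();
      this.rooms.set(chapterId, room);
    }
    room.add(client);
  }

  // Remove the client from whatever chapter it was viewing.
  leave(client: Client): void {
    this.rooms.forEach((room) => {
      room.delete(client);
    });
  }

  // Send the update only to clients viewing that chapter; returns how many received it.
  broadcast(update: QuestionUpdate): number {
    const room = this.rooms.get(update.chapterId);
    if (room === undefined) return 0;
    const payload = JSON.stringify(update);
    room.forEach((client) => client.send(payload));
    return room.size;
  }
}

// Fake clients that record what they receive, so the sketch runs without sockets.
const makeClient = (log: string[]): Client => ({
  send: (message) => {
    log.push(message);
  },
});

const chapter5Log: string[] = [];
const chapter2Log: string[] = [];
const rooms = new ChapterRooms();
rooms.view(makeClient(chapter5Log), "5");
rooms.view(makeClient(chapter2Log), "2");

const delivered = rooms.broadcast({ type: "upvote", chapterId: "2", questionId: "q7", upvoteCount: 3 });
console.log(delivered, chapter5Log.length, chapter2Log.length);
```

A teacher who needs every chapter would need to be subscribed to several rooms at once, which this one-chapter-per-client sketch does not handle.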
Milan•13mo ago
The Cloudflare Blog
Durable Objects: Easy, Fast, Correct — Choose three
When multiple clients access the same storage concurrently, race conditions abound. Durable Objects can make it easier. We recently rolled out improvements to Durable Objects that automatically correct many common race conditions while actually making your code faster.
NikilOP•13mo ago
Ah, good point that whenever someone goes to a new chapter they'll need to read all the questions.
This mostly made sense, but I'm confused about response gating. Is it only fast because we're writing to memory first, rather than to disk, before sending to the user? Otherwise, how's that any faster than awaiting the write?
Actually, pretty much everything else made sense there. I come from an embedded systems background, so I'm used to similar single-threaded concurrency handling.
Milan•13mo ago
That's a good question. My interpretation is that there are 2 benefits:
1. You don't have to await explicitly, so subtle code mistakes on the application developer's part won't break things.
2. We will coalesce your writes into a single batch before your DO returns a response, which improves write performance.
So if you do 5 writes in your DO and you don't await any of them, then before you can return a response or open another outgoing connection, we'll flush it all to disk in one go. That should have better performance than 5 individual writes where you await each one.
Also realized you probably won't get a notification unless I directly reply to one of your messages, my bad.
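The calling pattern Milan describes, several writes issued back to back with no await between them, might be sketched as follows. This only illustrates the shape of the calling code: StorageLike is a minimal interface modeled on the storage put() call, the upvotes:<chapterID>:<questionID> keys and recordUpvotes are hypothetical, and the fake storage merely records calls. The coalescing itself is platform behavior as described in the thread and is not simulated here.

```typescript
// Minimal shape of the one storage call used here; an assumption for this sketch.
interface StorageLike {
  put(key: string, value: unknown): Promise<void>;
}

// Issue one write per question without awaiting each one individually.
function recordUpvotes(storage: StorageLike, chapterId: string, counts: Record<string, number>): number {
  let issued = 0;
  for (const questionId of Object.keys(counts)) {
    // `void` marks the returned promise as intentionally not awaited.
    void storage.put(`upvotes:${chapterId}:${questionId}`, counts[questionId]);
    issued++;
  }
  return issued;
}

// Fake storage that only records which keys were written, so this runs outside Workers.
const writtenKeys: string[] = [];
const fakeStorage: StorageLike = {
  put(key) {
    writtenKeys.push(key);
    return Promise.resolve();
  },
};

const issuedCount = recordUpvotes(fakeStorage, "5", { q1: 3, q2: 1, q3: 7 });
console.log(issuedCount, writtenKeys);
```

The contrast is with a loop that awaits every put in turn, which is the "5 individual writes" case from the message above.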
NikilOP•13mo ago
Yea, that sounds like more of a batching advantage than a gating advantage. I think this is more a comment on the order they're introduced in the article. What you described definitely sounds like a performance benefit.
Milan•13mo ago
Performance from batching, correctness from gating 😉
NikilOP•13mo ago
Hey do you know if there's a limit on outgoing message size? That platform limits table only lists a limit for incoming messages (1MiB)
Milan•13mo ago
There isn't a limit on outgoing message size afaik