I created a real-time server using Durable Objects and WebSocket. The logs confirm that both the Worker and the DO are in the SJC data center. Pinging the Worker's domain shows a latency of 2ms, but when messages are returned via WebSocket, the latency is as high as 70ms. Is the communication latency between the Worker and the DO really that high?
When you say pinging the domain, do you literally mean `ping <the domain>`, or do you mean doing an HTTP request to some `/ping` endpoint? Those are very different things.
The HTTPS GET request to the worker takes 18ms.
And how long to the DO?
^ Note that for requests to the DO, you don't want to time the first request, because that will cold start, and DOs are heavier than Workers.
The DO will send one packet to the client every second, containing the time obtained from performance.now(). Upon receiving the packet, the client will immediately send it back. The DO will then subtract the time in the packet from the current time obtained using performance.now() to calculate the round-trip latency. The observed latency is currently between 60-90ms.
I hope that users near the data center will have WebSocket latency below 50ms, as latency above 50ms results in a 3-frame input delay. If there are other solutions, I am open to trying them. Thank you for your support.
How are you advancing time exactly? WS send/receive are not async at JS level.
Are you creating a promise (when your DO sends) that resolves when your DO receives, and then timing the duration of that promise resolving?
Also what's one "packet" in this case?
One "packet" looks like this:
```js
let lagBuffer = new Uint16Array(5);
lagBuffer[0] = performance.now() & 0xFFFF;
```
And this is the onmessage handler:
```js
let lagInfo = new Uint16Array(data);
this.PLag[pos] = (performance.now() & 0xFFFF) - lagInfo[0];
```
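For reference, the client side of that echo isn't shown; a minimal sketch of it, assuming a browser client, the same Uint16Array packet layout, and a hypothetical WebSocket URL:
```js
// Client-side sketch (browser): echo the DO's timestamp packet straight back.
const ws = new WebSocket("wss://example.com/ws"); // hypothetical endpoint
ws.binaryType = "arraybuffer"; // receive binary frames as ArrayBuffer

ws.onmessage = (event) => {
  if (event.data instanceof ArrayBuffer) {
    // Return the packet unchanged so the DO can compute RTT from its own clock.
    ws.send(event.data);
  }
};
```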
ws send is sync?
Also, is your DO handling any other requests at that time? Keep in mind they're single threaded, so if another request is running then wall time will definitely be higher.
How do I change it to async?
`ws.send()` in JavaScript is a synchronous method because you "receive" the response in your message handler.
Well, let me rephrase: there's nothing to await when you call `ws.send()`. It just returns void.
It queues a message to be sent asynchronously.
Could you format this? It's kind of hard to read; you can use backticks.
The reason I bring up sync/async is because of https://developers.cloudflare.com/workers/runtime-apis/performance/#performancenow (specifically for perf timing)
Cloudflare Docs
Performance and timers | Cloudflare Workers docs
Measure timing, performance, and timing of subrequests and other operations.
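Worth keeping in mind when reading those docs: in Workers, `performance.now()` and `Date.now()` generally only advance across I/O (a Spectre mitigation), so timing pure JS work reads as roughly zero. A small sketch of the pitfall:
```js
// Inside a Worker/DO handler: the clock does not advance during pure CPU work.
const t0 = performance.now();
for (let i = 0; i < 1e6; i++) {} // no I/O here
const t1 = performance.now();
console.log(t1 - t0); // ~0, because timers only move forward at I/O boundaries

// Timestamps taken around real I/O (a subrequest, a WebSocket receive) do
// reflect elapsed time, which is why the echo-based RTT measurement works.
```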
Thanks, I will try.
After trying, it seems there is no significant difference. Is there any way to achieve the lowest latency?
What is your expected latency exactly? I think you need to test the latency of invoking a DO's `fetch()` handler (not the Worker in front of it) to get a better sense for latency to your DO. Also, when you test this, are you sending any other requests to the DO? It's important to keep in mind DOs are single threaded, so if any other requests are being processed, we won't deliver the client's response to the DO until it is its turn to execute.
A single fetch operation returns an object immediately within 5-12 milliseconds, which is very ideal. The DO only handles lag tests continuously, with a single WebSocket and no other requests.
I want to develop a real-time WebSocket server using only Workers and DOs, so there is no other backend.
So this `fetch()` you're timing from the client side, right? Maybe try timing the RTT of a WebSocket send from your client to the DO and back? I would be very surprised if a fetch takes 5-12 ms and a WS send takes 70 ms, so I figure something else is going on in the DO that's preventing the event from being delivered immediately.
Also, do you care about RTT if you're sending events from the server? Or do you just care about TTFB to the client?
I am concerned about RTT (round-trip time). A WebSocket game needs to maintain stable latency for tens of minutes.
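One way to compare the two paths from the client, as a sketch; it assumes a hypothetical `/ping` route on the Worker and a DO WebSocket endpoint at `/ws` that echoes text frames back:
```js
// Client-side sketch: compare HTTPS fetch latency vs WebSocket round-trip latency.
async function measure(origin) {
  // Time a plain fetch through the Worker to the DO.
  const f0 = performance.now();
  await fetch(`${origin}/ping`);
  console.log("fetch ms:", performance.now() - f0);

  // Time a WebSocket round trip to the same DO.
  const ws = new WebSocket(`${origin.replace("https", "wss")}/ws`);
  await new Promise((resolve) => (ws.onopen = resolve));
  const w0 = performance.now();
  ws.send("ping");
  await new Promise((resolve) => (ws.onmessage = resolve));
  console.log("ws RTT ms:", performance.now() - w0);
  ws.close();
}
```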
I am trying to write a complete test case, please wait a moment.
I found the reason. The DO is in colo=IAD, while the Worker is in colo=SJC. However, my test case allocates to the same colo each time. Why is this happening? Does it have anything to do with the parameter of idFromName?
In the DO, I obtained the colo by using fetch("https://www.cloudflare.com/cdn-cgi/trace"). Is there any simpler way to get the DO's location? According to my tests, the latency within the same colo is less than 20ms.
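A small sketch of pulling just the colo field out of that trace response from inside the DO; I'm not aware of a simpler built-in, since (as noted below) `request.cf.colo` reflects where the request entered Cloudflare rather than where the DO runs:
```js
// DO-side sketch: ask Cloudflare's trace endpoint which colo this isolate is in.
async function getColo() {
  const res = await fetch("https://www.cloudflare.com/cdn-cgi/trace");
  const text = await res.text();
  // The body is key=value lines; pick out the "colo=XXX" line.
  const line = text.split("\n").find((l) => l.startsWith("colo="));
  return line ? line.slice("colo=".length) : "unknown";
}
```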
This is my test case, and this is the result:
Was the DO initially created by a request in Eastern North America? That's the only thing that would explain why it's on the east coast.
No. I used a location hint.
What location did you provide?
If I use the above test case, both the DO and the worker are in SJC. However, if I use my business code, the DO is in IAD.
I hope that the business code can also be executed in SJC as much as possible each time.
If you don't provide a location hint, the first time you ever send a request to a DO it will be created as close to the Worker as possible
The only way we would've created the DO in IAD is if your first request was close to IAD, or you specifically provided a location hint
enam
Your DO will not move to another region on its own, you would need to make a new DO and try to create it closer to your clients.
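For completeness, if you do want to pin a new DO to a region deliberately, `get()` accepts a `locationHint`; a sketch, where `MY_DO` and the instance name are hypothetical and `"wnam"` is the Western North America hint that covers SJC:
```js
// Worker-side sketch: explicitly hint the region when first creating the DO.
const id = env.MY_DO.idFromName("room-123");              // hypothetical name
const stub = env.MY_DO.get(id, { locationHint: "wnam" }); // western North America
const resp = await stub.fetch("https://do/ping");         // first request creates it there
```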
BTW, is this WS latency or fetch? The 20 ms?
Can I specify the creation of a DO in SJC or IAD, rather than it being random?
If both the worker and the DO are in SJC, the latency is less than 10ms.
The current situation is that request.cf.colo is always SJC, but the DO is still created in IAD. It never gets created in SJC.
You mean can you specify a colo rather than a region? No, and even if you could, the DOs would occasionally move out of the specified colo to another colo in the region if the "home" colo had issues (ex. network is down).
If I make a DO instance called `let id = stub.idFromName("some_new_name");`, and the DO gets created in IAD, then for the rest of time the DO instance `stub.idFromName("some_new_name")` will live in enam. It doesn't matter if you don't send a request for a year, it'll still map to the enam region.
If you want a DO that's in SJC, you need to make another DO instance, ex. `stub.idFromName("ANOTHER_new_name")`, don't provide a location hint, and it will be created close to the Worker that invoked it. Does that help?
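A sketch of that "fresh name, no hint" approach; `MY_DO` and the name are hypothetical, and the first request coming from an SJC Worker is what places the instance nearby:
```js
// Worker-side sketch: a never-before-used name with no locationHint gets its
// DO created as close as possible to the Worker handling this request.
const id = env.MY_DO.idFromName("game-sjc-001"); // hypothetical fresh name
const stub = env.MY_DO.get(id);                  // no locationHint on purpose
await stub.fetch("https://do/init");             // the first request pins the region
```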
If I redeploy using deploy, will it trigger the DO to be recreated?
I changed the name... and everything switched to SJC... yeah!
👍
It depends what you mean by recreated. If you mean "will my DO instances currently executing JS be evicted from memory?" the answer is yes, but then upon receiving new requests, those DO instances will run in probably the same colo, and definitely the same region.
If you mean "will my DOs be completely destroyed and then be able to run elsewhere in the world, like move from enam to oc" the answer is definitely not, since once an ID is created it will always run in that region.
Every time I deployed, I would get disconnected. I thought the DO should have been cleared as well.
> I thought the DO should have been cleared as well.
Hmm, what do you mean by "cleared"?
I see that the DO is no longer being billed. I believe that the next time it is created, it will choose the data center again.
Try sending another request to the previous DO with the same name from before, you'll see it will still run in Eastern North America.
Once you've created an ID, you've pinned that named instance of a DO to a region forever.
My issue has been resolved. Thank you for your helpful assistance.
😆
If I keep using new names to avoid pinning, is there a limit on the number of names?
No prob, hope the game development goes well 🙂 .
> If I keep using new names to avoid pinning, is there a limit on the number of names?
We don't limit the number of Durable Object "instances": https://developers.cloudflare.com/durable-objects/platform/limits/ You might actually want to use `newUniqueId()` (instead of `idFromName()`) if each DO instance is just a single game session.
Cloudflare Docs
Limits | Cloudflare Durable Objects docs
Durable Objects are only available on the Workers Paid plan. Durable Objects limits are the same as Workers Limits, as well as the following limits that are specific to Durable Objects:
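A sketch of the `newUniqueId()` pattern for one DO per game session; `GAME_SESSION` is a hypothetical binding, and the id string is what you'd hand to the players so they can reconnect to the same object:
```js
// Worker-side sketch: one DO per game session via newUniqueId().
const id = env.GAME_SESSION.newUniqueId(); // unnamed id, created near this Worker
const sessionId = id.toString();           // share this string with the players

// Later, route a player back to their session by rebuilding the id.
const stub = env.GAME_SESSION.get(env.GAME_SESSION.idFromString(sessionId));
const resp = await stub.fetch(request);
```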
:cloudflare: OK thanks!
👍
@Milan Is it only a Durable Object using the WebSocket Hibernation Server that incurs a request count?
I found that under a regular WebSocket, even if there are 10 messages round-trip per second, the request count does not increase.
I'll get back to you some time next week, we have a separate mechanism for billing websocket hibernation invocations but I'm not too familiar with it
This is actually a very good question, so thanks for asking it. I suspect regular websocket receives don't show up in the dashboard, rather, they would show up in GraphQL https://developers.cloudflare.com/durable-objects/observability/graphql-analytics/#websocket-metrics.
Hibernatable websockets are different, we're delivering events to top-level request handlers (`webSocketMessage/Close/Error()`), so they would show up in your dashboard.
Cloudflare Docs
Metrics and GraphQL analytics | Cloudflare Durable Objects docs
Durable Objects expose analytics for Durable Object namespace-level and request-level metrics.
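For reference, those handlers belong to the WebSocket Hibernation API; a minimal sketch of a hibernating DO, with names and details simplified:
```js
// DO sketch using the WebSocket Hibernation API: messages arrive via the
// top-level webSocketMessage() handler, which is what shows up as requests.
export class GameRoom {
  constructor(state, env) {
    this.state = state;
  }

  async fetch(request) {
    const pair = new WebSocketPair();
    // acceptWebSocket() lets the runtime hibernate the DO between messages.
    this.state.acceptWebSocket(pair[1]);
    return new Response(null, { status: 101, webSocket: pair[0] });
  }

  async webSocketMessage(ws, message) {
    ws.send(message); // echo back; inbound messages count toward request billing
  }

  async webSocketClose(ws, code, reason, wasClean) {
    ws.close(code, "closing");
  }
}
```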
👌
```graphql
durableObjectsPeriodicGroups(filter: { date_gt: "2024-09-13" }, limit: 1000) {
  sum {
    duration
    activeTime
    cpuTime
    inboundWebsocketMsgCount
    outboundWebsocketMsgCount
  }
}
```
Is the request fee calculated using `inboundWebsocketMsgCount/20`?
Should be, yeah
I think we only charge inbound messages though (for requests)? Be sure to double check our pricing page
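If it helps, a rough sketch of running that query over HTTP against the GraphQL Analytics endpoint; the `viewer`/`accounts` wrapping and the token permissions (Account Analytics Read) are my assumptions, so double-check them against the GraphQL docs linked above:
```js
// Sketch: fetch Durable Objects websocket message counts from GraphQL analytics.
async function fetchDoWsCounts(accountTag, apiToken) {
  const query = `
    query {
      viewer {
        accounts(filter: { accountTag: "${accountTag}" }) {
          durableObjectsPeriodicGroups(filter: { date_gt: "2024-09-13" }, limit: 1000) {
            sum { inboundWebsocketMsgCount outboundWebsocketMsgCount }
          }
        }
      }
    }`;

  const res = await fetch("https://api.cloudflare.com/client/v4/graphql", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ query }),
  });
  return res.json();
}
```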
@Milan Is there any way to get the remaining quota of current Durable Object requests?
You mean the remaining quota until you start getting charged?
Hmmm I'm not sure, you should ask this in the main channel since someone there will probably know.
In the meantime I'm going to close this thread because it's moving away from the original topic, which has been resolved.