thelunz
Cloudflare Developers
•Created by thelunz on 7/16/2024 in #workers-help
websocket-upgrade fetch from worker to DO randomly delayed
We are using DOs for a registry that coordinates the running of multi-user web sessions. We have "synchronizer" nodes that are external to the registry, each maintaining long-lived websockets into the registry for its housekeeping tasks.
A synchronizer watches for any of its socket connections dropping, or responding too sluggishly. In such cases, it automatically re-connects by sending a `wss` request to our ingress worker, whose `fetch` delegates to methods like this:
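The snippet that followed here is not shown. Purely as an illustration, and not the code from the original post, a worker-side delegation of this kind often has roughly the following shape. The names `connectToSessionRunner`, `sessionRunners`, and `sessionId` are hypothetical, and the two small interfaces only stand in for the runtime's Durable Object namespace and stub types so that the sketch is self-contained.

```typescript
// Illustrative sketch only: NOT the original poster's code.
// Minimal structural stand-ins for the Workers runtime's Durable Object
// namespace and stub types, so the example needs no external typings.
interface StubLike {
  fetch(request: Request): Promise<Response>;
}
interface NamespaceLike<Id> {
  idFromName(name: string): Id;
  get(id: Id): StubLike;
}

// Hypothetical helper: forward a synchronizer's WebSocket upgrade request
// to the "session runner" DO addressed by `sessionId`.
async function connectToSessionRunner<Id>(
  request: Request,
  sessionRunners: NamespaceLike<Id>,
  sessionId: string,
): Promise<Response> {
  // Reject anything that is not a WebSocket upgrade.
  if (request.headers.get("Upgrade")?.toLowerCase() !== "websocket") {
    return new Response("Expected a WebSocket upgrade", { status: 426 });
  }
  // The worker-side console message; the delay is observed after this point.
  console.log(`worker: forwarding upgrade for session ${sessionId}`);
  const stub = sessionRunners.get(sessionRunners.idFromName(sessionId));
  // Return the DO's response unchanged so the accepted socket reaches the caller.
  return stub.fetch(request);
}
```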
...where the "session runner" DO has a `fetch` that boils down to:
Although the reconnections usually take of the order of 50ms, every few hours we hit periods when several synchronizers all detect a sluggish response and try to re-connect, and those reconnections are held up for a second or more before all completing at the same time. The worst cases have a delay of over 10 seconds.
The logs show that almost the entire delay occurs between the worker's console message and the subsequent GET log line for the DO.
For context:
* the delays only rarely coincide with eviction and reload of DOs; generally the DOs are already active (i.e., no cold start involved).
* there is no other significant traffic to our ingress or workers.
How could we at least figure out where the time is going?
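One way to start narrowing this down, sketched under assumptions rather than offered as a known fix: stamp each forwarded request with the worker's send time and have the DO log the gap on arrival. The header name `x-worker-sent-at` and the helpers `timedForward` and `handoffDelayMs` are made up for illustration. The two timestamps come from different isolates, possibly on different machines, so clock skew limits how much small gaps can be trusted. Multi-second delays like those described should still stand out.

```typescript
// Hypothetical header carrying the worker's send timestamp (ms since epoch).
const SENT_AT_HEADER = "x-worker-sent-at";

// Worker side: stamp the request, forward it, and log the round-trip time.
// `forward` is whatever actually calls the DO, e.g. (req) => stub.fetch(req).
async function timedForward(
  request: Request,
  forward: (req: Request) => Promise<Response>,
  now: () => number = Date.now,
): Promise<Response> {
  const sentAt = now();
  // Copy the request so its headers can be modified.
  const stamped = new Request(request);
  stamped.headers.set(SENT_AT_HEADER, String(sentAt));
  const response = await forward(stamped);
  console.log(`worker: DO responded after ${now() - sentAt}ms`);
  return response;
}

// DO side: time between the worker's stamp and arrival in the DO's fetch,
// or null if the header is missing or malformed.
function handoffDelayMs(
  request: Request,
  now: () => number = Date.now,
): number | null {
  const raw = request.headers.get(SENT_AT_HEADER);
  if (raw === null) return null;
  const sentAt = Number(raw);
  return Number.isFinite(sentAt) ? now() - sentAt : null;
}
```

In the worker this would wrap the existing call, for example `timedForward(request, (req) => stub.fetch(req))`, and the DO's `fetch` would log `handoffDelayMs(request)` as its first statement. Comparing the two log lines for a slow reconnection would show whether the time is lost before the DO starts handling the request or while it is producing the response.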
9 replies