Slow and unreliable workspace connections when running Coder on multiple Kubernetes clusters
Hi all, our current setup is:
Cluster A: running the Coder server
Cluster B: provisioning target for Coder workspaces, configured as outlined in this document.
In this configuration, Workspaces provisioned on Cluster B are so slow to access over Coder that they are unusable. However, if we provision workspaces on the same cluster that the Coder server is running on (Cluster A), workspaces perform as expected.
Clusters A and B share the same L2 network.
As a debug step, we also tried forcing WebSocket connections, following the tutorial here.
Is this a supported configuration? Are there any debug steps or information we should look at to diagnose this issue? Any help would be greatly appreciated 🙂
8 Replies
Category: Help needed
Product: Coder (v2)
Platform: Linux
Hitting the same issue when deploying workspaces cross-cluster; everything works fine with in-cluster workspaces.
Hi,
Could you describe what you mean by unreliable connections?
Is it for CLI connections, i.e.
coder ssh <workspace>
or IDE connections like VS Code?
Or web apps like code-server? (@dtatum)
Hi @Atif / @Phorcys I would describe it as such:
1. VS Code in the browser usually displays a blank screen, and sometimes eventually displays a 502.
2. SSH terminal connections (local, with the coder CLI) sometimes work, sometimes not.
3. SSH through the web browser mostly works, but the connection drops unexpectedly while in use.
As a note - we are using nginx-ingress through the Coder helm chart.
We don't see any of this behavior when both the Coder server and the workspaces are on the same Kubernetes cluster (either cluster A or B).

To add to this, sometimes desktop VS Code is able to connect, but the latency between the Coder Embedded Relay and the workspace spikes to 2000-3000 ms (I only have a screenshot where it is over 200 ms right now, as it's just not connecting properly).

Hi, so it seems to me that there are networking problems or misconfigurations between your clusters that are causing high (sometimes very high) latency.
Could you measure the latency between the two clusters, just to get an idea?
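One rough way to get that number, as a minimal sketch and not something from the thread: time a few TCP handshakes from a pod in one cluster to an address that is reachable in the other. The function names below are made up for illustration, and the host and port you pass in are whatever endpoint is reachable across your clusters (for example the Coder access URL host on 443). A TCP connect takes roughly one network round trip, so this approximates the latency between the two sides but ignores TLS, ingress, and relay overhead.

```python
import socket
import statistics
import sys
import time


def tcp_connect_latency_ms(host, port, samples=5, timeout=3.0):
    """Time `samples` TCP handshakes to host:port; return times in ms."""
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection returns once the TCP handshake has completed,
        # which takes roughly one round trip to the target.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.perf_counter() - start
        results.append(elapsed * 1000.0)
    return results


def summarize(latencies_ms):
    """Reduce the samples to min / median / max, all in milliseconds."""
    return {
        "min": min(latencies_ms),
        "median": statistics.median(latencies_ms),
        "max": max(latencies_ms),
    }


if __name__ == "__main__":
    # Usage: python latency.py <host> <port>
    target_host, target_port = sys.argv[1], int(sys.argv[2])
    print(summarize(tcp_connect_latency_ms(target_host, target_port)))
```

Running it in both directions (from cluster A toward B, and from B toward A) and comparing the result with the same measurement inside a single cluster should show whether the cross-cluster path itself is slow.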
To answer your question about forcing WebSockets: it will work, but the experience will be slower.
Is there a reason you felt the need to do so? Were there any issues with using DERP?
This would make sense given the blank screen/502 and other issues.
Also, sorry for the delayed response; I had caught the flu and wasn't able to reply.
I've had this problem before: a high-latency connection because I'm using the built-in proxy.
You should try a P2P connection directly to the host, though?