Container can't download install.sh

My setup includes an AWS instance which runs coderd/provisionerd behind a proxy. A desktop on a separate network, which I have physical access to, is running Docker. All three are on a VPN network using Tailscale, and I have the AWS instance set as a Tailscale 'exit node' for the desktop. I configured two example templates: the docker and the docker-code-server examples.

In this configuration, I can access the coder instance from the browser at my public domain. I am able to log in and spin up the templates. I can log into the terminal on them. On the code-server example I can load the remote VS Code instance in the browser. All of this works great. However, I noticed the docker example never starts the code-server instance.

Digging a little deeper, the problem is that the docker container that runs on the desktop can't download files over https, specifically from objects.githubusercontent.com. I can't even pull a certificate from it using openssl s_client (I can ping it, though). So the step to download install.sh is failing on the docker example. This only happens from the container on the desktop provisioned by terraform. I can access that site on the machine that runs the docker instances. I can also access it from the same docker container running on my desktop at home. It's only the container running on the desktop that can't. I can also pull a cert and download files from the other https sites that I've tried. This behavior is consistent on both templates.

I could just use a docker container with code-server preinstalled, but this does not work for me since so much of the other tooling required is hosted on GitHub. Any thoughts or debug tips that could help here?
16 Replies
Phorcys
Phorcys3y ago
I would've told you to pre-install it in the image but since that's not an option I'm kind of out of ideas. I've never used this kind of setup, do you have a clue on what could be blocking the traffic ? also, does http traffic to http://objects.githubusercontent.com go through ?
WillToth
WillTothOP3y ago
Not entirely sure of the right way to try it, but doing wget http://objects.githubusercontent.com/ correctly pulls in the 405 response, so I assume yes. To compare, doing the same thing over https just hangs (but does correctly resolve the IP). Actually, it even says it's connected, like it's able to communicate out but not get a response back in.
root@b19aded89bb1:/# wget https://objects.githubusercontent.com/
--2022-10-15 16:07:19-- https://objects.githubusercontent.com/
Resolving objects.githubusercontent.com (objects.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...
Connecting to objects.githubusercontent.com (objects.githubusercontent.com)|185.199.110.133|:443... connected.
Though it never says 'HTTP request sent', so I guess it's likely hanging while trying to establish a secure connection? This part is not my strong suit. Slightly related observation: running the same container on the same server with podman instead of docker works correctly.
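One possible cause of a hang after "connected" is that larger packets (such as the server's TLS certificate) are being dropped somewhere on the path while small ones get through. The sketch below is one way to check that from inside the container, assuming Linux iputils `ping` is installed there. The helper names `payload_for_mtu` and `probe_mtu` are made up for illustration, and the sizes tried are only examples.

```shell
#!/bin/sh
# Hypothetical helper: ICMP payload size that produces an IPv4 packet of
# exactly the given size (20-byte IPv4 header + 8-byte ICMP header = 28).
payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Hypothetical helper: send one ping with the Don't Fragment bit set
# (iputils `ping -M do`). A zero exit status means a reply came back for a
# packet of this size; it assumes the host answers ICMP echo at all.
probe_mtu() {
    host=$1
    mtu=$2
    ping -M do -c 1 -W 2 -s "$(payload_for_mtu "$mtu")" "$host" >/dev/null 2>&1
}

# Example (run inside the container; the sizes are arbitrary test values):
#   for mtu in 1500 1400 1280; do
#       if probe_mtu objects.githubusercontent.com "$mtu"; then
#           echo "$mtu: reply received"
#       else
#           echo "$mtu: no reply"
#       fi
#   done
```

If small sizes get replies and large ones do not, that points at a packet-size limit on the path rather than at TLS itself.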
Phorcys
Phorcys3y ago
can you show me your terraform config ?
WillToth
WillTothOP3y ago
Which makes me think it would have to be something with docker's networking, and potentially how it was configured with terraform.
Phorcys
Phorcys3y ago
yes, exactly what i'm thinking. i can't seem to think of a reason why though
WillToth
WillTothOP3y ago
So the first time I ran it, I used exactly the default template, running coder templates create. I ran the docker template first, which failed to start code-server on the first run. Then I ran the docker-code-server example. Finally, this is my setup that I've run later, which starts from the code-server template. https://github.com/FRC3005/infra-coder-templates/tree/main/frc-java
Phorcys
Phorcys3y ago
are you using windows to upload the templates?
WillToth
WillTothOP3y ago
Nope, I upload from the aws instance that is running coder.com infra, which is running the AWS Linux AMI
Phorcys
Phorcys3y ago
alright, seems fine to me. do some other websites work with https ?
WillToth
WillTothOP3y ago
Yes, every one I tried, even https://github.com
Phorcys
Phorcys3y ago
could you try using docker run with the same image and doing the wget inside there ? since you said it works with podman it might as well work with default docker settings. it looks like there's proxying of some sort being done though
WillToth
WillTothOP3y ago
So it fails even with the most basic test, trying it with docker run -it --rm ubuntu:latest. Alright, I think I figured it out (at least got past one hurdle)! If I set mtu in docker to some smaller value, it gets further but still not where it needs to be, e.g. running sudo vim /etc/docker/daemon.json and setting:
{
"mtu": 500
}
wget now works, but running openssl s_client hangs halfway, so curl also still hangs. Now, looking at the MTU for the various network adapters, I see the physical ethernet ports are 1500, but the tailscale MTU is set to 1280. So if I set the docker mtu to 1280 to match, it now works. No idea why this works on some sites and not others. For completeness, my ip a output:
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 98:90:96:c4:b8:d7 brd ff:ff:ff:ff:ff:ff
altname enp0s25
inet 192.168.1.157/24 brd 192.168.1.255 scope global dynamic noprefixroute eno1
valid_lft 76649sec preferred_lft 76649sec
inet6 fe80::7d9e:28a6:4268:17ca/64 scope link noprefixroute
valid_lft forever preferred_lft forever
4: tailscale0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1280 qdisc fq_codel state UNKNOWN group default qlen 500
link/none
inet 100.109.152.19/32 scope global tailscale0
valid_lft forever preferred_lft forever
inet6 fd7a:115c:a1e0:ab12:4843:cd96:626d:9813/128 scope global
valid_lft forever preferred_lft forever
inet6 fe80::bb31:8356:1bee:99d1/64 scope link stable-privacy
valid_lft forever preferred_lft forever
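The fix described here, matching Docker's MTU to the smallest MTU among the host interfaces that traffic passes through, can be sketched roughly as follows. This is a minimal sketch assuming a Linux host where sysfs exposes each interface's MTU. The helper names (`min_mtu`, `iface_mtu`, `docker_mtu_json`) are hypothetical, and the interface names are the ones from the ip a output above.

```shell
#!/bin/sh
# Hypothetical helper: print the smallest of the MTU values passed as arguments.
min_mtu() {
    min=$1
    shift
    for m in "$@"; do
        if [ "$m" -lt "$min" ]; then
            min=$m
        fi
    done
    echo "$min"
}

# Hypothetical helper: read an interface's MTU from sysfs (Linux only).
iface_mtu() {
    cat "/sys/class/net/$1/mtu"
}

# Hypothetical helper: print a daemon.json body that sets Docker's "mtu" key.
docker_mtu_json() {
    printf '{\n  "mtu": %s\n}\n' "$1"
}

# Example on the Docker host (eno1 and tailscale0 as in the ip a output):
#   docker_mtu_json "$(min_mtu "$(iface_mtu eno1)" "$(iface_mtu tailscale0)")"
# Review the result, merge it by hand into /etc/docker/daemon.json, and
# restart the Docker daemon for it to take effect.
```

With the values from this thread (1500 and 1280), the sketch would emit an "mtu" of 1280, matching the setting that worked. It deliberately only prints the JSON rather than overwriting an existing daemon.json, which may already hold other settings.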
Phorcys
Phorcys3y ago
is this resolved ?
WillToth
WillTothOP3y ago
Yes it is. Thanks for the help!
Phorcys
Phorcys3y ago
great find, what a weird issue. you can mark the issue as resolved by running /resolve
WillToth
WillTothOP3y ago
yeah weird issue for sure, marking complete