Coder.com • 3y ago
Skerit

Agent token is invalid inside of workspace container

After finding a good workaround for my issue #3870 (unable to open a terminal session, which was because my user inside the workspace containers was wrong), I've got a new problem 😬 My workspace containers are complaining about an invalid agent token:
2022-09-05 16:20:46.013 [INFO] <./agent/agent.go:450> (*agent).init generating host key
2022-09-05 16:20:46.213 [INFO] <./agent/agent.go:141> (*agent).run connecting
2022-09-05 16:20:46.215 [WARN] <./agent/agent.go:150> (*agent).run failed to dial {"error": "GET https://coder.mydomain.be/api/v2/workspaceagents/me/metadata: unexpected status code 401: Agent token is invalid.: Try logging in using 'coder login \u003curl\u003e'."}
I deleted the existing workspaces & templates, restarted coder & made the templates & workspaces again, but it's still happening. Any thoughts on what the issue could be?
67 Replies
Phorcys • 3y ago
Interesting one, sounds like a bug to me. I'll leave this one to a member of the team.
maf • 3y ago
This is a bit out-there, but is there a chance you could have two coder servers running? My thought is that one is issuing the workspace creation, and the workspace is contacting the other. Could you also show the contents of your agent token? I.e.:
❯ docker exec -it coder-mafredri-work /bin/bash
coder@work:/home/coder# declare -p CODER_AGENT_TOKEN
declare -x CODER_AGENT_TOKEN="abcdef00-aaaa-bbbb-cccc-7769498d6e70"
(Using declare -p here because it quotes the contents, so any whitespace/invisible characters would be visible.)
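As a rough illustration of the kind of damage declare -p would expose, the sketch below checks that a token is exactly one bare UUID with nothing around it. The helper name is_plain_uuid is hypothetical and not part of Coder, and it assumes agent tokens are UUIDs, as the ones quoted in this thread are.

```shell
# Hypothetical helper (not part of Coder): succeeds only when the value is
# exactly one UUID, with no surrounding whitespace or hidden characters.
# Assumes agent tokens are UUIDs, as the ones in this thread are.
is_plain_uuid() {
  # A UUID is 36 characters long; the length check also rejects a trailing
  # newline, which a line-based grep would otherwise overlook.
  [ "${#1}" -eq 36 ] || return 1
  printf '%s\n' "$1" |
    grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

token="abcdef00-aaaa-bbbb-cccc-7769498d6e70"
if is_plain_uuid "$token"; then echo "token format ok"; fi
# The same token with a trailing space looks identical when echoed, but fails.
if ! is_plain_uuid "$token "; then echo "token has stray characters"; fi
```

Inside a workspace container this could be called as is_plain_uuid "$CODER_AGENT_TOKEN" to rule out a mangled value before looking elsewhere.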
Skerit (OP) • 3y ago
No, there's just one instance of coder server running (managed by systemd). Here's my ps output:
[root@kumulus skerit]# ps ax | grep coder
358888 ? Ssl 0:03 /usr/local/bin/coder server
358929 ? Ss 0:00 postgres: postgres coder 172.17.0.1(36320) idle
358930 ? Ss 0:00 postgres: postgres coder 172.17.0.1(36330) idle
358931 ? Ss 0:00 postgres: postgres coder 172.17.0.1(36346) idle
359396 ? Ssl 0:00 ./coder agent
359517 ? Ssl 0:00 ./coder agent --no-reap
359866 pts/1 S+ 0:00 grep coder
This is the token inside the container:
[coder@test7 ~]$ declare -p CODER_AGENT_TOKEN
declare -x CODER_AGENT_TOKEN="5147678f-3a41-426b-b6c7-e1c069415261"
Also: every now and then I don't get the "Agent token is invalid" warning, but then I get a "build is outdated" one 🤷 Even if the template hasn't been updated at all.
maf • 3y ago
That is very odd. Just to confirm, does your Docker container creation date correspond with the coder start of the workspace? (If you only created it, with no stop/start, does it correspond to the coder create timestamp?)
docker inspect coder-mafredri-work | jq -r '.[].Created'
Actually, the output of the following could be useful:
docker inspect coder-mafredri-work | jq '.[] | {Created: .Created, State: .State, Env: .Config.Env}'
(Filter out sensitive information, if there is any.)
Skerit (OP) • 3y ago
Oh, I'm doing all of this via the web interface. So I'm creating, deleting, starting, ... workspaces via that route. But yes, it does correspond to that time.
{
"Created": "2022-09-06T13:18:18.834215707Z",
"State": {
"Status": "running",
"Running": true,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 359396,
"ExitCode": 0,
"Error": "",
"StartedAt": "2022-09-06T13:18:19.15860747Z",
"FinishedAt": "0001-01-01T00:00:00Z"
},
"Env": [
"CODER_AGENT_TOKEN=5147678f-3a41-426b-b6c7-e1c069415261",
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"LANG=en_US.UTF-8",
"ENTRYPOINTD=/entrypoint.d",
"USER=coder"
]
}
maf • 3y ago
Thanks, that looks as it should. Would you mind running the following query in your database and pasting/screenshotting the output?
with t as (select wa.id as waid, wb.workspace_id as wsid, wa.* from workspace_agents wa join workspace_resources wr on (wa.resource_id = wr.id) join workspace_builds wb on (wr.job_id = wb.job_id)) select distinct on (t1.created_at, t1.waid) t1.waid, t1.created_at, t1.updated_at, t1.first_connected_at, t1.last_connected_at, t1.disconnected_at, t1.auth_token, t1.version from t t1 join t t2 on (t1.waid != t2.waid and t1.wsid = t2.wsid) where t2.auth_token = '5147678f-3a41-426b-b6c7-e1c069415261' order by t1.created_at desc;
(Note that sharing would expose your auth tokens, so using a throwaway workspace would be ideal.)
Skerit (OP) • 3y ago
Sure, no problem. It's still a test template anyway 🙂
Skerit (OP) • 3y ago
[Image attachment: screenshot of the query output]
maf • 3y ago
Interesting that the top row there hasn't reported the version 🤔, I wonder if that's related somehow. Btw, I managed to reproduce the error by setting CODER_AGENT_AUTH=bad as an env variable for the agent. (The default is CODER_AGENT_AUTH=token.) It seems our bootstrap scripts set it like this:
provisionersdk/scripts/bootstrap_linux.sh
46:export CODER_AGENT_AUTH="${AUTH_TYPE}"
AUTH_TYPE doesn't seem to be defined anywhere, though? But setting CODER_AGENT_AUTH to the empty string should be just fine (falls back to token).
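The fallback described above can be pictured with ordinary shell parameter expansion. This is only a sketch of the behaviour maf describes (an empty value being treated as token); effective_auth is a made-up name, and it does not claim to show where or how Coder itself implements the fallback.

```shell
# Hypothetical illustration, not Coder's bootstrap script: treat an unset or
# empty auth type as "token", and pass any other value through unchanged.
effective_auth() {
  # ${1:-token} expands to "token" when $1 is unset OR empty.
  printf '%s\n' "${1:-token}"
}

effective_auth ""       # empty, as when AUTH_TYPE is undefined
effective_auth "token"  # the default value
effective_auth "bad"    # the value used above to reproduce the 401
```

Only the last call yields something other than token, which matches the observation that an undefined AUTH_TYPE on its own should not break authentication.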
Skerit (OP) • 3y ago
Ok, so I'll add that to the template and try again
maf • 3y ago
Sure, you can add it, but you shouldn't have to. 🤔... What does the env command say inside the docker container? Is AUTH_TYPE set? If yes, it may be inheriting it from your env somehow.
Skerit (OP) • 3y ago
Doesn't seem to have changed anything either:
=
+ command -v curl
+ curl -fsSL --compressed https://coder.mydomain.be/bin/coder-linux-amd64 -o coder
+ break
+ chmod +x coder
+ export CODER_AGENT_AUTH=token
+ CODER_AGENT_AUTH=token
+ export CODER_AGENT_URL=https://coder.mydomain.be/
+ CODER_AGENT_URL=https://coder.mydomain.be/
+ exec ./coder agent
2022-09-06 14:57:12.520 [INFO] <./cli/agent.go:63> workspaceAgent.func1 spawning reaper process
2022-09-06 14:57:12.585 [INFO] <./cli/agent.go:78> workspaceAgent.func1 starting agent {"url": "https://coder.mydomain.be/", "auth": "token", "version": "v0.8.11+cde036c"}
2022-09-06 14:57:12.610 [INFO] <./agent/agent.go:450> (*agent).init generating host key
2022-09-06 14:57:12.848 [INFO] <./agent/agent.go:141> (*agent).run connecting
2022-09-06 14:57:12.851 [WARN] <./agent/agent.go:150> (*agent).run failed to dial {"error": "GET https://coder.mydomain.be/api/v2/workspaceagents/me/metadata: unexpected status code 401: Agent token is invalid.: Try logging in using 'coder login \u003curl\u003e'."}
2022-09-06 14:57:13.051 [INFO] <./agent/agent.go:141> (*agent).run connecting
2022-09-06 14:57:13.053 [WARN] <./agent/agent.go:150> (*agent).run failed to dial {"error": "GET https://coder.mydomain.be/api/v2/workspaceagents/me/metadata: unexpected status code 401: Agent token is invalid.: Try logging in using 'coder login \u003curl\u003e'."}
2022-09-06 14:57:13.454 [INFO] <./agent/agent.go:141> (*agent).run connecting
2022-09-06 14:57:13.456 [WARN] <./agent/agent.go:150> (*agent).run failed to dial {"error": "GET https://coder.mydomain.be/api/v2/workspaceagents/me/metadata: unexpected status code 401: Agent token is invalid.: Try logging in using 'coder login \u003curl\u003e'."}
2022-09-06 14:57:14.257 [INFO] <./agent/agent.go:141> (*agent).run connecting
2022-09-06 14:57:14.260 [INFO] <./agent/agent.go:153> (*agent).run fetched metadata
This is the env output now:
HOSTNAME=test2
PWD=/home/coder
HOME=/home/coder
ENTRYPOINTD=/entrypoint.d
LANG=en_US.UTF-8
CODER_AGENT_AUTH=token
CODER_AGENT_TOKEN=79916bf2-892a-4f96-b32a-68ede9d5fc6b
TERM=xterm
USER=coder
SHLVL=1
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
_=/usr/sbin/env
maf • 3y ago
I see this uses a new token (compared to earlier). Did you delete and create a new workspace, or are you using another one?
Skerit (OP) • 3y ago
Yes, I updated the template & created a new workspace from it
maf • 3y ago
Ok. Is your coder server behind a proxy, Cloudflare, or something else? Wondering if it may be stripping out something.
with t as (select wa.id as waid, wb.workspace_id as wsid, wa.* from workspace_agents wa join workspace_resources wr on (wa.resource_id = wr.id) join workspace_builds wb on (wr.job_id = wb.job_id)) select distinct on (t1.created_at, t1.waid) t1.waid, t1.created_at, t1.updated_at, t1.first_connected_at, t1.last_connected_at, t1.disconnected_at, t1.auth_token, t1.version from t t1 join t t2 on (t1.wsid = t2.wsid) where t2.auth_token = '79916bf2-892a-4f96-b32a-68ede9d5fc6b' order by t1.created_at desc;
Slightly modified the earlier query, want to see that the workspace is actually using the most recent token (ordered by desc, so the workspace should be using the auth_token from the top result).
Skerit (OP) • 3y ago
The public address is behind a reverse proxy indeed. Though a pretty simple Node.js one (made with the http2-proxy package)
Skerit (OP) • 3y ago
Here's the output. (Again, nothing in the version column)
[Image attachment: screenshot of the query output]
maf • 3y ago
Ok, so the token is definitely correct. Then my suspicion would be that the proxy is filtering out some headers, or not applying them.
Skerit (OP) • 3y ago
Hmm, I can take a look at that. Though last week when I first tried Coder this was still an Ubuntu 20.04 server, using the same proxy. I didn't have this issue then. (I also didn't have to use the user="coder:coder" in my template then)
maf • 3y ago
Did the 20.04 server also have a user named skerit with ID 1000? That change in behavior could be related to the Docker version as well.
Skerit (OP) • 3y ago
It did! Same username, same uid.
maf • 3y ago
Ok, that's curious. You wouldn't happen to know if the Docker version was different as well? (I.e. which one it was and which one it's now.) But on the topic of that proxy, the authentication happens via a cookie named session_token, so you'd want to verify that is being passed along to the coder server.
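One way to check that the proxy passes the cookie along untouched is to compare the session_token value in the incoming request headers with the one in the headers the proxy forwards. The sketch below assumes the raw Cookie header lines have already been captured (for example from the proxy's own logging); session_token_from_header is a hypothetical helper, and the header values are examples only.

```shell
# Hypothetical helper: print the session_token cookie value from one raw
# "Cookie: ..." header line (prints nothing if the cookie is absent).
# Steps: drop the header name, split cookies onto separate lines, then keep
# only the value of the first session_token entry.
session_token_from_header() {
  printf '%s\n' "$1" |
    sed 's/^[^:]*:[[:space:]]*//' |
    tr ';' '\n' |
    sed -n 's/^[[:space:]]*session_token=//p' |
    head -n 1
}

# Example values only; in practice these come from the proxy capture.
incoming='cookie: session_token=79916bf2-892a-4f96-b32a-68ede9d5fc6b'
forwarded='Cookie: session_token=79916bf2-892a-4f96-b32a-68ede9d5fc6b'

if [ "$(session_token_from_header "$incoming")" = "$(session_token_from_header "$forwarded")" ]; then
  echo "session_token forwarded unchanged"
else
  echo "session_token altered or dropped by the proxy"
fi
```

A mismatch, or an empty value on the forwarded side, would point at the proxy rather than at the token itself.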
Skerit (OP) • 3y ago
It's the same Docker version, but a different build. Makes sense since it's on Arch now:
Ubuntu: Docker version 20.10.17, build 100c701
Arch: Docker version 20.10.17, build 100c70180f
maf • 3y ago
Interesting. I think that's a separate issue though, could you open up another thread about it and we'll discuss it there? (I.e. the fact that you need to specify user.)
Skerit (OP) • 3y ago
I could, but we did discuss this in https://github.com/coder/coder/issues/3870 a few days ago 😄 Want me to make a new discussion here for it still?
GitHub
Panic when trying to open terminal session · Issue #3870 · coder/co...
Coder version & template I'm using coder v0.8.11. The template in question is the docker-code-server example template. (The only thing I changed in it was the DNS setting) Error Whe...
maf • 3y ago
Yup, feel free to link to that one if you open up a thread here. GitHub issues aren't really great for discussing/debugging, so I think we could get further in understanding why it happened through a thread here on Discord. (Even though we understand the issue, kinda, I still have no idea why it happened to you and/or what's different on that Arch system to trigger it.)
Skerit (OP) • 3y ago
Will do.
Skerit (OP) • 3y ago
In the meantime, here's a screenshot of a proxy request in action. On the left are the incoming (HTTP/2) request headers, on the right the transformed request headers sent to coder-server:
[Image attachment: screenshot of the proxy request headers]
maf • 3y ago
Thanks. So that session_token looks strange to me, it should be a plain UUID, I think? It kinda looks base64 encoded, but decoding it returns binary data. Could the proxy be base64 encoding the cookie values? Ah just realized that was a capture of the website traffic, could you do one for the agent?
Skerit (OP) • 3y ago
Ah sure, hold on
Skerit (OP) • 3y ago
[Image attachment: screenshot of the proxy request headers for the agent]
maf • 3y ago
Ok, that looks correct. Although I'd want to verify still that coder server sees that exact same request as well. At what point are you logging it, is it all happening inside the proxy? But before we think about that, let's just verify that the auth token works. Sec. Could you try running this on the coder server host, or anywhere really where you can reach the coder server directly (all one line):
CODER_AGENT_TOKEN=79916bf2-892a-4f96-b32a-68ede9d5fc6b CODER_AGENT_URL=http://localhost:port/ coder agent
That might try to execute the startup script, so be wary of that (e.g. don't run as root, perhaps). One more thing I'd like to verify is that the coder server is Coder v0.8.11+cde036c too? (E.g. if you open up the webui, it'll be shown at the bottom.)
Skerit (OP) • 3y ago
I am indeed logging it inside the Node.js proxy itself. I can run the command like this on the server host:
CODER_AGENT_TOKEN=79916bf2-892a-4f96-b32a-68ede9d5fc6b CODER_AGENT_URL=http://127.0.0.1:3091/ coder agent
Do you want the entire log output? It does indeed mention this version:
2022-09-07 11:19:40.517 [INFO] <./cli/agent.go:78> workspaceAgent.func1 starting agent {"url": "http://127.0.0.1:3091/", "auth": "token", "version": "v0.8.11+cde036c"}
maf • 3y ago
"Do you want the entire log output?"
Just want to know if it connected or ran into the same error as above.
"It does indeed mention this version"
That's the agent -- good. Does it say so for the server as well (in the webui)?
Skerit (OP) • 3y ago
Yes, the webui also says Coder v0.8.11+cde036c. It does not give me any "agent token invalid" errors. It does fail to run the startup script, but that's probably expected?
2022-09-07 11:52:07.540 [WARN] <./agent/agent.go:170> (*agent).run.func1 agent script failed ...
"error": run:
github.com/coder/coder/agent.(*agent).runStartupScript
/home/runner/work/coder/coder/agent/agent.go:375
- exit status 1
maf • 3y ago
Yes, definitely expected. Ok. So I'd say this confirms that the agent token works, but something goes wrong in-transit between the failing agent and the server.
Skerit (OP) • 3y ago
Very strange, but what could it be? I also tested the command with the public address (so it goes through the proxy) and that also just works. I've also had it happen 1 or 2 times that it does not complain about an invalid token, but then it says the build is outdated. (When it's not)
maf • 3y ago
Maybe it could be a fluke related to some workspace start/stop or update actions? I.e. a container that stayed behind. But I agree that's very strange. Would it be possible for you to try to mirror the current setup as much as possible, but removing the proxy from the equation?
Skerit (OP) • 3y ago
Internally that would be pretty easy, I just have to change the access-url to the internal IP address & port, right?
maf • 3y ago
Yeah, so essentially you'd be updating your coder server configuration so that workspaces are given the local ip/port instead of the proxy. Then you'd try to create one.
Skerit (OP) • 3y ago
Same result 😕
=
+ command -v curl
+ curl -fsSL --compressed http://192.168.50.2:3091/bin/coder-linux-amd64 -o coder
+ break
+ chmod +x coder
+ export CODER_AGENT_AUTH=token
+ CODER_AGENT_AUTH=token
+ export CODER_AGENT_URL=http://192.168.50.2:3091/
+ CODER_AGENT_URL=http://192.168.50.2:3091/
+ exec ./coder agent
2022-09-07 13:49:05.344 [INFO] <./cli/agent.go:63> workspaceAgent.func1 spawning reaper process
2022-09-07 13:49:05.409 [INFO] <./cli/agent.go:78> workspaceAgent.func1 starting agent {"url": "http://192.168.50.2:3091/", "auth": "token", "version": "v0.8.11+cde036c"}
2022-09-07 13:49:05.410 [INFO] <./agent/agent.go:450> (*agent).init generating host key
2022-09-07 13:49:05.692 [INFO] <./agent/agent.go:141> (*agent).run connecting
2022-09-07 13:49:05.693 [WARN] <./agent/agent.go:150> (*agent).run failed to dial {"error": "GET http://192.168.50.2:3091/api/v2/workspaceagents/me/metadata: unexpected status code 401: Agent token is invalid.: Try logging in using 'coder login \u003curl\u003e'."}
Skerit (OP) • 3y ago
The auth token configured as the environment variable is currently 1ef6cb94-2bd2-4b20-99cf-113722f99a4c. And I did the query again too:
[Image attachment: screenshot of the query output]
maf • 3y ago
re: The discussion in your other thread. There wouldn't happen to be a load-balancer/multiple DBs involved? I.e. everything talks straight to the single docker container running postgres? (Just a thought since you mentioned the occasional build outdated issue.) Have you tried starting with a clean database? If you decide to try that, please take a backup/dump of the current one because if that helps, the issue could be in the db state which we'd want to analyze.
Skerit (OP) • 3y ago
It's just a single docker container, no load balancing. And I've cleared the database a few times now, but I can try that again too.
maf • 3y ago
Ok, then I'd say you don't need to. I'd expect once to be enough. Perhaps we should take a step back, since it seems like the problem is with the container running the agent (since it works in other scenarios). Would you mind sharing the full output of docker inspect coder-my-workspace? You can PM it to me if it contains anything sensitive or you can censor it manually.
Skerit (OP) • 3y ago
No problem
maf • 3y ago
Could you confirm one more time that this works when not run inside the container?
CODER_AGENT_TOKEN=1ef6cb94-2bd2-4b20-99cf-113722f99a4c CODER_AGENT_URL=http://192.168.50.2:3091/ coder agent
Skerit (OP) • 3y ago
Yes, that works. Don't get the invalid token error.
maf • 3y ago
Thanks for testing that. This issue has me mystified. I guess we've somewhat narrowed down the problem to the actual container. But I see nothing in its configuration that would cause a problem. It's nearly identical to what it'd look like for me. Is the container running on the same machine as the host (i.e. 192.168.50.2)? Or, are the non-container agent tests being performed on the same machine running the problematic containers?
Skerit (OP) • 3y ago
Yes, everything is on the same machine. I actually haven't really worked with docker much before, it's the first time I've worked so much with it 🙂 I did have Podman installed and tested that out, but I've removed it before I even tried coder.
maf • 3y ago
Hehe, seems this has become a trial by fire for you then 😅 Ok, I'd like for you to try one ugly workaround that just might "fix" the issue for you. Make the following change to your main.tf:
entrypoint = ["sh", "-c", join("\n", ["export CODER_AGENT_TOKEN=${coder_agent.main.token}", replace(coder_agent.main.init_script, "127.0.0.1", "host.docker.internal")])]
(I.e. just replace the current entrypoint with that.) That will add an explicit export for the token to the bootstrap script, my hunch is that env values are not being propagated into your entrypoint and as such, CODER_AGENT_TOKEN is left empty. Does the following print world? (Feel free to use any docker image, just picked alpine out of habit.)
docker run --rm --env HELLO=world --entrypoint /bin/sh alpine -c 'echo $HELLO'
Skerit (OP) • 3y ago
That modified entrypoint also didn't fix it! 😕 I downloaded the coder binary code-server was serving (wget http://192.168.50.2:3091/bin/coder-linux-amd64) and did the test in the console again (CODER_AGENT_TOKEN=599fc1ee-6809-4064-a9ba-7bca70e1e602 CODER_AGENT_URL=https://coder.kumulus.11ways.be/ ./coder-linux-amd64 agent) and that again did not have any problems. And yes, that test with the alpine image did echo world. Could it be related to my user issue, with the container somehow getting the wrong env variables or something?
maf • 3y ago
I won't say it's impossible, we don't really understand what's going on there either.
Skerit (OP) • 3y ago
I see my docker is using the btrfs method...
maf • 3y ago
Do you think that could be the source of the issue? I personally don't see how it would affect it though. Could you try the plain docker template, btw? Want to see if you have this same problem with other templates too. (No need to make any changes to it.)
Skerit (OP) • 3y ago
Not sure, unless it's a snapshot issue. Funny thing is that this is all a default docker install (except for the dns setting). Looking at the documentation, I thought using the btrfs storage method required some manual changes. Sure, hold on.
maf • 3y ago
re: manual changes. Docker tries to detect the underlying filesystem and defaults to using that driver. So for instance if you have root on ZFS, it'd use the zfs driver. Probably the same with btrfs. You can define it in /etc/docker/daemon.json though, e.g. {"storage-driver": "overlay2"} and then restart the docker daemon (might need to wipe stuff from /var/lib/docker though.)
Skerit (OP) • 3y ago
Well well well... the plain docker one works. (Didn't even have to add the user="coder:coder" bit)
maf • 3y ago
Wow, that's even more confusing. 😅 Would you mind trying the plain vanilla docker-code-server template again? And if you need the fix again, instead of user="coder:coder", add HOME=/home/coder to env.
Skerit (OP) • 3y ago
Sure. First test failed, with the agent token error and the chdir/wrong HOME thing. Adding the HOME env var now. Adding the HOME env didn't fix it either. Huh. The /etc/passwd file in the container is the same as the one of the host server. It's a copy. Hmm, everything's the same. That container's root is basically the same as the host's root. Found someone else that had a similar issue... 6 years ago: https://github.com/moby/moby/issues/10216
maf • 3y ago
Oh wow, that's crazy (nice find!) So I guess you could try this workaround https://github.com/moby/moby/issues/10216#issuecomment-196743892 or changing the storage driver to overlay2 👆
Skerit (OP) • 3y ago
I might post a little comment on there. Maybe it has something to do with a snapshot of the root drive that I made, restored & then rolled back again.
maf • 3y ago
I made an update on the ticket, does my conclusion seem accurate? https://github.com/coder/coder/issues/3870#issuecomment-1240444951 Ultimately it looks like both of your issues had the same root cause. Pretty bad bug in Docker or btrfs 😬
Skerit (OP) • 3y ago
Perfect. I was also wondering why (even though the root partition was all wrong) the Agent token is invalid thing kept happening... Guess we'll never really know ^^ Fyi: I deleted the image itself and recreated the template & workspace and now docker-code-server does work!
Phorcys • 3y ago
what a weird issue
maf • 3y ago
👍. Yeah, it bugs me a little that we never got to fully understand the agent token issue, but who knows what was wrong and how much of the host environment was copied over to the container. It seemed like Docker was lying to us as well so I'm quite ready to shovel this into the "just btrfs things" pile 😅. Btw, thanks for putting up with all the testing @Skerit, I'm happy we got some answers after all that effort!
Skerit (OP) • 3y ago
Thank you for guiding me through it! Many other projects would have given up and told me I was on my own 😄
