Coder.com•3y ago

Agent token is invalid inside of workspace container

After finding a good workaround for my issue #3870 (unable to open a terminal session, which was because my user inside the workspace containers was wrong), I've got a new problem 😬 My workspace containers are complaining about an invalid agent token:

2022-09-05 16:20:46.013 [INFO]    <./agent/agent.go:450>    (*agent).init    generating host key
2022-09-05 16:20:46.213 [INFO]    <./agent/agent.go:141>    (*agent).run    connecting
2022-09-05 16:20:46.215 [WARN]    <./agent/agent.go:150>    (*agent).run    failed to dial    {"error": "GET https://coder.mydomain.be/api/v2/workspaceagents/me/metadata: unexpected status code 401: Agent token is invalid.: Try logging in using 'coder login \u003curl\u003e'."}

I deleted the existing workspaces & templates, restarted coder & made the templates & workspaces again, but it's still happening. Any thoughts on what the issue could be?

67 Replies

Phorcys•3y ago

Interesting one, sounds like a bug to me. I'll leave this one to a member of the team.

maf•3y ago

This is a bit out-there, but is there a chance you could have two coder servers running? My thought is that one is issuing the workspace creation, and the workspace is contacting the other. Could you also show the contents of your agent token? I.e:

❯ docker exec -it coder-mafredri-work /bin/bash
coder@work:/home/coder# declare -p CODER_AGENT_TOKEN
declare -x CODER_AGENT_TOKEN="abcdef00-aaaa-bbbb-cccc-7769498d6e70"

(Using declare -p here because it quotes the contents, so any whitespace/invisible characters would be visible.)
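As a complementary check (a minimal sketch, not something from the thread): a UUID-style token like the ones shown here is 36 characters long, so counting bytes also exposes stray whitespace. The name token_len is a hypothetical helper.

```shell
# Hypothetical helper: count the bytes in a value.
# printf '%s' is used instead of echo so no trailing newline is counted.
token_len() {
  printf '%s' "$1" | wc -c | tr -d ' '
}

# Inside the container you would run: token_len "$CODER_AGENT_TOKEN"
# Anything other than 36 for a UUID-style token hints at extra characters.
```

This only catches length differences; declare -p remains the better way to see which character is the problem.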

SkeritOP•3y ago

No, there's just one instance of coder server running (managed by systemd). Here's my ps output:

[root@kumulus skerit]# ps ax | grep coder
 358888 ?        Ssl    0:03 /usr/local/bin/coder server
 358929 ?        Ss     0:00 postgres: postgres coder 172.17.0.1(36320) idle
 358930 ?        Ss     0:00 postgres: postgres coder 172.17.0.1(36330) idle
 358931 ?        Ss     0:00 postgres: postgres coder 172.17.0.1(36346) idle
 359396 ?        Ssl    0:00 ./coder agent
 359517 ?        Ssl    0:00 ./coder agent --no-reap
 359866 pts/1    S+     0:00 grep coder

This is the token inside the container:

[coder@test7 ~]$ declare -p CODER_AGENT_TOKEN
declare -x CODER_AGENT_TOKEN="5147678f-3a41-426b-b6c7-e1c069415261"

Also: every now and then I don't get the Agent token is invalid warning, but then I get a build is outdated one 🤷 Even if the template hasn't been updated at all.

maf•3y ago

That is very odd. Just to confirm, does your Docker container creation date correspond with the coder start of the workspace? (If you only created it and never stopped/started it, does it correspond to the coder create timestamp?)

docker inspect coder-mafredri-work | jq -r '.[].Created'

Actually, the output of the following could be useful:

docker inspect coder-mafredri-work | jq '.[] | {Created: .Created, State: .State, Env: .Config.Env}'

(Filter out sensitive information, if there is any.)
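One way to do that filtering, as a rough sketch: pipe the output through a small sed filter that masks the token value. The name redact_token is made up for illustration, and the filter only covers CODER_AGENT_TOKEN, so anything else sensitive would still need manual attention.

```shell
# Hypothetical helper: mask the value of CODER_AGENT_TOKEN in text read from stdin.
redact_token() {
  sed 's/\(CODER_AGENT_TOKEN=\)[^"]*/\1<redacted>/'
}

# Example use with the command above (container name taken from the thread):
# docker inspect coder-mafredri-work | jq '.[] | {Created: .Created, State: .State, Env: .Config.Env}' | redact_token
```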

SkeritOP•3y ago

Oh, I'm doing all of this via the web interface. So I'm creating, deleting, starting, ... workspaces via that route. But yes, it does correspond to that time.

{
  "Created": "2022-09-06T13:18:18.834215707Z",
  "State": {
    "Status": "running",
    "Running": true,
    "Paused": false,
    "Restarting": false,
    "OOMKilled": false,
    "Dead": false,
    "Pid": 359396,
    "ExitCode": 0,
    "Error": "",
    "StartedAt": "2022-09-06T13:18:19.15860747Z",
    "FinishedAt": "0001-01-01T00:00:00Z"
  },
  "Env": [
    "CODER_AGENT_TOKEN=5147678f-3a41-426b-b6c7-e1c069415261",
    "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
    "LANG=en_US.UTF-8",
    "ENTRYPOINTD=/entrypoint.d",
    "USER=coder"
  ]
}

maf•3y ago

Thanks, that looks as it should. Would you mind running the following query in your database and pasting/screenshotting the output?

with t as (select wa.id as waid, wb.workspace_id as wsid, wa.* from workspace_agents wa join workspace_resources wr on (wa.resource_id = wr.id) join workspace_builds wb on (wr.job_id = wb.job_id)) select distinct on (t1.created_at, t1.waid) t1.waid, t1.created_at, t1.updated_at, t1.first_connected_at, t1.last_connected_at, t1.disconnected_at, t1.auth_token, t1.version from t t1 join t t2 on (t1.waid != t2.waid and t1.wsid = t2.wsid) where t2.auth_token = '5147678f-3a41-426b-b6c7-e1c069415261' order by t1.created_at desc;

(Note that sharing would expose your auth tokens, so using a throwaway workspace would be ideal.)

SkeritOP•3y ago

Sure, no problem. It's still a test template anyway 🙂

SkeritOP•3y ago

maf•3y ago

Interesting that the top row there hasn't reported the version 🤔, I wonder if that's related somehow. Btw, I managed to reproduce the error by setting CODER_AGENT_AUTH=bad as an env variable for the agent. (The default is CODER_AGENT_AUTH=token.) It seems our bootstrap scripts set it like this:

provisionersdk/scripts/bootstrap_linux.sh
46:export CODER_AGENT_AUTH="${AUTH_TYPE}"

AUTH_TYPE doesn't seem to be defined anywhere, though? But setting CODER_AGENT_AUTH to the empty string should be just fine (falls back to token).

SkeritOP•3y ago

Ok, so I'll add that to the template and try again

maf•3y ago

Sure, you can add it, but you shouldn't have to. 🤔... What does the env command say inside the docker container? Is AUTH_TYPE set? If yes, it may be inheriting it from your env somehow.

SkeritOP•3y ago

Doesn't seem to have changed anything either:

=
+ command -v curl
+ curl -fsSL --compressed https://coder.mydomain.be/bin/coder-linux-amd64 -o coder
+ break
+ chmod +x coder
+ export CODER_AGENT_AUTH=token
+ CODER_AGENT_AUTH=token
+ export CODER_AGENT_URL=https://coder.mydomain.be/
+ CODER_AGENT_URL=https://coder.mydomain.be/
+ exec ./coder agent
2022-09-06 14:57:12.520 [INFO]    <./cli/agent.go:63>    workspaceAgent.func1    spawning reaper process
2022-09-06 14:57:12.585 [INFO]    <./cli/agent.go:78>    workspaceAgent.func1    starting agent    {"url": "https://coder.mydomain.be/", "auth": "token", "version": "v0.8.11+cde036c"}
2022-09-06 14:57:12.610 [INFO]    <./agent/agent.go:450>    (*agent).init    generating host key
2022-09-06 14:57:12.848 [INFO]    <./agent/agent.go:141>    (*agent).run    connecting
2022-09-06 14:57:12.851 [WARN]    <./agent/agent.go:150>    (*agent).run    failed to dial    {"error": "GET https://coder.mydomain.be/api/v2/workspaceagents/me/metadata: unexpected status code 401: Agent token is invalid.: Try logging in using 'coder login \u003curl\u003e'."}
2022-09-06 14:57:13.051 [INFO]    <./agent/agent.go:141>    (*agent).run    connecting
2022-09-06 14:57:13.053 [WARN]    <./agent/agent.go:150>    (*agent).run    failed to dial    {"error": "GET https://coder.mydomain.be/api/v2/workspaceagents/me/metadata: unexpected status code 401: Agent token is invalid.: Try logging in using 'coder login \u003curl\u003e'."}
2022-09-06 14:57:13.454 [INFO]    <./agent/agent.go:141>    (*agent).run    connecting
2022-09-06 14:57:13.456 [WARN]    <./agent/agent.go:150>    (*agent).run    failed to dial    {"error": "GET https://coder.mydomain.be/api/v2/workspaceagents/me/metadata: unexpected status code 401: Agent token is invalid.: Try logging in using 'coder login \u003curl\u003e'."}
2022-09-06 14:57:14.257 [INFO]    <./agent/agent.go:141>    (*agent).run    connecting
2022-09-06 14:57:14.260 [INFO]    <./agent/agent.go:153>    (*agent).run    fetched metadata

This is the env output now:

HOSTNAME=test2
PWD=/home/coder
HOME=/home/coder
ENTRYPOINTD=/entrypoint.d
LANG=en_US.UTF-8
CODER_AGENT_AUTH=token
CODER_AGENT_TOKEN=79916bf2-892a-4f96-b32a-68ede9d5fc6b
TERM=xterm
USER=coder
SHLVL=1
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
_=/usr/sbin/env

maf•3y ago

I see this uses a new token (compared to earlier). Did you delete and recreate the workspace, or are you using another one?

SkeritOP•3y ago

Yes, I updated the template & created a new workspace from it

maf•3y ago

Ok. Is your coder server behind a proxy, Cloudflare, or something else? Wondering if it may be stripping out something.

with t as (select wa.id as waid, wb.workspace_id as wsid, wa.* from workspace_agents wa join workspace_resources wr on (wa.resource_id = wr.id) join workspace_builds wb on (wr.job_id = wb.job_id)) select distinct on (t1.created_at, t1.waid) t1.waid, t1.created_at, t1.updated_at, t1.first_connected_at, t1.last_connected_at, t1.disconnected_at, t1.auth_token, t1.version from t t1 join t t2 on (t1.wsid = t2.wsid) where t2.auth_token = '79916bf2-892a-4f96-b32a-68ede9d5fc6b' order by t1.created_at desc;

Slightly modified the earlier query, want to see that the workspace is actually using the most recent token (ordered by desc, so the workspace should be using the auth_token from the top result).

SkeritOP•3y ago

The public address is behind a reverse proxy indeed. Though a pretty simple Node.js one (made with the http2-proxy package)

SkeritOP•3y ago

Here's the output. (Again, nothing in the version column)

maf•3y ago

Ok, so the token is definitely correct. Then my suspicion would be that the proxy is filtering out some headers, or not applying them.

SkeritOP•3y ago

Hmm, I can take a look at that. Though last week when I first tried Coder this was still an Ubuntu 20.04 server, using the same proxy. I didn't have this issue then. (I also didn't have to use the user="coder:coder" in my template then)

maf•3y ago

Did the 20.04 server also have a user named skerit with ID 1000? That change in behavior could be related to the Docker version as well.

SkeritOP•3y ago

It did! Same username, same uid.

maf•3y ago

Ok, that's curious. You wouldn't happen to know if the Docker version was different as well? (I.e. which one it was and which one it's now.) But on the topic of that proxy, the authentication happens via a cookie named session_token, so you'd want to verify that is being passed along to the coder server.
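A rough way to check this by hand, going only by the description above (token sent as a session_token cookie) and the metadata endpoint seen in the logs: send the request with curl and look at the status code. The helper names agent_metadata_url and check_agent_token are hypothetical, and the sketch assumes CODER_AGENT_URL and CODER_AGENT_TOKEN are set in the shell where it runs.

```shell
# Hypothetical helper: build the metadata URL from the agent URL,
# dropping a trailing slash so the path does not start with "//".
agent_metadata_url() {
  printf '%s/api/v2/workspaceagents/me/metadata\n' "${1%/}"
}

# Hypothetical helper: print only the HTTP status code of the request.
# A 401 would match the "Agent token is invalid" warning in the logs.
check_agent_token() {
  curl -sS -o /dev/null -w '%{http_code}\n' \
    --cookie "session_token=${CODER_AGENT_TOKEN}" \
    "$(agent_metadata_url "${CODER_AGENT_URL}")"
}
```

Running check_agent_token once against the proxy URL and once against the server's direct address would show whether the two paths treat the same token differently.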

SkeritOP•3y ago

It's the same Docker version, but a different build. Makes sense since it's on Arch now:

Ubuntu: Docker version 20.10.17, build 100c701
Arch: Docker version 20.10.17, build 100c70180f

maf•3y ago

Interesting. I think that's a separate issue though, could you open up another thread about it and we'll discuss it there? (I.e. the fact that you need to specify user.)

SkeritOP•3y ago

I could, but we did discuss this in https://github.com/coder/coder/issues/3870 a few days ago 😄 Want me to make a new discussion here for it still?

GitHub

Panic when trying to open terminal session · Issue #3870 · coder/co...

Coder version & template I'm using coder v0.8.11. The template in question is the docker-code-server example template. (The only thing I changed in it was the DNS setting) Error Whe...

maf•3y ago

Yup, feel free to link to that one if you open up a thread here. GitHub issues isn't really great for discussing/debugging so I think we could get further in understanding why it happened through a thread here on Discord. (Even though we understand the issue, kinda, I still have no idea why it happened to you and/or what's different on that Arch system to trigger it.)

SkeritOP•3y ago

Will do.

SkeritOP•3y ago

In the mean time, here's a screenshot of a proxy request in action. On the left is the incoming (HTTP2) request headers, on the right is the transformed request headers sent to coder-server:

maf•3y ago

Thanks. So that session_token looks strange to me, it should be a plain UUID, I think? It kinda looks base64 encoded, but decoding it returns binary data. Could the proxy be base64 encoding the cookie values? Ah just realized that was a capture of the website traffic, could you do one for the agent?

SkeritOP•3y ago

Ah sure, hold on

SkeritOP•3y ago

maf•3y ago

Ok, that looks correct. Although I'd want to verify still that coder server sees that exact same request as well. At what point are you logging it, is it all happening inside the proxy? But before we think about that, let's just verify that the auth token works. Sec. Could you try running this on the coder server host, or anywhere really where you can reach the coder server directly (all one line):

CODER_AGENT_TOKEN=79916bf2-892a-4f96-b32a-68ede9d5fc6b CODER_AGENT_URL=http://localhost:port/ coder agent

That might try to execute the startup script, so be wary of that (e.g. don't run as root, perhaps). One more thing I'd like to verify is that the coder server is Coder v0.8.11+cde036c too? (E.g. if you open up the webui, it'll be shown at the bottom.)

SkeritOP•3y ago

I am indeed logging it inside the Node.js proxy itself. I can run the command like this on the server host: CODER_AGENT_TOKEN=79916bf2-892a-4f96-b32a-68ede9d5fc6b CODER_AGENT_URL=http://127.0.0.1:3091/ coder agent Do you want the entire log output? (It does indeed mention this version:

2022-09-07 11:19:40.517 [INFO]    <./cli/agent.go:78>    workspaceAgent.func1    starting agent    {"url": "http://127.0.0.1:3091/", "auth": "token", "version": "v0.8.11+cde036c"}

)

maf•3y ago

Do you want the entire log output?

Just want to know if it connected or ran into the same error as above

It does indeed mention this version

That's the agent -- good. Does it say so for the server as well (in the webui)?

SkeritOP•3y ago

Yes, the webui also says Coder v0.8.11+cde036c. It does not give me any "agent token invalid" errors. It does fail to run the startup script, but that's probably expected?

2022-09-07 11:52:07.540 [WARN]    <./agent/agent.go:170>    (*agent).run.func1    agent script failed ...
  "error": run:
               github.com/coder/coder/agent.(*agent).runStartupScript
                   /home/runner/work/coder/coder/agent/agent.go:375
             - exit status 1

maf•3y ago

Yes, definitely expected. Ok. So I'd say this confirms that the agent token works, but something goes wrong in-transit between the failing agent and the server.

SkeritOP•3y ago

Very strange, but what could it be? I also tested the command with the public address (so it goes through the proxy) and that also just works. I've also had it happen 1 or 2 times that it does not complain about an invalid token, but then it says the build is outdated. (When it's not)

maf•3y ago

Maybe it could be a fluke related to some workspace start/stop or update actions? I.e. a container that stayed behind. But I agree that's very strange. Would it be possible for you to try to mirror the current setup as much as possible, but removing the proxy from the equation?

SkeritOP•3y ago

Internally that would be pretty easy, I just have to change the access-url to the internal ip address & port right?

maf•3y ago

Yeah, so essentially you'd be updating your coder server configuration so that workspaces are given the local ip/port instead of the proxy. Then you'd try to create one.

SkeritOP•3y ago

Same result 😕

=
+ command -v curl
+ curl -fsSL --compressed http://192.168.50.2:3091/bin/coder-linux-amd64 -o coder
+ break
+ chmod +x coder
+ export CODER_AGENT_AUTH=token
+ CODER_AGENT_AUTH=token
+ export CODER_AGENT_URL=http://192.168.50.2:3091/
+ CODER_AGENT_URL=http://192.168.50.2:3091/
+ exec ./coder agent
2022-09-07 13:49:05.344 [INFO]    <./cli/agent.go:63>    workspaceAgent.func1    spawning reaper process
2022-09-07 13:49:05.409 [INFO]    <./cli/agent.go:78>    workspaceAgent.func1    starting agent    {"url": "http://192.168.50.2:3091/", "auth": "token", "version": "v0.8.11+cde036c"}
2022-09-07 13:49:05.410 [INFO]    <./agent/agent.go:450>    (*agent).init    generating host key
2022-09-07 13:49:05.692 [INFO]    <./agent/agent.go:141>    (*agent).run    connecting
2022-09-07 13:49:05.693 [WARN]    <./agent/agent.go:150>    (*agent).run    failed to dial    {"error": "GET http://192.168.50.2:3091/api/v2/workspaceagents/me/metadata: unexpected status code 401: Agent token is invalid.: Try logging in using 'coder login \u003curl\u003e'."}

SkeritOP•3y ago

The auth token configured as the environment variable is currently 1ef6cb94-2bd2-4b20-99cf-113722f99a4c. And I did the query again too:

maf•3y ago

re: The discussion in your other thread. There wouldn't happen to be a load-balancer/multiple DBs involved? I.e. everything talks straight to the single docker container running postgres? (Just a thought since you mentioned the occasional build outdated issue.) Have you tried starting with a clean database? If you decide to try that, please take a backup/dump of the current one because if that helps, the issue could be in the db state which we'd want to analyze.

SkeritOP•3y ago

It's just a single docker container, no load balancing. And I've cleared the database a few times now, but I can try that again too.

maf•3y ago

Ok, then I'd say you don't need to. I'd expect once to be enough. Perhaps we should take a step back, since it seems like the problem is with the container running the agent (since it works in other scenarios). Would you mind sharing the full output of docker inspect coder-my-workspace? You can PM it to me if it contains anything sensitive or you can censor it manually.

SkeritOP•3y ago

No problem

message.txt

maf•3y ago

Could you confirm once more that this works when not run inside the container?

CODER_AGENT_TOKEN=1ef6cb94-2bd2-4b20-99cf-113722f99a4c CODER_AGENT_URL=http://192.168.50.2:3091/ coder agent

SkeritOP•3y ago

Yes, that works. Don't get the invalid token error.

maf•3y ago

Thanks for testing that. This issue has me mystified. I guess we've somewhat narrowed down the problem to the actual container. But I see nothing in its configuration that would cause a problem. It's nearly identical to what it'd look like for me. Is the container running on the same machine as the host (i.e. 192.168.50.2)? Or, are the non-container agent tests being performed on the same machine running the problematic containers?

SkeritOP•3y ago

Yes, everything is on the same machine. I actually haven't really worked with docker much before, it's the first time I've worked so much with it 🙂 I did have Podman installed and tested that out, but I've removed it before I even tried coder.

maf•3y ago

Hehe, seems this has become a trial by fire for you then 😅 Ok, I'd like for you to try one ugly workaround that just might "fix" the issue for you. Make the following change to your main.tf:

  entrypoint = ["sh", "-c", join("\n", ["export CODER_AGENT_TOKEN=${coder_agent.main.token}", replace(coder_agent.main.init_script, "127.0.0.1", "host.docker.internal")])]

(I.e. just replace the current entrypoint with that.) That will add an explicit export for the token to the bootstrap script, my hunch is that env values are not being propagated into your entrypoint and as such, CODER_AGENT_TOKEN is left empty. Does the following print world? (Feel free to use any docker image, just picked alpine out of habit.)

docker run --rm --env HELLO=world --entrypoint /bin/sh alpine -c 'echo $HELLO'

SkeritOP•3y ago

That modified entrypoint also didn't fix it! 😕 I downloaded the coder binary the coder server was serving (wget http://192.168.50.2:3091/bin/coder-linux-amd64) and did the test in the console again (

CODER_AGENT_TOKEN=599fc1ee-6809-4064-a9ba-7bca70e1e602  CODER_AGENT_URL=https://coder.kumulus.11ways.be/ ./coder-linux-amd64 agent

) and that again did not have any problems. And yes, that test with the alpine image did echo world. Could it be related to my user issue and somehow get the wrong env variables or something?

maf•3y ago

I won't say it's impossible, we don't really understand what's going on there either.

SkeritOP•3y ago

I see my docker is using the btrfs method...

maf•3y ago

Do you think that could be the source of the issue? I personally don't see how it would affect it though. Could you try the plain docker template, btw? Want to see if you have this same problem with other templates too. (No need to make any changes to it.)

SkeritOP•3y ago

Not sure, unless it's a snapshot issue. Funny thing is that this is all a default docker install (except for the dns setting). Looking at the documentation, I thought using the btrfs storage method required some manual changes. Sure, hold on

maf•3y ago

re: manual changes. Docker tries to detect the underlying filesystem and defaults to using that driver. So for instance if you have root on ZFS, it'd use the zfs driver. Probably the same with btrfs. You can define it in /etc/docker/daemon.json though, e.g. {"storage-driver": "overlay2"} and then restart the docker daemon (might need to wipe stuff from /var/lib/docker though.)
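To see which driver Docker actually picked before touching daemon.json, a minimal sketch: docker info accepts a Go-template --format, and the Driver field holds the storage driver. The wrapper name storage_driver is made up for illustration.

```shell
# Hypothetical helper: print the storage driver the Docker daemon is using,
# e.g. "btrfs" or "overlay2".
storage_driver() {
  docker info --format '{{.Driver}}'
}
```

Running it again after changing the setting and restarting the daemon shows whether the new driver took effect.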

SkeritOP•3y ago

Well well well... the plain docker one works. (Didn't even have to add the user="coder:coder" bit)

maf•3y ago

Wow, that's even more confusing. 😅 Would you mind trying plain vanilla docker-code-server template again? And if you need the fix again, instead of user="coder:coder", add HOME=/home/coder to env.

SkeritOP•3y ago

Sure. First test failed, with the agent token error and the chdir/wrong HOME thing. Adding the HOME env var now.

Adding the HOME env didn't fix it either. Huh. The /etc/passwd file in the container is the same as the one of the host server. It's a copy.

Hmm, everything's the same. That container's root is basically the same as the host's root.

Found someone else that had a similar issue... 6 years ago: https://github.com/moby/moby/issues/10216

maf•3y ago

Oh wow, that's crazy (nice find!) So I guess you could try this workaround https://github.com/moby/moby/issues/10216#issuecomment-196743892 or changing the storage driver to overlay2 👆

SkeritOP•3y ago

I might post a little comment on there. Maybe it has something to do with a snapshot of the root drive that I made, restored & then rolled back again.

maf•3y ago

I made an update on the ticket, does my conclusion seem accurate? https://github.com/coder/coder/issues/3870#issuecomment-1240444951 Ultimately it looks like both of your issues had the same root cause. Pretty bad bug in Docker or btrfs 😬

SkeritOP•3y ago

Perfect. I was also wondering why (even though the root partition was all wrong) the Agent token is invalid thing kept happening... Guess we'll never really know ^^ Fyi: I deleted the image itself and recreated the template & workspace and now docker-code-server does work!

Phorcys•3y ago

what a weird issue

maf•3y ago

👍. Yeah, it bugs me a little that we never got to fully understand the agent token issue, but who knows what was wrong and how much of the host environment was copied over to the container. It seemed like Docker was lying to us as well so I'm quite ready to shovel this into the "just btrfs things" pile 😅. Btw, thanks for putting up with all the testing @Skerit, I'm happy we got some answers after all that effort!

SkeritOP•3y ago

Thank you for guiding me through it! Many other projects would have given up and told me I was on my own 😄
