"The agent cannot authenticate until the workspace provision job has been completed"

I'm trying to create a template for use with XCP-ng/XOA and am provisioning the VM with cloud-init. On first start it works perfectly, but after stopping and restarting the workspace through Coder (not when shutting down the VM directly), the coder-agent can't authenticate. Full source: https://github.com/Millefeuille42/coder-xcp-xoa-template#
Codercord
Codercord2mo ago
Category
Help needed
Product
Coder (v2)
Platform
Linux
Logs
Please post any relevant logs/error messages.
Millefeuille
MillefeuilleOP2mo ago
truncated main.tf
terraform {
  required_providers {
    coder = {
      source = "coder/coder"
    }
    xenorchestra = {
      source = "vatesfr/xenorchestra"
    }
  }
}

[...]

provider "coder" {}

data "coder_workspace" "me" {}
data "coder_workspace_owner" "me" {}

resource "coder_agent" "dev" {
  arch = "amd64"
  os   = "linux"
}

[...]

resource "xenorchestra_vm" "coder_vm" {

  [...]

  cloud_config = templatefile("cloud-config.yaml.tftpl", {
    username          = local.username
    hostname          = local.hostname
    ssh_key           = tls_private_key.rsa_4096.public_key_openssh
    coder_agent_token = coder_agent.dev.token
    code_server_setup = var.code_server
    init_script       = base64encode(coder_agent.dev.init_script)
    extra_packages    = var.extra_packages
    tpl_setup_script  = base64encode(var.user_setup_script)
    user_setup_script = base64encode(data.coder_parameter.user_setup_script.value)
    user_packages     = jsondecode(data.coder_parameter.extra_packages.value)
  })
}
cloud init file
#cloud-config
users:
  - name: ${username}
    sudo: ["ALL=(ALL) NOPASSWD:ALL"]
    groups: sudo
    shell: /bin/bash
    ssh_authorized_keys:
      - ${ssh_key}
packages:
  - git
  - curl
  - jq

[...]

write_files:
  - path: /opt/coder/init
    permissions: "0755"
    encoding: b64
    content: ${init_script}
  - path: /etc/systemd/system/coder-agent.service
    permissions: "0644"
    content: |
      [Unit]
      Description=Coder Agent
      After=network-online.target
      Wants=network-online.target

      [Service]
      User=${username}
      ExecStart=/opt/coder/init
      Environment=CODER_AGENT_TOKEN=${coder_agent_token}
      Restart=always
      RestartSec=10
      TimeoutStopSec=90
      KillMode=process

      OOMScoreAdjust=-900
      SyslogIdentifier=coder-agent

      [Install]
      WantedBy=multi-user.target
runcmd:
  - hostnamectl set-hostname ${hostname}
  - chown ${username}:${username} /home/${username}
  - systemctl enable coder-agent
  - systemctl start coder-agent
final_message: "XCP-ng workspace setup complete! SSH: ${username}@${hostname}"
The token is not refreshed on restart, which seems to be the issue. However, I don't see what I'm doing differently from other templates that could cause this :/
The issue is that the root disk's lifetime is permanent, meaning it won't be re-initialized on restart. The XOA provider doesn't currently permit "volatile" disks, so I have to find a solution for this.
Atif
Atif2mo ago
You need a way to reinject the new token on each start. It looks like cloud-init doesn't run on subsequent restarts, so the agent never gets the new token.
Phorcys
Phorcys2mo ago
@Millefeuille ^ Usually you'd want to destroy the VM to avoid this issue. Alternatively, most cloud providers run cloud-init at every boot, but that doesn't seem to be the case here.
Millefeuille
MillefeuilleOP2mo ago
Thanks for the feedback! I'm working on a disk provider for XOA so I can have an ephemeral disk for root and a persistent one for home. Currently the only solution is to delete the whole VM on restart, which wipes out the home directory. Once I get the disk provider working, the issue should be solved. I might have to tweak some parameters, but I'm learning Terraform and cloud-init at the same time. I'll take some time with the DevOps/IT guys to find out whether there's a solution that wouldn't require writing code X)
Idk if it's ok for you, but I'd suggest marking the issue as closed once I get it to work with XOA?
Phorcys
Phorcys2mo ago
Hmm, yeah, usually you would be able to keep the disk but not the template; this is likely an XOA thing. You should take a look at how we do it with other cloud providers, maybe it could help.
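The "destroy the VM on stop" approach discussed in this thread is usually expressed in Coder templates by tying the VM's `count` to the workspace's start count. The following is only a minimal sketch, assuming the remaining `xenorchestra_vm` arguments stay as in the truncated main.tf; it is not the author's final solution, and on its own it also discards everything stored on the VM's disks, including the home directory:

```hcl
data "coder_workspace" "me" {}

resource "xenorchestra_vm" "coder_vm" {
  # start_count is 1 while the workspace is started and 0 while it is stopped,
  # so Terraform destroys the VM on stop and creates a fresh one on start.
  # A fresh VM runs cloud-init as a first boot and picks up the new agent token.
  count = data.coder_workspace.me.start_count

  # [...] remaining arguments unchanged from main.tf, including cloud_config
}
```

With `count` set, any other reference to the VM needs an index, for example `xenorchestra_vm.coder_vm[0]`. Keeping the home directory across restarts would additionally require a disk that lives outside this resource, which, according to the thread, the XOA provider did not offer at the time.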
Millefeuille
MillefeuilleOP2mo ago
Will take a deeper look, thanks for the info!
Phorcys
Phorcys2mo ago
Also, on another note, we'll be at KubeCon EU next week, so feel free to drop by our booth if you're there!
"Idk if it's ok for you, but I'd suggest marking the issue as closed once I get it to work with XOA?"
Definitely. Also, please share the solution once you find it; it'll help other users down the line :-)
Millefeuille
MillefeuilleOP2mo ago
Won't be there this year, unfortunately, but I'd be happy to drop by next time or at another con!
