AWS EC2 (Devcontainer) fails with custom repo

Replacing the starting repo with https://github.com/KyleDavisDev-Preply/build-tools/commit/0ada6f0b3c28ad46fb570d3394d551eda34641b8 causes the Coder agent to fail. The logs freeze at step 2, "taking a snapshot...", then everything becomes unresponsive until the workspace becomes unhealthy.
18 Replies
Codercord
Codercord•4mo ago
Category: Help needed
Product: Coder OSS (v2)
Platform: Linux
Logs: Please post any relevant logs/error messages.
Cian
Cian•4mo ago
What version of Coder are you using here? Can you also provide logs from the workspace container when the workspace is in this frozen/unhealthy state?
JustATempest
JustATempestOP•4mo ago
The devlogs above are the logs that I get. I'm using the latest Docker Compose version of Coder.
Phorcys
Phorcys•4mo ago
@JustATempest are you passing https://github.com/KyleDavisDev-Preply/build-tools/commit/0ada6f0b3c28ad46fb570d3394d551eda34641b8 as an argument to the template? or just https://github.com/KyleDavisDev-Preply/build-tools? please check the /health page to get the version
JustATempest
JustATempestOP•4mo ago
Sure... I've been fighting this for 4 hours today, so I'll be taking a break. I'll send it some time tomorrow... 🥱
Phorcys
Phorcys•4mo ago
i'll also do some testing on my own to see if i can reproduce the issue, likely tomorrow :-)
JustATempest
JustATempestOP•4mo ago
7e7fbc39-bbd9-467b-ba17-53f9a8b74d53 https://github.com/KyleDavisDev-Preply/build-tools/ref/head/c_or_cpp This is the repo URL for now. It's minimal and failing.
JustATempest
JustATempestOP•4mo ago
Templates I'm using for reproducing the issue.
Phorcys
Phorcys•4mo ago
I've done some testing and the following repo URLs work on my end with another template:
- https://github.com/KyleDavisDev-Preply/build-tools
- https://github.com/KyleDavisDev-Preply/build-tools#c_or_cpp
I think this likely has something to do with the template, but I don't really know what envbuilder is supposed to be doing on the "taking a snapshot" step. Do you have any ideas on how to troubleshoot, @Cian?
Cian
Cian•4mo ago
At the 'taking a snapshot' step, it's basically hashing all of the files and, based on the template you shared above, pushing the images to the remote cache. Do you have container logs from this stage? Meanwhile, I'm trying out your template on my own deployment.
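To illustrate why hashing every file can take a long time on a large filesystem, here is a minimal Python sketch of the general idea. It is not envbuilder's actual implementation. The function name `snapshot_digest` and the choice of SHA-256 are assumptions made purely for illustration.

```python
import hashlib
import os
import tempfile


def snapshot_digest(root: str) -> str:
    """Hash every regular file under root into one digest (illustrative only)."""
    overall = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # keep the traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            # Mix in the relative path so renames change the digest too.
            overall.update(os.path.relpath(path, root).encode("utf-8"))
            with open(path, "rb") as fh:
                # Read in 1 MiB chunks so large files do not fill memory.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    overall.update(chunk)
    return overall.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "a.txt"), "w") as fh:
            fh.write("hello")
        print(snapshot_digest(tmp))
```

The cost grows with the number and size of files read, which is one plausible reason this step can look frozen on a small instance.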
JustATempest
JustATempestOP•4mo ago
That's the problem. It freezes on the taking-a-snapshot step. Then I lose all connection to Coder and logging stops. I'm going to see if there's a flag I can add to pipe the logs from envbuilder into a file on the EC2 host. I run Coder using Docker Compose. I'll send those logs, as well as the logs from the container running on the EC2 instance, as soon as I gather them up.
Phorcys
Phorcys•4mo ago
is there maybe a flag we can set to toggle debug/verbose logs in envbuilder?
Cian
Cian•4mo ago
ENVBUILDER_VERBOSE=true
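As a rough sketch of how that variable might be passed when running envbuilder by hand with Docker, the Python below only assembles and prints a command line. It does not run anything. The helper name `envbuilder_docker_args`, the image tag `ghcr.io/coder/envbuilder:latest`, and the use of `ENVBUILDER_GIT_URL` are assumptions not taken from this thread. In the template itself, the variable would be set wherever the envbuilder container's environment is defined.

```python
import shlex


def envbuilder_docker_args(git_url, image="ghcr.io/coder/envbuilder:latest", verbose=True):
    """Build a `docker run` argument list for envbuilder (hypothetical helper)."""
    env = {"ENVBUILDER_GIT_URL": git_url}  # assumed variable name for the repo URL
    if verbose:
        env["ENVBUILDER_VERBOSE"] = "true"  # the flag mentioned in the thread
    args = ["docker", "run", "--rm"]
    for key, value in env.items():
        args += ["-e", f"{key}={value}"]
    args.append(image)
    return args


if __name__ == "__main__":
    cmd = envbuilder_docker_args("https://github.com/KyleDavisDev-Preply/build-tools")
    # Print a shell-safe version of the command for copy and paste.
    print(shlex.join(cmd))
```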
JustATempest
JustATempestOP•4mo ago
Never mind. Ignore this if you did not see the deleted message... OK, so I think this is what we are looking for:
[ 315.136314] cloud-init[1235]: done.
[ 315.136440] cloud-init[1235]: #2: Taking snapshot of files...
[ 315.136520] cloud-init[1235]: 2024-09-23 19:24:14,928 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
[ 315.136580] cloud-init[1235]: 2024-09-23 19:24:14,928 - util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_user.py'>) failed
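When the console output is long, it can help to pull out only the cloud-init warnings. The following Python sketch does that for lines shaped like the ones above. The regular expression and the function name `cloud_init_warnings` are assumptions based solely on this excerpt, so other log formats may need adjusting.

```python
import re

# Matches lines such as: "[ 315.136520] cloud-init[1235]: <message>"
LINE_RE = re.compile(
    r"^\[\s*(?P<uptime>\d+\.\d+)\]\s+cloud-init\[(?P<pid>\d+)\]:\s+(?P<message>.*)$"
)

SAMPLE = [
    "[ 315.136314] cloud-init[1235]: done.",
    "[ 315.136440] cloud-init[1235]: #2: Taking snapshot of files...",
    "[ 315.136520] cloud-init[1235]: 2024-09-23 19:24:14,928 - cc_scripts_user.py[WARNING]: "
    "Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)",
    "[ 315.136580] cloud-init[1235]: 2024-09-23 19:24:14,928 - util.py[WARNING]: "
    "Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from "
    "'/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_user.py'>) failed",
]


def cloud_init_warnings(lines):
    """Return (uptime_seconds, message) pairs for cloud-init WARNING lines."""
    found = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and "[WARNING]" in match.group("message"):
            found.append((float(match.group("uptime")), match.group("message")))
    return found


if __name__ == "__main__":
    for uptime, message in cloud_init_warnings(SAMPLE):
        print(f"{uptime:.6f}s {message}")
```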
Cian
Cian•4mo ago
Interesting... there may be some more detailed info in the cloud-init logs on the instance, but that's going to be annoying to get to without an SSH key on the instance.
FYI @JustATempest, I updated the aws-devcontainer template: https://github.com/coder/coder/tree/4be5b2f/examples/templates/aws-devcontainer
I haven't been able to reproduce the hang you have been describing, but I did add the facility to insert an SSH key into the VM so you can at least log in and poke around if things don't connect. In my experience, though, "taking a snapshot" can sometimes take a while to complete.
JustATempest
JustATempestOP•4mo ago
I'll see about writing up steps to reproduce
