Intermittent failures in container deletion/creation on remote docker provider

Original message: https://discord.com/channels/747933592273027093/971231372373033030/1323039264778223657

Other relevant info (I don't have logs handy). Provider config:
provider "docker" {
  host     = "ssh://user@host"
  ssh_opts = ["-o", "StrictHostKeyChecking=no", "-o", "UserKnownHostsFile=/dev/null", "-i", "/path/to/key"]
}
And just in case this is relevant: all my affected docker_containers depend on a null_resource that SSHes into the remote host and seeds some local data into a persistent volume if this is the first time the workspace has ever been started (as far as I can tell, this part never fails):
resource "null_resource" "prepopulate_db_data" {
  count = data.coder_workspace.me.start_count

  provisioner "remote-exec" {
    connection {
      type        = "ssh"
      user        = "user"
      private_key = file("/path/to/key")
      host        = "host"
      agent       = false
      timeout     = "90s"
    }

    inline = [
      "docker run --rm -v /root/dev_data:/dev_data -v ${docker_volume.db_data.name}:/data alpine sh -c 'if [ ! -f /data/.initialized ]; then cp -r /dev_data/db_data/* /data/; touch /data/.initialized; fi'"
    ]
  }

  depends_on = [docker_volume.db_data]
}
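The seed-once guard inside that inline command can be exercised locally as a plain shell sketch. Temp directories stand in for the host's seed directory and the mounted volume; all paths and filenames below are hypothetical stand-ins, not taken from the workspace:

```shell
# Local sketch of the provisioner's seed-once guard (assumption: plain
# temp directories stand in for /dev_data/db_data and the volume at /data).
set -eu
SEED=$(mktemp -d)   # stands in for the host's seed data directory
DATA=$(mktemp -d)   # stands in for the persistent volume mount

echo "seed-row" > "$SEED/table.sql"

seed_once() {
  # Copy the seed data only once, using a marker file as the guard.
  if [ ! -f "$DATA/.initialized" ]; then
    cp -r "$SEED"/* "$DATA"/
    touch "$DATA/.initialized"
  fi
}

seed_once                      # first run: copies table.sql, creates the marker
echo "late" > "$SEED/new.sql"  # seed dir changes after initialization
seed_once                      # second run: marker exists, so nothing is copied
```

The marker-file guard is what makes the provisioner safe to re-run on every workspace start.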
7 Replies
Codercord · 4d ago
Category
Help needed
Product
Coder (v2)
Platform
Linux
Logs
Please post any relevant logs/error messages.
Ryan (OP) · 4d ago
It seems like this periodically happens whenever any kind of docker API call is made:
Error: Unable to create container: error during connect: Post "http://docker.example.com/v1.41/containers/create?name=workspace-name": command [ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /path/to/key -l root -- hostname docker system dial-stdio] has exited with signal: killed, please make sure the URL is valid, and Docker 18.09 or later is installed on the remote host: stderr=Warning: Permanently added 'HOST' (key-type) to the list of known hosts.
on main.tf line 368, in resource "docker_container" "workspace":
368: resource "docker_container" "workspace" {
It always eventually works with a retry. I don't think the network is actually flaky; the two machines are behind the same switch.

It's weird that "Permanently added 'HOST' (key-type) to the list of known hosts." is being written to stderr even though I have StrictHostKeyChecking off (maybe because I have a UserKnownHostsFile set too?). Perhaps I'll try with that off and see if anything more useful shows up in stderr.

It's also weird that it says docker.example.com as the URL for those API calls lol
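On the known-hosts warning: with StrictHostKeyChecking=no, OpenSSH still auto-accepts and records the host key, and because UserKnownHostsFile is /dev/null it does so on every connection, so the warning prints each time. A hedged tweak, assuming the rest of the provider config is unchanged, is to add -o LogLevel=ERROR so only real failures reach stderr:

```
provider "docker" {
  host = "ssh://user@host"
  ssh_opts = [
    "-o", "StrictHostKeyChecking=no",
    "-o", "UserKnownHostsFile=/dev/null",
    "-o", "LogLevel=ERROR",   # suppress the "Permanently added ..." notice
    "-i", "/path/to/key",
  ]
}
```

With the notice silenced, any stderr output left in the error message should be an actual SSH failure rather than noise.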
Phorcys · 3d ago
I think you should be able to avoid using remote-exec, but I don't think that's where the issue comes from. It doesn't seem like a network issue to me; I'm not sure why it'd exit with signal: killed though.
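A hedged sketch of that suggestion, assuming the kreuzwerker/docker provider: run the seed step as a one-shot container through the same provider connection, so no separate SSH session is needed. The resource name and attribute choices here are illustrative, not from the original template:

```
resource "docker_container" "prepopulate_db_data" {
  count    = data.coder_workspace.me.start_count
  name     = "prepopulate-db-data"
  image    = "alpine"
  must_run = false   # the container is allowed to exit after seeding
  command  = ["sh", "-c", "if [ ! -f /data/.initialized ]; then cp -r /dev_data/db_data/* /data/; touch /data/.initialized; fi"]

  volumes {
    host_path      = "/root/dev_data/db_data"
    container_path = "/dev_data/db_data"
  }
  volumes {
    volume_name    = docker_volume.db_data.name
    container_path = "/data"
  }
}
```

This keeps the seeding on the same docker-over-SSH transport the provider already uses, though it would not remove the flaky dial-stdio connection itself.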
Ryan (OP) · 3d ago
That's a good spot, you're right
Phorcys · 3d ago
GitHub
random signal: killed when building · Issue #17918 · containers/pod...
Issue Description I am randomly getting signal: killed from podman build command ( or via API endpoint ), there is not any host resource pressure, dmesg and journald does not print anything useful ...
Phorcys · 3d ago
This is for Podman, but it seems similar.
Ryan (OP) · 3d ago
Hmmm interesting