Prevent destroying and recreating home volume on workspace update

I'm using this example template: https://github.com/coder/coder/tree/main/examples/templates/do-linux
When I make a change to cloud-config.tftpl, the volume that houses my user's home directory gets destroyed and recreated each time I update the workspace to the newest version of my template. How can I prevent this? I'd like to be able to push a new version of the template that, say, installs some additional program, and have the user be able to update their workspace to the new version without blasting whatever the user was working on in their home directory.
kyle•3y ago
You can use the Terraform ignore_changes built-in: https://www.terraform.io/language/meta-arguments/lifecycle#ignore_changes
Mr_Neezer•3y ago
ignore_changes takes an array... what values should I be populating that with? Using the example template linked above... can't figure out what values to put in there. When I push my template update, I see digitalocean_volume.home_volume: Plan to create in the output... if I understood why Terraform plans to create it, I imagine I could use that value in ignore_changes, right? Is there a way to get Coder/Terraform to print that information somehow? Apologies if that's a stupid question; still very new to Terraform.
kyle•3y ago
All good! ignore_changes = all will probably do it
Mr_Neezer•3y ago
Do I need to pair that with prevent_destroy = true, or is ignore_changes = all good enough on its own?
kyle•3y ago
ignore_changes = all should be good enough on its own
Mr_Neezer•3y ago
Ok, trying to push a new template with that change... will report back in a few minutes. Hmm, no joy. I added that inside the resource block for "digitalocean_volume" "home_volume"... is that the right location? The log output when pushing the template still says digitalocean_volume.home_volume: Plan to create, and the volume is destroyed/created when updating the workspace. This is what it looks like right now:
resource "digitalocean_volume" "home_volume" {
  region                   = var.region
  name                     = "coder-${data.coder_workspace.me.owner}-${data.coder_workspace.me.name}-home"
  size                     = var.home_volume_size
  initial_filesystem_type  = "ext4"
  initial_filesystem_label = "coder-home"

  lifecycle {
    ignore_changes = all
  }
}
... or should I be putting that in "digitalocean_project_resources" "project" instead? Just now noticing that digitalocean_volume.home_volume.urn is listed as a resource dependency there...?
kyle•3y ago
It's weird that's recreated... I guess try that for the project resources hmm
Mr_Neezer•3y ago
@kyle Hey, circling back to this... I tried relocating the lifecycle call to project resources, and no difference. Would it be helpful to post my main.tf in a gist in its entirety so you can see the whole thing? I'd really like to get this figured out.
kyle•3y ago
Yup, I'm happy to help!
Mr_Neezer•3y ago
@kyle Thanks! Here's the gist for main.tf: https://gist.github.com/neezer/82a0b4b74bce54d73b24b7d6c2ad71d2
kyle•3y ago
Could you share the Terraform output as well?
Mr_Neezer•3y ago
Sure, one sec. Gist updated to include Terraform output. @kyle ^^
kyle•3y ago
Ahh, ty. So on template push you'd expect a home volume to be created, right? Could you send me the output from a workspace create and stop/start please?
Mr_Neezer•3y ago
So on template push you'd expect a home volume to be created, right?
No, I don't think so. I think I'd expect a new volume to be created only on workspace create.
Could you send me the output from a workspace create and stop/start please?
Yeah, I'll add it to the gist in a bit. I'll ping ya here when it's up.
kyle•3y ago
So resources are never created on template push, they are just planned.
Mr_Neezer•3y ago
That's my understanding, yes.
kyle•3y ago
@Mr_Neezer following up on this, in the gist it doesn't show the home volume being deleted on stop.
Mr_Neezer•3y ago
@kyle Thanks for the follow-up! Sorry, been busy trying to figure out this automatic CSR issue... I've been testing whether or not data persists by creating a text file with some random string in my home directory, then start/stopping the instance. When I log back in using the Terminal and/or SSH, the file is gone. I still owe you logs for start and create; once I get my recent CSR changes sorted, I'll update the gist and ping you with those too. @kyle Ok, gist is updated with command output from create, stop, and start. main.tf has also been updated to show my latest setup. Additionally, the mounts section in my cloud-init looks like this:
- [
My primary test is this:
1. SSH into the workspace. Run echo "testing persistence..." > ~/test.txt
2. Stop the workspace.
3. Start the workspace.
4. cat ~/test.txt -> cat: test.txt: No such file or directory
Additionally, it appears all my initial provision scripts are re-run each time I stop/start as well. Any ideas what I'm doing wrong here? @kyle I see the following in the stop/start logs:
digitalocean_volume.home_volume: Drift detected (update)
Could that be related? What does Terraform consider "drift?"
kyle•3y ago
So it's not recreating it based on the output. Are you certain it's being mounted correctly?
Mr_Neezer•3y ago
I didn't change anything from the coder/coder example template for do-linux as far as mounting goes (at least, not to my knowledge). Is there a way I can diagnose whether the mount happened successfully or not?
kyle•3y ago
Try typing mount inside a workspace to see. And toss the output in here
kyle•3y ago
Seems like the home volume isn't being mounted for some reason. Try catting the logs in /var/log/cloud-init.log
Mr_Neezer•3y ago
Yeah, looks like it. One sec, catting the cloud-init logs...
Mr_Neezer•3y ago
Stderr: mount: /home/evan: wrong fs type, bad option, bad superblock on /dev/sda, missing codepage or helper program, or other error.
kyle•3y ago
Welp, interesting
Mr_Neezer•3y ago
I pasted the mounts section of cloud-init further up this thread, if you missed that. Didn't change that from the do-linux example repo, though.
kyle•3y ago
Hmm, it's not impossible that ours is outdated.
Mr_Neezer•3y ago
True. I opened up the volume in my DO dashboard, and see this for config instructions:
# Create a mount point for your volume:
$ mkdir -p /mnt/coder_evan_dev_home

# Mount your volume at the newly-created mount point:
$ mount -o discard,defaults,noatime /dev/disk/by-id/scsi-0DO_Volume_coder-evan-dev-home /mnt/coder_evan_dev_home

# Change fstab so the volume will be mounted after a reboot
$ echo '/dev/disk/by-id/scsi-0DO_Volume_coder-evan-dev-home /mnt/coder_evan_dev_home ext4 defaults,nofail,discard 0 0' | sudo tee -a /etc/fstab
Do those options in fstab jibe with what I'm doing in the mounts section of cloud-init?
kyle•3y ago
I think so... but maybe instead of auto let's explicitly put ext4?
Mr_Neezer•3y ago
Ok, giving that a try.
kyle•3y ago
You can try running mount -o defaults,uid=1000,gid=1000 /dev/disk/by-id/scsi-0DO_Volume_coder-evan-dev-home /home/evan to do it manually. But that should essentially be the same as the cloud-init config.
Mr_Neezer•3y ago
Interesting. That didn't work, but the error in cloud-init logs changed:
Stderr: mount: /home/evan: can't find LABEL=coder-home.
That might be related to a change I tried last night, which was to try to use the filesystem_label attribute instead of the initial_filesystem_label attribute. Lemme put that back and try again. While I'm waiting for that to rebuild, an adjacent question: for cloud-init, am I guaranteed to have the mounts resolved by the time I reach the write_files and/or runcmd sections? Or is the order not guaranteed?
kyle•3y ago
I believe so, but I'm not certain; we'd have to double-check.
Mr_Neezer•3y ago
Assuming the mounts are working as expected, of course.
kyle•3y ago
Seems to be accurate
Mr_Neezer•3y ago
Ok, rebuilt using initial_filesystem_label (as y'all are doing in the do-linux example), and I'm still getting this error:
Stderr: mount: /home/evan: can't find LABEL=coder-home.
Thoughts? Seems to be the value of digitalocean_volume.home_volume.initial_filesystem_label. Hmm, maybe this is related? I tried running doctl projects resources list MY_DO_PROJECT_ID, and I see my domain (DNS), my Kubernetes cluster, my droplet, and my hosted database... but no volumes. Maybe the volume isn't being correctly assigned to the DO project? Would that cause issues? The DO dashboard does show the volume as being assigned to the droplet Coder created, though. I tried merging the config docs from above with my mounts block, changing LABEL=${home_volume_label} to /dev/disk/by-id/scsi-0DO_Volume_${home_volume_name} (and making the supporting changes in main.tf), and now I'm back to the previous error:
Stderr: mount: /home/evan: wrong fs type, bad option, bad superblock on /dev/sda, missing codepage or helper program, or other error.
That seems like progress, since the last error was effectively "I can't find the volume!" and now it's like "This volume you gave me looks whack!" -- at least AFAICT. I'm going to try your earlier suggestion: https://discord.com/channels/747933592273027093/1028039157618069524/1030150002489688144
Mr_Neezer•3y ago
My droplet is Debian 11, not Ubuntu, but it's possibly a related issue. @kyle Performing the mount manually in runcmd worked:
- mount -o discard,defaults,noatime /dev/disk/by-id/scsi-0DO_Volume_${home_volume_name} /home/${username}
- echo '/dev/disk/by-id/scsi-0DO_Volume_${home_volume_name} /home/${username} ext4 defaults,uid=1000,gid=1000,nofail,discard 0 0' | tee -a /etc/fstab
Tested through a full stop/start cycle: no errors in the cloud-init log, I see the mount in mount output, and files I save to my home directory now persist. Not sure why the mounts module wasn't working as expected; this is the second cloud-init module that doesn't seem to work as documented (I also had issues with the ansible module). Not leaving a favorable impression of cloud-init, but I'm happy the issue is resolved. Thanks again for all your help!
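For anyone following along, the "supporting changes in main.tf" for the runcmd approach might look roughly like the fragment below. This is only a sketch under assumptions: the template file name comes from the first post, the variable names home_volume_name and username come from the runcmd lines above, and the owner-as-username mapping is assumed from the do-linux example. Every other variable the template file references would still need to be passed as well.

```hcl
# Fragment only: the droplet's other arguments (region, image, size, count, ...)
# are assumed to stay as they are in the do-linux example template.
resource "digitalocean_droplet" "workspace" {
  # Attach the persistent home volume to the droplet.
  volume_ids = [digitalocean_volume.home_volume.id]

  user_data = templatefile("cloud-config.tftpl", {
    # Assumption: the workspace owner is used as the Linux username.
    username = data.coder_workspace.me.owner

    # The volume's name, used to build the /dev/disk/by-id/scsi-0DO_Volume_<name> path.
    home_volume_name = digitalocean_volume.home_volume.name

    # Any remaining variables used by the template (agent token, init script,
    # and so on) must also be supplied here; they are omitted in this sketch.
  })
}
```

Mounting by the /dev/disk/by-id path avoids relying on the filesystem label, which is what the earlier "can't find LABEL=coder-home" error was complaining about.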
kyle•3y ago
Absolutely wonderful!
Mr_Neezer•3y ago
@kyle Weeeellll, may have spoken a bit too soon. The above does work as I described, but I'm noticing now that when I stop my workspace, my old droplet is destroyed (good) but a new droplet is created (wha??). From the very end of the logs for a "stop":
Apply complete! Resources: 4 added, 1 changed, 4 destroyed.
Looks like it basically just cycled the workspace instead of actually stopping it. Ahh, ok. I had removed all the references to count... turns out those are needed, otherwise the droplet is still listed in the project resource list. With that added back in, the droplet doesn't get re-created when stopping. Still not 💯 sure I understand what count is though; could you explain?
kyle•3y ago
count determines whether the resource is alive when we perform a stop. It's kinda weird, because start/stop is a Coder-specific primitive, not something in Terraform. Essentially, we provide a helper of data.coder_workspace.<name>.start_count to apply on resources that will only be 1 when the transition is start. This is to allow some resources to destroy on stop for cost savings.
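A rough sketch of how that plays out in a template, assuming the data source is named me as in the snippets earlier in this thread. The variables droplet_image, droplet_size, and project_id are hypothetical placeholders, and the concat form for the project resources is just one way to write it, not necessarily what the example template does.

```hcl
data "coder_workspace" "me" {}

# Ephemeral: start_count is 1 on a start transition and 0 on a stop,
# so the droplet is destroyed when the workspace stops.
resource "digitalocean_droplet" "workspace" {
  count      = data.coder_workspace.me.start_count
  region     = var.region
  name       = "coder-${data.coder_workspace.me.owner}-${data.coder_workspace.me.name}"
  image      = var.droplet_image # hypothetical variable
  size       = var.droplet_size  # hypothetical variable
  volume_ids = [digitalocean_volume.home_volume.id]
}

# Persistent: no count here, so the home volume survives stop/start.
resource "digitalocean_volume" "home_volume" {
  region                   = var.region
  name                     = "coder-${data.coder_workspace.me.owner}-${data.coder_workspace.me.name}-home"
  size                     = var.home_volume_size
  initial_filesystem_type  = "ext4"
  initial_filesystem_label = "coder-home"
}

# The splat expression yields an empty list when count is 0, so the droplet
# is only listed in the project while it actually exists.
resource "digitalocean_project_resources" "project" {
  project = var.project_id # hypothetical variable
  resources = concat(
    [digitalocean_volume.home_volume.urn],
    digitalocean_droplet.workspace[*].urn
  )
}
```

Anything that refers to the droplet has to tolerate the count = 0 case, which is why an unconditional digitalocean_droplet.workspace[0] reference breaks on stop.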
Mr_Neezer•3y ago
Thanks for the clarification, @kyle . I've a related question: I'm trying to configure a db firewall resource (https://registry.terraform.io/providers/digitalocean/digitalocean/latest/docs/resources/database_firewall) that uses the ID of the droplet created:
rule {
  type  = "droplet"
  value = digitalocean_droplet.workspace[0].id
}
This consistently gives me this error on template push:
Error: Invalid index The given key does not identify an element in this collection value: the collection has no elements.
I've tried adding depends_on = [digitalocean_droplet.workspace] but that doesn't seem to make a difference. This feels related to the count logic from above... is it?
kyle•3y ago
It is! You'll need to add the count property to the database_firewall resource as well.
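Something along these lines, as a minimal sketch: the resource name workspace_db and the variable database_cluster_id are made up for illustration, and the rule block is the one from the snippet above.

```hcl
resource "digitalocean_database_firewall" "workspace_db" {
  # Same count as the droplet, so this resource only exists while
  # digitalocean_droplet.workspace[0] does.
  count      = data.coder_workspace.me.start_count
  cluster_id = var.database_cluster_id # hypothetical variable

  rule {
    type  = "droplet"
    value = digitalocean_droplet.workspace[0].id
  }
}
```

With count at 0, Terraform creates no instance of the firewall resource, so the [0] index on the droplet is never evaluated against an empty collection.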
Mr_Neezer•3y ago
Do I need to change the value in my snippet above, or is just adding the count property sufficient? Nm, I don't. Things look like they're working as expected now. Thanks @kyle !
