Immich•4mo ago

Storage Template Migration gets stuck occasionally

I've found that my storage template migration will occasionally get stuck and I have to restart the container to get it to resume.

96 Replies

Immich•4mo ago

:wave: Hey @Pixil,
Thanks for reaching out to us. Please carefully read this message and follow the recommended actions. This will help us be more effective in our support effort and leave more time for building Immich :immich:.
References
- Container Logs: docker compose logs docs
- Container Status: docker ps -a docs
- Reverse Proxy: https://immich.app/docs/administration/reverse-proxy
- Code Formatting: https://support.discord.com/hc/en-us/articles/210298617-Markdown-Text-101-Chat-Formatting-Bold-Italic-Underline#h_01GY0DAKGXDEHE263BCAYEGFJA

Immich•4mo ago

Checklist
I have...
1. :ballot_box_with_check: verified I'm on the latest release (note that mobile app releases may take some time).
2. :ballot_box_with_check: read applicable release notes.
3. :ballot_box_with_check: reviewed the FAQs for known issues.
4. :ballot_box_with_check: reviewed Github for known issues.
5. :ballot_box_with_check: tried accessing Immich via local ip (without a custom reverse proxy).
6. :ballot_box_with_check: uploaded the relevant information (see below).
7. :ballot_box_with_check: tried an incognito window, disabled extensions, cleared mobile app cache, logged out and back in, different browsers, etc. as applicable
(an item can be marked as "complete" by reacting with the appropriate number)

Information
In order to be able to effectively help you, we need you to provide clear information to show what the problem is. The exact details needed vary per case, but here is a list of things to consider:
- Your docker-compose.yml and .env files.
- Logs from all the containers and their status (see above).
- All the troubleshooting steps you've tried so far.
- Any recent changes you've made to Immich or your system.
- Details about your system (both software/OS and hardware).
- Details about your storage (filesystems, type of disks, output of commands like fdisk -l and df -h).
- The version of the Immich server, mobile app, and other relevant pieces.
- Any other information that you think might be relevant.
Please paste files and logs with proper code formatting, and especially avoid blurry screenshots. Without the right information we can't work out what the problem is. Help us help you ;)

If this ticket can be closed you can use the /close command, and re-open it later if needed.

PixilOP•4mo ago

deployment yaml

message.txt

PixilOP•4mo ago

message.txt

PixilOP•4mo ago

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: immich-postgresql
  namespace: photos
spec:
  instances: 1
  # https://github.com/tensorchord/cloudnative-pgvecto.rs
  imageName: registry.lab.reisman.org/proxy.ghcr.io/tensorchord/cloudnative-pgvecto.rs:16-v0.3.0
  postgresql:
    shared_preload_libraries:
      - "vectors.so"
  bootstrap:
    initdb:
      database: immich
      owner: immich
      secret:
        name: immich-postgresql-user
      dataChecksums: true
      postInitApplicationSQL:
        - ALTER SYSTEM SET search_path TO "$user", public, vectors;
        - CREATE EXTENSION IF NOT EXISTS cube
        - CREATE EXTENSION IF NOT EXISTS earthdistance
        - CREATE EXTENSION IF NOT EXISTS vectors
        - GRANT USAGE ON SCHEMA vectors TO immich
        - GRANT SELECT ON ALL TABLES IN SCHEMA vectors TO immich
  storage:
    storageClass: nfs-client-nobackup
    size: 1Gi
---
apiVersion: v1
stringData:
  password: ${secrets_apps_photos_immich_postgresqlUserPassword}
  username: immich
kind: Secret
metadata:
  name: immich-postgresql-user
  namespace: photos
type: kubernetes.io/basic-auth

PixilOP•4mo ago

Here is a section of the logs where it stopped moving files

message.txt

PixilOP•4mo ago

and here is what the GUI looks like, I currently have the number of jobs set to 1

PixilOP•4mo ago

I'm also storing my images on an nfs mount, which could be a cause of intermittent problems. Otherwise this is a brand new deployment as of last night, because I wanted to try switching over to cnpg

Immich•4mo ago

Successfully submitted, a tag has been added to inform contributors. :white_check_mark:

PixilOP•2mo ago

@Daniel I ran into the problem of my storage template getting stuck again. Figured I'd revive this old post I made to keep it more organized. This time it was uploading a single photo, and it's been stuck for about 2 hours now

Daniel•2mo ago

Not exactly sure why you're pinging me tbh 😅 Are there any logs?

PixilOP•2mo ago

sorry, you had answered me a couple days ago, and I have logs this time. I'm getting them now, but apparently my echoing of them to a file broke all the escape codes

Daniel•2mo ago

Everyone is all over the place replying to people, then someone else takes over because you're off, ...

PixilOP•2mo ago

Here are a bunch of logs for the last two hours. If you have time to look at it with me that would be great, otherwise I can wait for someone else

message.txt

PixilOP•2mo ago

I'm assuming the two POSTs towards the top of the logs are the image being uploaded. Then at 2:00:58 there is this part:

[Nest] 7  - 03/17/2025, 2:00:58 PM   DEBUG [Microservices:APIKeyService] Attempting to rename file: upload/upload/df1dc8c3-0aa5-433b-86ca-c7dd959226c0/34/f2/34f292fa-6b1f-4dd1-8073-275e63949569.MOV => upload/library/aaron@reismanorg/2025/2025-03-16/IMG_4321.MOV
[Nest] 7  - 03/17/2025, 2:00:58 PM   DEBUG [Microservices:APIKeyService] Unable to rename file. Falling back to copy, verify and delete

which looks pretty standard to me

Daniel•2mo ago

You being on the jobs page the whole time is kind of annoying lmao So many requests

PixilOP•2mo ago

yeah, I realized that after the fact

Daniel•2mo ago

This is a good shout. That's your issue. Why it's DEBUG only, idk lol. IMO that should be a warning

PixilOP•2mo ago

I get that no matter what, even when it works

Daniel•2mo ago

Which file system are you on?

PixilOP•2mo ago

It's using NFS backed by a truenas zfs pool

Daniel•2mo ago

Hm ok, since it's between mount points it probably cannot make a move but needs to copy instead, so this does make sense. The copy should work though. You said it does work for some files though?
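
To illustrate the fallback being discussed: on Linux a rename only works within a single filesystem, and across mount points the kernel rejects it with EXDEV, so the mover has to copy, verify and delete instead. The following is a minimal Python sketch of that pattern, not Immich's implementation. The name `move_file` and the SHA-256 comparison used for the verify step are assumptions made for illustration; the thread does not say how Immich verifies the copy.

```python
import errno
import hashlib
import os
import shutil


def _sha256(path):
    """Hash a file in chunks so large videos do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def move_file(src, dst):
    """Try a cheap rename first; fall back to copy, verify and delete."""
    os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
    try:
        os.rename(src, dst)
        return "renamed"
    except OSError as exc:
        # EXDEV means source and destination are on different filesystems.
        if exc.errno != errno.EXDEV:
            raise
    shutil.copy2(src, dst)
    if _sha256(src) != _sha256(dst):
        os.remove(dst)  # never keep a corrupt copy
        raise OSError(f"copy of {src} failed verification")
    os.remove(src)  # only delete the original once the copy is verified
    return "copied"
```

Each step in the fallback path is a separate blocking filesystem call, so on a flaky mount any one of them could be the call that never returns.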

PixilOP•2mo ago

yeah, it usually works and then sometimes it hangs until I restart the pod. Here's a snippet from the 14th

PixilOP•2mo ago

message.txt

PixilOP•2mo ago

It seems like something in that move from 2pm is hanging indefinitely and probably just needs to timeout and try again
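
One way to express the "time out and try again" idea is to run the blocking file operation in a worker thread and stop waiting after a deadline. The sketch below is Python and purely illustrative; `run_with_timeout` is a hypothetical helper, not something Immich provides. It also has an important limit: a thread blocked inside a filesystem call is not cancelled by this, the caller merely stops waiting for it.

```python
import concurrent.futures


def run_with_timeout(func, timeout_s, *args):
    """Run func(*args) in a worker thread and give up waiting after timeout_s.

    Raises concurrent.futures.TimeoutError if the call does not finish in time.
    The worker thread itself keeps running (or stays blocked) in the background.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args)
    try:
        return future.result(timeout=timeout_s)
    finally:
        # Do not wait for the worker here: if it is stuck in a blocking
        # filesystem call, waiting would hang the caller as well.
        executor.shutdown(wait=False)


# Hypothetical usage, e.g. with shutil.copy2 as the blocking operation:
#   run_with_timeout(shutil.copy2, 300, src_path, dst_path)
```

With a wrapper like this, the job could at least log which file timed out and mark it for retry, instead of sitting silently until the pod is restarted.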

Daniel•2mo ago

Is it possible that NFS mount is a bit flaky?

PixilOP•2mo ago

I mean, it's nfs so of course

Daniel•2mo ago

Well yeah, but beyond the usual 😅

PixilOP•2mo ago

but it's been pretty rock solid for everything else

Daniel•2mo ago

Hmk

PixilOP•2mo ago

it is using a hard mount, so it will block until the request returns, if I understand nfs correctly

Daniel•2mo ago

So Immich getting stuck on that would make a lot of sense. And restarting the pod kills the fs call

PixilOP•2mo ago

If immich is assuming that it will always return

Daniel•2mo ago

We're awaiting the move, yeah. We have a bunch of logic around file moves (including an extra table) to make sure moves actually execute

PixilOP•2mo ago

I've gotten somewhat familiar with that table having dealt with this for awhile

Daniel•2mo ago

(since otherwise we'd just lose them which would be bad) That makes sense

PixilOP•2mo ago

hard or soft — Specifies whether the program using a file via an NFS connection should stop and wait (hard) for the server to come back online, if the host serving the exported file system is unavailable, or if it should report an error (soft).

Daniel•2mo ago

Yeah. Honestly this is beyond my expertise, but my best hunch is that the move is getting stuck and Immich is waiting on it forever. Why does that get stuck? ¯\_(ツ)_/¯

PixilOP•2mo ago

I don't know why it would just hang though, there wasn't an outage, but I assume it's just general flakiness with nfs

Daniel•2mo ago

On the other hand I know many people who run their library on NFS just fine, idk. Let me ask around if people are more experienced with NFS

PixilOP•2mo ago

I don't know what the default mount type is for nfs in linux. I'm using the nfs-subdir-external-provisioner in kubernetes, which defaults to a hard mount, and in particular I need the hard mount for some other services that are using it

Daniel•2mo ago

I think in general it makes a lot of sense to have that hard mount. The issue isn't that you have configured that; the issue is that the move gets stuck 😅 (presumably)

PixilOP•2mo ago

yeah 🙂 if it's any incentive, this is the last thing holding me back from moving fully to Immich and buying a server license

Daniel•2mo ago

It most definitely is not ;) But I asked the team if there are any ideas :) Before that

PixilOP•2mo ago

awesome, thanks for the help with this

Nicholas Flamy•2mo ago

FYI, this has been shared with the support crew and hopefully someone who knows more about this will be able to help.

PixilOP•2mo ago

doing a bit more research, it looks like hanging indefinitely might be the right thing, (not necessarily for immich, but for a hard nfs mount). It appears there should be some kernel level errors being generated on every attempt that I assume are getting eaten somewhere

Daniel•2mo ago

Just writing the same thing again :peepoNotes: Well done @Nicholas Flamy :P

PixilOP•2mo ago

if we could at least surface those kernel errors it would help a lot in troubleshooting
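
As a rough idea of what surfacing those errors could look like, the sketch below scans log lines (for example, the output of `dmesg` on the NFS client) for the kernel's "nfs: server ... not responding" messages. It is an illustrative Python sketch only; `find_nfs_timeouts` is a hypothetical helper, and it assumes the client kernel logs lines containing that phrase.

```python
import re

# Matches kernel messages such as "nfs: server <name> not responding, ..."
NFS_NOT_RESPONDING = re.compile(r"nfs: server (\S+) not responding")


def find_nfs_timeouts(lines):
    """Return the server names from every 'not responding' kernel message."""
    servers = []
    for line in lines:
        match = NFS_NOT_RESPONDING.search(line)
        if match:
            servers.append(match.group(1))
    return servers
```

An empty result while a move is hung would suggest the client kernel does not consider the server unresponsive, which would point away from a plain network outage.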

Daniel•2mo ago

Maybe on the remote?

PixilOP•2mo ago

It should be on the client side. Presumably the request isn't making it through, so it would have an error like kernel: nfs: server nfsserver not responding, timed out. The only relevant thing I see in the nfsd logs on the server is:

Mar 12 22:25:33 truenas kernel: nfsd: too many open connections, consider increasing the number of threads
Mar 12 23:10:40 truenas kernel: nfsd: too many open connections, consider increasing the number of threads

but that hasn't happened since the 12th. Here is the exact mount that is being used in the pod

192.168.1.10:/mnt/main/kubernetes/production/photos-immich-library on /usr/src/app/upload/library type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.31,local_lock=none,addr=192.168.1.10)

Checking in on this, anyone on the support team have any thoughts? I kind of assume they'd have shared them already, but I'm being optimistic today.
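
For reference, the options in the mount line above can be pulled apart programmatically to confirm what the client is actually using. The Python sketch below is illustrative only; `parse_mount_options` and `first_retry_seconds` are hypothetical helpers. It relies on `timeo` being expressed in tenths of a second, as documented in nfs(5).

```python
# Option string copied from the mount output in this thread.
MOUNT_OPTIONS = (
    "rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,"
    "proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.31,"
    "local_lock=none,addr=192.168.1.10"
)


def parse_mount_options(option_string):
    """Turn 'a,b=1,c=2' into {'a': True, 'b': '1', 'c': '2'}."""
    options = {}
    for item in option_string.strip("()").split(","):
        key, sep, value = item.partition("=")
        options[key] = value if sep else True
    return options


def first_retry_seconds(options):
    """Seconds the client waits before its first retry (timeo is in deciseconds)."""
    return int(options["timeo"]) / 10
```

For this mount that means a hard mount with a 60 second wait before the first retry and `retrans=2`, so a stalled request can sit for minutes before the kernel even logs a complaint, and on a hard mount it keeps retrying after that.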

Alex Tran•2mo ago

We are not sure yet. Have you tried setting the concurrency level back to the default values?

Daniel•2mo ago

We've already established that this is just an NFS issue that may be fixed with some config options. But apparently we don't have NFS experts 😅

PixilOP•2mo ago

that's unfortunate, any chance we can add more logging around it?

IVPathfinder•2mo ago

I think this is something to do with file permissions on the nfs serverside? I've only just skimmed like 2% of your thread here. Here's my config I use on my nfs server for a random client

10.1.X.X(sec=sys,rw,secure,anonuid=99,anongid=100,no_root_squash,subtree_check)

I think make sure the file ownership and permissions on the server are correct as well: chown -R 99:100 yoloinsecurefolder. You really need a rock solid connection. nfs really does like to hang like a *&!@ if the connection drops off: a vlan change, a switch unplug, a new client causing the dhcp server to drop off and take down the whole network, etc. Be nice on my pr pls xD I'm trying to get a job in the future

Daniel•2mo ago

I'm not sure why that's in reply to this message :D

IVPathfinder•2mo ago

you must be overwhelmed at times with all the chatter. the bot on github marked you as the reviewer lol. I start to get ~~german~~ russian when busy.

Nicholas Flamy•2mo ago

You trilingual?

PixilOP•2mo ago

I don't think it's a permissions issue. My setup works for everything else in the k8s cluster and in fact works ~99% of the time for Immich. Occasionally, the storage template job tries to copy a file and seems to hang indefinitely waiting for the copy to return.

IVPathfinder•2mo ago

I wish 😂 Network? I'd love to know a surefire way you could confirm/deny this one. I had a similar nfs hang problem in the past, so I decided to eat the performance loss and use cifs smb instead. Immich does have a plethora of files? Is this mount different from any of your other rock solid nfs connections?

PixilOP•2mo ago

shouldn't be, they're all managed by nfs-subdir in the same kubernetes cluster so they'd get mounted the same. I certainly can't guarantee there isn't a network issue, though even with an intermittent issue I'd expect it to eventually work; my understanding of a hard nfs mount is that it keeps retrying until it works. The problem isn't currently happening, but I found some info on increasing nfs logging, so next time it happens I'll see if I can do that

IVPathfinder•2mo ago

Sweet 🙂 Or we can make a fork with minio support and help them merge it if it won't disrupt anything

PixilOP•2mo ago

fwiw, the full setup is two proxmox hosts, one has truenas scale as a vm passing through a pcie sas card

IVPathfinder•2mo ago

woo a rabbit hole setup xD

PixilOP•2mo ago

I guess I could give truenas more cpu to rule that out

IVPathfinder•2mo ago

what's the network card? consumer or enterprise?

PixilOP•2mo ago

it's all enterprise, dell r730s. don't remember the exact card but it's their 2x 1GB and 2x 10GB SFP+ cards

IVPathfinder•2mo ago

ahh then networking should be less likely a problem

PixilOP•2mo ago

using the 1GB links they plug directly into a ubiquiti switch

IVPathfinder•2mo ago

ahh shit ubiquiti is a big issue, frequent dropouts for me.. first hand experience their non full size switches are really bad with trunking and vlans like to rstp loop themselves to hell

PixilOP•2mo ago

if I have some time this weekend, I can try swapping it out with an older netgear I have

IVPathfinder•2mo ago

For the ubiquiti fault the fingerprints are a period of super stable connection ~1-2 months. Then you can't ping anything on one-more clients on one side of the network, unifi console seems fine, but you trigger resets of the switches, then it all turns to shit and you have to factory reset the switches, and the logs don't help much, then the console says loop protection stuff.

PixilOP•2mo ago

I haven't seen anything like that in the last couple years I've been running it

IVPathfinder•2mo ago

yeah i think this is isolated to their smaller size switches since linus tech tips would have said something

PixilOP•2mo ago

this is a 24 port poe switch, don't remember exactly which one, the older one that still had fans. I'm really wondering if it's cpu contention though. I know nfs wants 1 thread per cpu you have, and I only have 4 cores on the vm; I thought I had 12

IVPathfinder•2mo ago

I can't find anything about nfs system requirements on reputable websites but this might be useful in the future when it locks up

IVPathfinder•2mo ago

https://documentation.ubuntu.com/server/how-to/networking/install-nfs/index.html

PixilOP•2mo ago

truenas defaults to 1 per core, I've never tried going over that but I have ~40 unused cores left on that host so I'm happy to up it just to be safe

IVPathfinder•2mo ago

https://tenor.com/view/chad-flex-chadflex-gif-502800778430593705

IVPathfinder•2mo ago

i recall some disk io bottlenecks related to backup processes running at the same time as immich, the storage template migration slows down considerably but gets there eventually. borgmatic was going ham xD

PixilOP•2mo ago

more info: it hung again on the first upload after increasing the cpu cores for truenas, so I guess that wasn't helpful. Checking nfsstat -c on the vm running immich, I see rename operations incrementing pretty fast. To try eliminating everything else, I tainted the node and evicted everything else; sadly immich restarted in that process and of course got unstuck. If it happens again soon, hopefully I can get some better data.

That was fast, it's stuck again, and it is the only thing of note running on the node. Renames are incrementing relatively fast again. I have a feeling it's stuck trying to rename an image when it can't, because it wants to rename across mount points. umm, I guess all the renames are coming from this:

PixilOP•2mo ago

message.txt

PixilOP•2mo ago

it's renaming all my files with a duplicate extension which feels wrong but maybe unrelated(?)

Zeus•2mo ago

Those seem like a bug but not the cause of the slowness because they would likely be new in 1.130

PixilOP•2mo ago

should I write up an issue for it?

Zeus•2mo ago

Are the files actually being renamed? Can you show the database entry of one of them?

PixilOP•2mo ago

Here's the folder

message.txt

PixilOP•2mo ago

some definitely are, others not

Zeus•2mo ago

can you get the asset id for one of them and run in the database select * from assets where id = 'idhere'; https://immich.app/docs/guides/database-queries/

PixilOP•2mo ago

here's the query

message.txt

Zeus•2mo ago

can you open an issue with these details please

PixilOP•2mo ago

sure thing

Zeus•2mo ago

It looks like there is an issue with all the live photos that have both a video and photo component

PixilOP•2mo ago

https://github.com/immich-app/immich/issues/17176

On a positive note, if I have to reupload all of my images to fix this, it will mean I likely trigger the nfs hang again, so I can troubleshoot that more easily.

Looks like my issue got closed before I could provide evidence that it isn't limited to just immich-go uploads. I'm not sure if closed issues get monitored, but I provided a screenshot of a more recent image uploaded via iOS with the same problem. Can someone with permission please reopen the issue?
