Immich•4mo ago
Pixil

Storage Template Migration gets stuck occasionally

I've found that my storage template migration will occasionally get stuck, and I have to restart the container to get it to resume.
96 Replies
Immich
Immich•4mo ago
:wave: Hey @Pixil,
Thanks for reaching out to us. Please carefully read this message and follow the recommended actions. This will help us be more effective in our support effort and leave more time for building Immich :immich:.
References
- Container Logs: docker compose logs (docs)
- Container Status: docker ps -a (docs)
- Reverse Proxy: https://immich.app/docs/administration/reverse-proxy
- Code Formatting: https://support.discord.com/hc/en-us/articles/210298617-Markdown-Text-101-Chat-Formatting-Bold-Italic-Underline#h_01GY0DAKGXDEHE263BCAYEGFJA
Immich
Immich•4mo ago
Checklist
I have...
1. :ballot_box_with_check: verified I'm on the latest release (note that mobile app releases may take some time).
2. :ballot_box_with_check: read applicable release notes.
3. :ballot_box_with_check: reviewed the FAQs for known issues.
4. :ballot_box_with_check: reviewed Github for known issues.
5. :ballot_box_with_check: tried accessing Immich via local ip (without a custom reverse proxy).
6. :ballot_box_with_check: uploaded the relevant information (see below).
7. :ballot_box_with_check: tried an incognito window, disabled extensions, cleared mobile app cache, logged out and back in, different browsers, etc. as applicable
(an item can be marked as "complete" by reacting with the appropriate number)
Information
In order to be able to effectively help you, we need you to provide clear information to show what the problem is. The exact details needed vary per case, but here is a list of things to consider:
- Your docker-compose.yml and .env files.
- Logs from all the containers and their status (see above).
- All the troubleshooting steps you've tried so far.
- Any recent changes you've made to Immich or your system.
- Details about your system (both software/OS and hardware).
- Details about your storage (filesystems, type of disks, output of commands like fdisk -l and df -h).
- The version of the Immich server, mobile app, and other relevant pieces.
- Any other information that you think might be relevant.
Please paste files and logs with proper code formatting, and especially avoid blurry screenshots. Without the right information we can't work out what the problem is. Help us help you ;)
If this ticket can be closed you can use the /close command, and re-open it later if needed.
GitHub
immich-app immich · Discussions
Explore the GitHub Discussions forum for immich-app immich. Discuss code, ask questions & collaborate with the developer community.
GitHub
Issues · immich-app/immich
High performance self-hosted photo and video management solution. - Issues · immich-app/immich
Pixil
PixilOP•4mo ago
deployment yaml
Pixil
PixilOP•4mo ago
Pixil
PixilOP•4mo ago
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: immich-postgresql
  namespace: photos
spec:
  instances: 1
  # https://github.com/tensorchord/cloudnative-pgvecto.rs
  imageName: registry.lab.reisman.org/proxy.ghcr.io/tensorchord/cloudnative-pgvecto.rs:16-v0.3.0
  postgresql:
    shared_preload_libraries:
      - "vectors.so"
  bootstrap:
    initdb:
      database: immich
      owner: immich
      secret:
        name: immich-postgresql-user
      dataChecksums: true
      postInitApplicationSQL:
        - ALTER SYSTEM SET search_path TO "$user", public, vectors;
        - CREATE EXTENSION IF NOT EXISTS cube
        - CREATE EXTENSION IF NOT EXISTS earthdistance
        - CREATE EXTENSION IF NOT EXISTS vectors
        - GRANT USAGE ON SCHEMA vectors TO immich
        - GRANT SELECT ON ALL TABLES IN SCHEMA vectors TO immich
  storage:
    storageClass: nfs-client-nobackup
    size: 1Gi
---
apiVersion: v1
stringData:
  password: ${secrets_apps_photos_immich_postgresqlUserPassword}
  username: immich
kind: Secret
metadata:
  name: immich-postgresql-user
  namespace: photos
type: kubernetes.io/basic-auth
Pixil
PixilOP•4mo ago
Here is a section of the logs where it stopped moving files
Pixil
PixilOP•4mo ago
and here is what the GUI looks like, I currently have the number of jobs set to 1
Pixil
PixilOP•4mo ago
I'm also storing my images on an nfs mount, which could be a cause of intermittent problems. Otherwise this is a brand new deployment as of last night, because I wanted to try switching over to cnpg.
Immich
Immich•4mo ago
Successfully submitted, a tag has been added to inform contributors. :white_check_mark:
Pixil
PixilOP•2mo ago
@Daniel I ran into the problem of my storage template getting stuck again. Figured I'd revive this old post I made to keep it more organized. This time it was uploading a single photo, and it's been stuck for about 2 hours now.
Daniel
Daniel•2mo ago
Not exactly sure why you're pinging me tbh 😅 Are there any logs?
Pixil
PixilOP•2mo ago
sorry, you had answered me a couple of days ago and I have logs this time. I'm getting them now, but apparently my echoing of them to a file broke all the escape codes
Daniel
Daniel•2mo ago
Everyone is all over the place replying to people, then someone else takes over because you're off, ...
Pixil
PixilOP•2mo ago
Here are a bunch of logs for the last two hours. If you have time to look at it with me that would be great, otherwise I can wait for someone else
Pixil
PixilOP•2mo ago
I'm assuming the two POSTs towards the top of the logs are the image being uploaded. Then at 2:00:58 there is this part:
[Nest] 7 - 03/17/2025, 2:00:58 PM DEBUG [Microservices:APIKeyService] Attempting to rename file: upload/upload/df1dc8c3-0aa5-433b-86ca-c7dd959226c0/34/f2/34f292fa-6b1f-4dd1-8073-275e63949569.MOV => upload/library/aaron@reismanorg/2025/2025-03-16/IMG_4321.MOV
[Nest] 7 - 03/17/2025, 2:00:58 PM DEBUG [Microservices:APIKeyService] Unable to rename file. Falling back to copy, verify and delete
which looks pretty standard to me
Daniel
Daniel•2mo ago
You being on the jobs page the whole time is kind of annoying lmao. So many requests
Pixil
PixilOP•2mo ago
yeah, I realized that after the fact
Daniel
Daniel•2mo ago
This is a good shout. That's your issue. Why it's DEBUG only, idk lol. IMO that should be a warning
Pixil
PixilOP•2mo ago
I get that no matter what, even when it works
Daniel
Daniel•2mo ago
Which file system are you on?
Pixil
PixilOP•2mo ago
It's using NFS backed by a truenas zfs pool
Daniel
Daniel•2mo ago
Hm ok, since it's between mount points it probably cannot do a move but needs to copy instead, so this does make sense. The copy should work though. You said it does work for some files though?
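For readers following along, the "rename, then fall back to copy, verify and delete" behaviour seen in the log lines can be sketched roughly as below. This is a minimal Python sketch and not Immich's actual code: move_file, its injectable rename parameter, and the SHA-256 comparison are assumptions made for illustration. The relevant detail is that rename() fails with EXDEV when source and destination sit on different mount points.

```python
import errno
import hashlib
import os
import shutil


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def move_file(src, dst, rename=os.rename):
    """Move src to dst (hypothetical helper). Returns "rename" or "copy"."""
    os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
    try:
        # Fast path: works only when both paths are on the same filesystem.
        rename(src, dst)
        return "rename"
    except OSError as exc:
        # EXDEV means "cross-device link"; anything else is a real error.
        if exc.errno != errno.EXDEV:
            raise
    # Slow path: copy, verify the copy, and only then delete the source.
    shutil.copy2(src, dst)
    if sha256_of(src) != sha256_of(dst):
        os.remove(dst)
        raise OSError(errno.EIO, "copy verification failed", dst)
    os.remove(src)
    return "copy"
```

In a layout like the one in this thread, every move would take the slow path, so each file is read and written over the network rather than renamed in place.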
Pixil
PixilOP•2mo ago
yeah, it usually works and then sometimes it hangs until I restart the pod. Here's a snippet from the 14th
Pixil
PixilOP•2mo ago
Pixil
PixilOP•2mo ago
It seems like something in that move from 2pm is hanging indefinitely and probably just needs to time out and try again
Daniel
Daniel•2mo ago
Is it possible that NFS mount is a bit flaky?
Pixil
PixilOP•2mo ago
I mean, it's nfs so of course
Daniel
Daniel•2mo ago
Well yeah, but beyond the usual 😅
Pixil
PixilOP•2mo ago
but it's been pretty rock solid for everything else
Daniel
Daniel•2mo ago
Hmk
Pixil
PixilOP•2mo ago
it is using a hard mount, so it will block until the request returns, if I understand my nfs correctly
Daniel
Daniel•2mo ago
So Immich getting stuck on that would make a lot of sense. And restarting the pod kills the fs call
Pixil
PixilOP•2mo ago
If immich is assuming that it will always return
Daniel
Daniel•2mo ago
We're awaiting the move, yeah. We have a bunch of logic around file moves (including an extra table) to make sure moves actually execute
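The failure mode being discussed (an awaited filesystem call that never returns on a hard NFS mount) can be illustrated with a small Python sketch. It is only an assumption about how a caller could stop waiting; call_with_timeout and MoveTimeout are hypothetical names and this is not how Immich is implemented. The timeout does not cancel the underlying call: a thread blocked on a hard mount stays blocked until the server answers, and the caller merely regains control so it can log the problem and retry later.

```python
import threading


class MoveTimeout(Exception):
    """Raised when the wrapped call has not finished within the timeout."""


def call_with_timeout(fn, timeout_s, *args):
    """Run fn(*args) in a daemon thread and wait at most timeout_s seconds."""
    outcome = {}

    def worker():
        try:
            outcome["value"] = fn(*args)
        except BaseException as exc:  # hand any failure back to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        # The worker is still blocked; we stop waiting but cannot kill it.
        raise MoveTimeout(f"call did not return within {timeout_s} seconds")
    if "error" in outcome:
        raise outcome["error"]
    return outcome.get("value")
```

A caller could wrap a copy or rename in call_with_timeout and, on MoveTimeout, record the file as still pending instead of waiting forever.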
Pixil
PixilOP•2mo ago
I've gotten somewhat familiar with that table, having dealt with this for a while
Daniel
Daniel•2mo ago
(since otherwise we'd just lose them, which would be bad) That makes sense
Pixil
PixilOP•2mo ago
hard or soft - Specifies whether the program using a file via an NFS connection should stop and wait (hard) for the server to come back online, if the host serving the exported file system is unavailable, or if it should report an error (soft).
Daniel
Daniel•2mo ago
Yeah. Honestly this is beyond my expertise, but my best hunch is that the move is getting stuck and Immich is waiting on it forever. Why that gets stuck? ¯\_(ツ)_/¯
Pixil
PixilOP•2mo ago
I don't know why it would just hang though, there wasn't an outage, but I assume it's just general flakiness with nfs
Daniel
Daniel•2mo ago
On the other hand I know many people who run their library on NFS just fine, idk. Let me ask around if people are more experienced with NFS
Pixil
PixilOP•2mo ago
I don't know what the default mount type is for nfs in linux. I'm using the nfs-subdir-external-provisioner in kubernetes, which defaults to a hard mount, and in particular I need the hard mount for some other services that are using it
Daniel
Daniel•2mo ago
I think in general it makes a lot of sense to have that hard mount. The issue isn't that you have configured that; the issue is that the move gets stuck 😅 (presumably)
Pixil
PixilOP•2mo ago
yeah 🙂 if it's any incentive, this is the last thing holding me back from moving fully to Immich and buying a server license
Daniel
Daniel•2mo ago
It most definitely is not ;) But I asked the team if there are any ideas :) Before that
Pixil
PixilOP•2mo ago
awesome, thanks for the help with this
Nicholas Flamy
Nicholas Flamy•2mo ago
FYI, this has been shared with the support crew and hopefully someone who knows more about this will be able to help.
Pixil
PixilOP•2mo ago
doing a bit more research, it looks like hanging indefinitely might be the right thing (not necessarily for immich, but for a hard nfs mount). It appears there should be some kernel level errors being generated on every attempt that I assume are getting eaten somewhere
Daniel
Daniel•2mo ago
Just writing the same thing again :peepoNotes: Well done @Nicholas Flamy :P
Pixil
PixilOP•2mo ago
if we could at least surface those kernel errors it would help a lot in troubleshooting
Daniel
Daniel•2mo ago
Maybe on the remote?
Pixil
PixilOP•2mo ago
It should be on the client side. Presumably the request isn't making it through, so it would have an error like kernel: nfs: server nfsserver not responding, timed out. The only relevant thing I see in the nfsd logs on the server is:
Mar 12 22:25:33 truenas kernel: nfsd: too many open connections, consider increasing the number of threads
Mar 12 23:10:40 truenas kernel: nfsd: too many open connections, consider increasing the number of threads
but that hasn't happened since the 12th. Here is the exact mount that is being used in the pod:
192.168.1.10:/mnt/main/kubernetes/production/photos-immich-library on /usr/src/app/upload/library type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.31,local_lock=none,addr=192.168.1.10)
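To make the mount line easier to read, here is a small Python sketch that pulls out the options relevant to the hang. The helper name summarize_nfs_options is hypothetical. Per nfs(5), timeo is expressed in tenths of a second, so timeo=600 is a 60 second wait before a request is retried, and with hard the client keeps retrying indefinitely after retrans attempts instead of returning an error to the application.

```python
# Option string copied from the mount output above.
MOUNT_OPTIONS = (
    "rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,"
    "proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.31,"
    "local_lock=none,addr=192.168.1.10"
)


def summarize_nfs_options(option_string):
    """Parse a comma-separated mount option string into a small summary."""
    options = {}
    for item in option_string.strip("()").split(","):
        key, _, value = item.partition("=")
        # Flags such as "hard" or "rw" carry no value; store them as True.
        options[key] = value if value else True
    timeo = options.get("timeo")
    retrans = options.get("retrans")
    return {
        "hard": "hard" in options,
        "version": options.get("vers"),
        "proto": options.get("proto"),
        # timeo is in deciseconds, so divide by 10 to get seconds.
        "timeout_seconds": int(timeo) / 10 if timeo else None,
        "retrans": int(retrans) if retrans else None,
    }


summary = summarize_nfs_options(MOUNT_OPTIONS)
print(summary)
```

Read this way, the mount in the pod is a hard NFS 4.2 mount over TCP with a 60 second request timeout, which is consistent with a call that blocks rather than failing.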
Checking in on this, anyone on the support team have any thoughts? I kind of assume they'd have shared them already, but I'm being optimistic today.
Alex Tran
Alex Tran•2mo ago
We are not sure yet. Have you tried setting the concurrency level back to the default values?
Daniel
Daniel•2mo ago
We've already established that this is just an NFS issue that may be fixed with some config options. But apparently we don't have NFS experts 😅
Pixil
PixilOP•2mo ago
that's unfortunate, any chance we can add more logging around it?
IVPathfinder
IVPathfinder•2mo ago
I think this is something to do with file permissions on the nfs server side? I've only just skimmed like 2% of your thread here. Here's my config I use on my nfs server for a random client:
10.1.X.X(sec=sys,rw,secure,anonuid=99,anongid=100,no_root_squash,subtree_check)
I think make sure the file ownership and permissions on the server are correct as well: chown -R 99:100 yoloinsecurefolder. You really need a rock solid connection. nfs really does like to hang like a *&!@ if the connection drops off: a vlan change, a switch unplug, a new client causing the dhcp server to drop off and take down the whole network, etc. Be nice on my pr pls xD I'm trying to get a job in the future
Daniel
Daniel•2mo ago
I'm not sure why that's in reply to this message :D
IVPathfinder
IVPathfinder•2mo ago
you must be overwhelmed at times with all the chatter. the bot on github marked you as the reviewer lol. I start to get german russian when busy.
Nicholas Flamy
Nicholas Flamy•2mo ago
You trilingual?
Pixil
PixilOP•2mo ago
I don't think it's a permissions issue. My setup works for everything else in the k8s cluster and in fact works ~99% of the time for Immich. Occasionally, the storage template job tries to copy a file and seems to hang indefinitely waiting for the copy to return.
IVPathfinder
IVPathfinder•2mo ago
I wish 😂 Network? I'd love to know a surefire way you could confirm/deny this one. I had a similar nfs hang problem in the past, so I decided to eat the performance loss and use cifs smb instead. Immich does have a plethora of files? Is this mount different from any of your other rock solid nfs connections?
Pixil
PixilOP•2mo ago
shouldn't be, they're all managed by nfs-subdir in the same kubernetes cluster so they'd get mounted the same. I certainly can't guarantee there isn't a network issue, though even with an intermittent issue I'd expect it to eventually work; my understanding of a hard nfs mount is that it keeps retrying until it works. The problem isn't currently happening, but I found some info on increasing nfs logging, so next time it happens I'll see if I can do that
IVPathfinder
IVPathfinder•2mo ago
Sweet 🙂 Or we can make a fork with minio support and help them merge it if it won't disrupt anything
Pixil
PixilOP•2mo ago
fwiw, the full setup is two proxmox hosts, one has truenas scale as a vm passing through a pcie sas card
IVPathfinder
IVPathfinder•2mo ago
woo a rabbit hole setup xD
Pixil
PixilOP•2mo ago
I guess I could give truenas more cpu to rule that out
IVPathfinder
IVPathfinder•2mo ago
what's the network card? consumer or enterprise?
Pixil
PixilOP•2mo ago
it's all enterprise, dell r730s. don't remember the exact card but it's their 2x 1Gb and 2x 10Gb SFP+ cards
IVPathfinder
IVPathfinder•2mo ago
ahh then networking should be less likely a problem
Pixil
PixilOP•2mo ago
using the 1Gb links, they plug directly into a ubiquiti switch
IVPathfinder
IVPathfinder•2mo ago
ahh shit, ubiquiti is a big issue, frequent dropouts for me. first hand experience: their non full size switches are really bad with trunking and vlans, and like to rstp loop themselves to hell
Pixil
PixilOP•2mo ago
if I have some time this weekend, I can try swapping it out with an older netgear I have
IVPathfinder
IVPathfinder•2mo ago
For the ubiquiti fault, the fingerprints are a period of super stable connection, ~1-2 months. Then you can't ping anything on one or more clients on one side of the network. The unifi console seems fine, but you trigger resets of the switches, then it all turns to shit and you have to factory reset the switches. The logs don't help much, and then the console says loop protection stuff.
Pixil
PixilOP•2mo ago
I haven't seen anything like that in the last couple years I've been running it
IVPathfinder
IVPathfinder•2mo ago
yeah i think this is isolated to their smaller size switches since linus tech tips would have said something
Pixil
PixilOP•2mo ago
this is a 24 port poe switch, don't remember exactly which one, the older one that still had fans. I'm really wondering if it's cpu contention though. I know nfs wants 1 thread per cpu you have, and I only have 4 cores on the vm. I thought I had 12
Pixil
PixilOP•2mo ago
IVPathfinder
IVPathfinder•2mo ago
I can't find anything about nfs system requirements on reputable websites, but this might be useful in the future when it locks up
IVPathfinder
IVPathfinder•2mo ago
IVPathfinder
IVPathfinder•2mo ago
Ubuntu Server
Network File System (NFS)
NFS allows a system to share directories and files with others over a network. By using NFS, users and programs can access files on remote systems almost as if they were local files. Some of the mo...
Pixil
PixilOP•2mo ago
truenas defaults to 1 per core. I've never tried going over that, but I have ~40 unused cores left on that host so I'm happy to up it just to be safe
IVPathfinder
IVPathfinder•2mo ago
i recall some disk io bottlenecks related to backup processes running at the same time as immich. the storage template migration slows down considerably but gets there eventually. borgmatic was going ham xD
Pixil
PixilOP•2mo ago
more info: it hung again on the first upload after increasing the cpu cores for truenas, so I guess that wasn't helpful. Checking nfsstat -c on the vm running immich, I see rename operations incrementing pretty fast. To try eliminating everything else, I tainted the node and evicted everything else; sadly immich restarted in that process and of course got unstuck. If it happens again soon, hopefully I can get some better data. That was fast, it's stuck again and is the only thing of note running on the node. Renames are incrementing relatively fast again. I have a feeling it's stuck trying to rename an image when it can't, because it wants to rename across mount points. umm, I guess all the renames are coming from this:
Pixil
PixilOP•2mo ago
Pixil
PixilOP•2mo ago
it's renaming all my files with a duplicate extension, which feels wrong but maybe unrelated(?)
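One quick way to see how many files are affected is to scan the library for names whose extension is repeated. The Python sketch below uses hypothetical helper names (has_duplicate_extension, find_duplicate_extensions), and it assumes the duplication looks like a doubled final extension, since the actual filenames were in an attachment that is not part of this log.

```python
import os


def has_duplicate_extension(filename):
    """Return True when the last two extensions match, ignoring case."""
    stem, ext = os.path.splitext(filename)
    if not ext:
        return False
    _, previous_ext = os.path.splitext(stem)
    return previous_ext.lower() == ext.lower()


def find_duplicate_extensions(root):
    """Walk root and return sorted paths whose name repeats its extension."""
    matches = []
    for directory, _subdirs, files in os.walk(root):
        for name in files:
            if has_duplicate_extension(name):
                matches.append(os.path.join(directory, name))
    return sorted(matches)
```

Running find_duplicate_extensions against the library folder would give a list that could be attached to a bug report.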
Zeus
Zeus•2mo ago
Those seem like a bug but not the cause of the slowness, because they would likely be new in 1.130
Pixil
PixilOP•2mo ago
should I write up an issue for it?
Zeus
Zeus•2mo ago
Are the files actually being renamed? Can you show the database entry of one of them?
Pixil
PixilOP•2mo ago
Here's the folder
Pixil
PixilOP•2mo ago
some definitely are, others not
Zeus
Zeus•2mo ago
can you get the asset id for one of them and run this in the database: select * from assets where id = 'idhere'; https://immich.app/docs/guides/database-queries/
Pixil
PixilOP•2mo ago
here's the query
Zeus
Zeus•2mo ago
can you open an issue with these details please
Pixil
PixilOP•2mo ago
sure thing
Zeus
Zeus•2mo ago
It looks like there is an issue with all the live photos that have both a video and photo component
Pixil
PixilOP•2mo ago
https://github.com/immich-app/immich/issues/17176 On a positive note, if I have to reupload all of my images to fix this, it will mean I likely trigger the nfs hang again, so I can troubleshoot that more easily. Looks like my issue got closed before I could provide evidence that it isn't limited to just immich-go uploads. I'm not sure if closed issues get monitored, but I provided a screenshot of a more recent image uploaded via iOS with the same problem. Can someone with permission please reopen the issue?
