Storage Template Migration gets stuck occasionally
I've found that my storage template migration will occasionally get stuck and I have to restart the container to get it to resume.
:wave: Hey @Pixil,
Thanks for reaching out to us. Please carefully read this message and follow the recommended actions. This will help us be more effective in our support effort and leave more time for building Immich :immich:.
References
- Container Logs: docker compose logs (docs)
- Container Status: docker ps -a (docs)
- Reverse Proxy: https://immich.app/docs/administration/reverse-proxy
- Code Formatting: https://support.discord.com/hc/en-us/articles/210298617-Markdown-Text-101-Chat-Formatting-Bold-Italic-Underline#h_01GY0DAKGXDEHE263BCAYEGFJA
Checklist
I have...
1. :ballot_box_with_check: verified I'm on the latest release (note that mobile app releases may take some time).
2. :ballot_box_with_check: read applicable release notes.
3. :ballot_box_with_check: reviewed the FAQs for known issues.
4. :ballot_box_with_check: reviewed GitHub for known issues.
5. :ballot_box_with_check: tried accessing Immich via local IP (without a custom reverse proxy).
6. :ballot_box_with_check: uploaded the relevant information (see below).
7. :ballot_box_with_check: tried an incognito window, disabled extensions, cleared mobile app cache, logged out and back in, different browsers, etc. as applicable
(an item can be marked as "complete" by reacting with the appropriate number)
Information
In order to be able to effectively help you, we need you to provide clear information to show what the problem is. The exact details needed vary per case, but here is a list of things to consider:
- Your docker-compose.yml and .env files.
- Logs from all the containers and their status (see above).
- All the troubleshooting steps you've tried so far.
- Any recent changes you've made to Immich or your system.
- Details about your system (both software/OS and hardware).
- Details about your storage (filesystems, type of disks, output of commands like fdisk -l and df -h).
- The version of the Immich server, mobile app, and other relevant pieces.
- Any other information that you think might be relevant.
Please paste files and logs with proper code formatting, and especially avoid blurry screenshots.
Without the right information we can't work out what the problem is. Help us help you ;)
If this ticket can be closed you can use the /close command, and re-open it later if needed.
deployment yaml
Here is a section of the logs where it stopped moving files
and here is what the GUI looks like, I currently have the number of jobs set to 1

I'm also storing my images on an nfs mount, which could be a cause of intermittent problems
otherwise this is a brand new deployment as of last night because I wanted to try switching over to cnpg
Successfully submitted, a tag has been added to inform contributors. :white_check_mark:
@Daniel I ran into the problem of my storage template getting stuck again. Figured I'd revive this old post I made to keep it more organized.
This time it was uploading a single photo
it's been stuck for about 2 hours now
Not exactly sure why you're pinging me tbh
Are there any logs?
sorry, you had answered me a couple days ago
and I have logs this time
I'm getting them now, but apparently my echoing of them to a file broke all the escape codes
Everyone is all over the place replying to people, then someone else takes over because you're off, ...
Here are a bunch of logs for the last two hours. If you have time to look at it with me that would be great, otherwise I can wait for someone else
I'm assuming the two POSTs towards the top of the logs are the image being uploaded
Then at 2:00:58 there is this part:
which looks pretty standard to me
You being on the jobs page the whole time is kind of annoying lmao
So many requests
yeah, I realized that after the fact
This is a good shout
That's your issue
Why it's DEBUG only, idk lol
IMO that should be a warning
I get that no matter what, even when it works
Which file system are you on?
It's using NFS
backed by a truenas zfs pool
Hm ok since it's between mount points it probably cannot make a move but needs to copy instead, so this does make sense
The copy should work though
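For anyone else hitting this, a quick way to confirm that the two directories really are on different filesystems (which is what turns an atomic rename into a copy + delete) is to compare device numbers. The paths below are just examples; adjust to your own setup:
```sh
# If the two device numbers differ, rename(2) across them fails with
# EXDEV and the file has to be copied instead (example paths only).
stat -c '%d %n' /usr/src/app/upload/upload/example.jpg \
                /usr/src/app/upload/library/admin/example.jpg

# Same information via the backing mounts
df --output=source,target /usr/src/app/upload/upload /usr/src/app/upload/library
```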
You said it does work for some files though?
yeah, it usually works
and then sometimes it hangs until I restart the pod
Here's a snippet from the 14th
It seems like something in that move from 2pm is hanging indefinitely and probably just needs to timeout and try again
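A crude way to test that timeout-and-retry theory from a shell on the node (paths made up). Note that a plain SIGTERM often can't interrupt a process stuck in NFS I/O, hence the -k, which follows up with SIGKILL:
```sh
# Give each copy attempt 30s, then SIGTERM, then SIGKILL 5s later.
# SIGKILL is what actually interrupts a wait on a hard NFS mount.
for i in 1 2 3; do
  timeout -k 5 30 cp /tmp/example.jpg /mnt/immich/library/example.jpg && break
  echo "attempt $i timed out, retrying" >&2
done
```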
Is it possible that NFS mount is a bit flaky?
I mean, it's nfs so of course
Well yeah, but beyond the usual
but it's been pretty rock solid for everything else
Hmk
it is using a hard mount, so it will block until the request returns, if I understand my NFS correctly
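For reference, the difference boils down to mount options. An illustrative mount command (server name and paths are made up, and this isn't what the provisioner literally runs):
```sh
# hard (the default): blocked syscalls retry forever; only the server
# responding, or a SIGKILL to the process, gets them unstuck.
mount -t nfs -o hard,timeo=600,retrans=2 truenas.local:/mnt/pool/immich /mnt/immich

# soft: after timeo x retrans the syscall fails with EIO instead of
# hanging. Risky for writes, but the stall becomes visible to the app.
mount -t nfs -o soft,timeo=600,retrans=2 truenas.local:/mnt/pool/immich /mnt/immich
```
(timeo is in deciseconds, so 600 is a 60-second major timeout.)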
So Immich getting stuck on that would make a lot of sense
And restarting the pod kills the fs call
If immich is assuming that it will always return
We're awaiting the move, yeah
We have a bunch of logic around file moves (including an extra table) to make sure moves actually execute
I've gotten somewhat familiar with that table having dealt with this for awhile
(since otherwise we'd just lose them which would be bad)
That makes sense
Yeah
Honestly this is beyond my expertise, but my best hunch is that move getting stuck and Immich waiting on it forever
Why that gets stuck?
¯\_(ツ)_/¯
I don't know why it would just hang though, there wasn't an outage, but I assume it's just general flakiness with nfs
On the other hand I know many people who run their library on NFS just fine, idk
Let me ask around if people are more experienced with NFS
I don't know what the default mount type is for nfs in linux. I'm using the nfs-subdir-external-provisioner in kubernetes which defaults to a hard mount
and in particular I need the hard mount for some other services that are using it
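If it ever comes to experimenting, Kubernetes does let you give Immich its own mount options without touching the other services, via a dedicated StorageClass. This is only a sketch; the class name and provisioner string are hypothetical and depend on how nfs-subdir was installed:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-immich            # hypothetical class just for Immich
provisioner: cluster.local/nfs-subdir-external-provisioner
mountOptions:
  - hard                      # keep the safe hard-mount semantics...
  - timeo=150                 # ...but hit the major timeout sooner (15s)
  - retrans=3                 # so "server not responding" shows up in dmesg
parameters:
  archiveOnDelete: "false"
```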
I think in general it makes a lot of sense to have that hard mount
The issue isn't that you have configured that; the issue is that the move gets stuck
(presumably)
yeah
if it's any incentive, this is the last thing holding me back from moving fully to Immich and buying a server license
It most definitely is not ;)
But I asked the team if there are any ideas :)
Before that
awesome, thanks for the help with this
FYI, this has been shared with the support crew and hopefully someone who knows more about this will be able to help.
doing a bit more research, it looks like hanging indefinitely might be the right thing (not necessarily for Immich, but for a hard NFS mount). It appears there should be some kernel-level errors being generated on every attempt that I assume are getting eaten somewhere
Just writing the same thing again :peepoNotes:
Well done @Nicholas Flamy :P
if we could at least surface those kernel errors it would help a lot in troubleshooting
Maybe on the remote?
It should be on the client side
presumably the request isn't making it through so it would have an error like
kernel: nfs: server nfsserver not responding, timed out
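To catch that on the client as it happens, something like this on the node running Immich should work:
```sh
# Follow kernel messages and filter for NFS client complaints
dmesg --follow | grep -i nfs

# or, on systemd machines
journalctl -k -f | grep -i nfs
```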
The only relevant thing I see in the nfsd logs on the server are
but that hasn't happened since the 12th
Here is the exact mount that is being used in the pod
Checking in on this, anyone on the support team have any thoughts? I kind of assume they'd have shared them already, but I'm being optimistic today.
We are not sure yet. Have you tried setting the concurrency level back to default values?
We've already established that this is just an NFS issue that may be fixed with some config options.
But apparently we don't have NFS experts
that's unfortunate, any chance we can add more logging around it?
I think this is something to do with file permissions on the NFS server side?
I've only just skimmed like 2% of your thread here
Here's my config I use on my nfs server for a random client
I think make sure the file permissions on the server are correct as well
chown -R 99:100 yoloinsecurefolder
you really need a rock solid connection. nfs really does like to hang like a *&!@ if the connection drops off.
A vlan change, switch unplug, new client causing dhcp server to drop off and take down the whole network. etc
Be nice on my pr pls xD I'm trying to get a job in the future
I'm not sure why that's in reply to this message :D
you must be overwhelmed at times with all the chatter. the bot on GitHub marked you as the reviewer lol. I start to mix German and Russian when busy.
You trilingual?
I don't think it's a permissions issue. My setup works for everything else in the k8s cluster and in fact works ~99% of the time for Immich. Occasionally, the storage template job tries to copy a file and seems to hang indefinitely waiting for the copy to return.
I wish
Network? I'd love to know a surefire way you could confirm/deny this one. I had a similar NFS hang problem in the past, so I decided to eat the performance loss and use CIFS/SMB instead.
Immich does have a plethora of files? Is this mount different from any of your other rock solid nfs connections?
shouldn't be, they're all managed by nfs-subdir in the same kubernetes cluster
so they'd get mounted the same
certainly can't guarantee there isn't a network issue, though even with an intermittent issue I'd expect it to eventually work; my understanding of a hard NFS mount is that it keeps retrying until it works
the problem isn't currently happening, but I found some info on increasing nfs logging, so next time it happens I'll see if I can do that
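In case it helps, the knob for that on the client side is rpcdebug from nfs-utils. It's very chatty and writes to the kernel log, so turn it back off afterwards:
```sh
# Enable verbose NFS + RPC client debugging
rpcdebug -m nfs -s all
rpcdebug -m rpc -s all

# reproduce the hang, collect dmesg, then switch it back off
rpcdebug -m nfs -c all
rpcdebug -m rpc -c all
```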
Sweet. Or we can make a fork with MinIO support and help them merge it if it won't disrupt anything
fwiw, the full setup is two proxmox hosts, one has truenas scale as a vm passing through a pcie sas card
woo a rabbit hole setup xD
I guess I could give truenas more cpu to rule that out
what's the network card? consumer or enterprise?
it's all enterprise, dell r730s
don't remember the exact card but it's their 2x 1Gb and 2x 10Gb SFP+ cards
ahh then networking should be less likely a problem
using the 1Gb links
they plug directly into a ubiquiti switch
ahh shit
Ubiquiti is a big issue, frequent dropouts for me... first-hand experience: their non-full-size switches are really bad with trunking and VLANs
like to rstp loop themselves to hell
if I have some time this weekend, I can try swapping it out with an older netgear I have
For the Ubiquiti fault, the fingerprints are a period of super stable connection, ~1-2 months. Then you can't ping anything on one or more clients on one side of the network while the UniFi console seems fine. You trigger resets of the switches, then it all turns to shit and you have to factory reset them; the logs don't help much, then the console says loop protection stuff.
I haven't seen anything like that in the last couple years I've been running it
yeah i think this is isolated to their smaller size switches
since linus tech tips would have said something
this is a 24 port poe switch
don't remember exactly which one
the older one that still had fans
I'm really wondering if it's cpu contention though, I know nfs wants 1 thread per cpu you have and I only have 4 cores on the vm
I thought I had 12

I can't find anything about nfs system requirements on reputable websites
but this might be useful in the future when it locks up

(link: Ubuntu Server docs, "Network File System (NFS)")
truenas defaults to 1 per core, I've never tried going over that
but I have ~40 unused cores left on that host so I'm happy to up it just to be safe
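On a stock Linux NFS server the thread count can be checked and bumped at runtime like below; TrueNAS manages the same setting through its NFS service UI, so this is just for reference:
```sh
cat /proc/fs/nfsd/threads   # current number of nfsd threads
rpc.nfsd 16                 # raise it on the fly (persist via /etc/nfs.conf)
```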
I recall some disk I/O bottlenecks related to backup processes running at the same time as Immich; the storage template migration slows down considerably but gets there eventually.
borgmatic was going ham xD
more info, it hung again on the first upload after increasing the cpu cores for truenas, so I guess that wasn't helpful
checking nfsstat -c on the VM running Immich, I see rename operations incrementing pretty fast
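For watching those counters live, watch -d highlights whichever fields changed between refreshes:
```sh
watch -d -n 1 nfsstat -c
```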
to try eliminating everything else I tainted the node and evicted everything else, sadly immich restarted in that process and of course got unstuck. If it happens again soon hopefully I can get some better data
That was fast, it's stuck again and is the only thing of note running on the node
renames are incrementing relatively fast again
I have a feeling it's stuck trying to rename an image when it can't because it wants to rename across mount points
umm, I guess all the renames are coming from this:
it's renaming all my files with a duplicate extension, which feels wrong but maybe unrelated(?)
Those seem like a bug but not the cause of the slowness because they would likely be new in 1.130
should I write up an issue for it?
Are the files actually being renamed? Can you show the database entry of one of them?
Here's the folder
some definitely are, others not
can you get the asset id for one of them and run in the database
select * from assets where id = 'idhere';
https://immich.app/docs/guides/database-queries/
here's the query
can you open an issue with these details please
sure thing
It looks like there is an issue with all the live photos that have both a video and photo component
https://github.com/immich-app/immich/issues/17176
on a positive note, if I have to reupload all of my images to fix this, it will mean I likely trigger the nfs hang again so I can troubleshoot that easier
Looks like my issue got closed before I could provide evidence that it isn't limited to just immich-go uploads. I'm not sure if closed issues get monitored, but I provided a screenshot of a more recent image uploaded via iOS with the same problem. Can someone with permission to, please reopen the issue?