Import CLI Dedupe Logic Seems Broken

Hey, while batch importing a bunch of old vacation photos I noticed some wonkiness with the CLI import tool. Running the recommended command:
sudo docker run -it --rm -v "$(pwd)":/import --network=photo-net ghcr.io/immich-app/immich-cli:latest upload --key "$myKey" --server "$myServer"
I managed to import 1269 assets. However, rerunning the same command on the same folder yields:
Checking connectivity with Immich instance...
Server status: OK
Checking credentials...
Login status: OK
Successful authentication for user xxxxxxx
Indexing local assets...
Indexing complete, found 1269 local assets
Comparing local assets with those on the Immich instance...
A total of 218 assets will be uploaded to the server
Do you want to start upload now? (y/n) y
It's always 218 assets that it thinks are new, no matter how many times the import runs, and there are no error codes that I can see. Also worth noting: the Immich server backend seems to dedupe correctly, as I don't get a bunch of replicas. Looking through the progress bar I picked out one of the photos, IMG_20160902_105100.jpg. If I go to my Immich instance and search for m:IMG_20160902_105100.jpg, the image shows up. If I try to re-upload IMG_20160902_105100.jpg through the Immich instance, it says skipped (as expected), but the CLI does not dedupe it. What's even stranger is that if I delete IMG_20160902_105100.jpg from my Immich instance and rerun the CLI, it re-imports the file and the total count drops to 217. Is there some funky metadata mismatch happening here? Thanks in advance for the help.
10 Replies
Alex Tran (2y ago)
The CLI doesn't have persistent storage, so it will try to upload the assets again (duplicates, in this case), and the server will reject those files.
Unrealmaster (OP, 2y ago)
I see, but then why does it somehow see the same ~1k assets as the server in the normal case? The import directory is not in any of the Immich instance mount points.
Alex Tran (2y ago)
So the CLI found 1269 assets that can be ingested, then it successfully uploaded about 1000 assets, while the other 218 assets are duplicates, so they are ignored. Does that answer your question?
Unrealmaster (OP, 2y ago)
Hey Alex, it's the other way around. The CLI found 1269 assets. It somehow correctly detects that ~1k assets are duplicates and does not try to upload them. The other 218 are detected as new even though they are duplicates. When the command runs, the server detects that those assets are dups and rejects them. My question is twofold: 1. Why does the CLI correctly detect that some assets are dups but not others? 2. Why, when I delete one of the 218 assets on the actual server, rerun the CLI to upload the image again, and then rerun the CLI once more, does it start detecting that asset as a dup and reduce the "new count" to 217?
etnoy (2y ago)
You can try the new beta CLI, it's in the CLI folder of the git repo
vakulenchuk (2y ago)
I am also struggling with this issue. I found more discussion here: https://discord.com/channels/979116623879368755/1109830963065786378. Since this duplicate-matching enhancement has not been released, I resorted to running a script that compares the two directories to see whether these are in fact duplicates (and provides a path/filename for each).
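For anyone wanting to run the same kind of check, here is a minimal sketch of a directory comparison in Python. It is not the script mentioned above, which was not shared in the thread. It assumes that "duplicate" means byte-identical file contents, and the function names (`file_digest`, `index_by_digest`, `find_duplicates`) are made up for illustration. It only reads files from the two folders and does not talk to the Immich server.

```python
import hashlib
import sys
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def index_by_digest(root: Path) -> dict[str, list[Path]]:
    """Map each content digest to the files under root that have it."""
    index: dict[str, list[Path]] = {}
    for p in sorted(root.rglob("*")):
        if p.is_file():
            index.setdefault(file_digest(p), []).append(p)
    return index


def find_duplicates(import_dir: Path, library_dir: Path) -> list[tuple[Path, Path]]:
    """Return (import file, matching library file) pairs with identical bytes."""
    library = index_by_digest(library_dir)
    pairs = []
    for p in sorted(import_dir.rglob("*")):
        if p.is_file():
            matches = library.get(file_digest(p))
            if matches:
                pairs.append((p, matches[0]))
    return pairs


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare_dirs.py <import_dir> <library_dir>
    for src, dup in find_duplicates(Path(sys.argv[1]), Path(sys.argv[2])):
        print(f"{src} is a duplicate of {dup}")
```

Comparing by content hash rather than by filename means a renamed copy still counts as a duplicate, while two different photos that happen to share a filename do not.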
Unrealmaster (OP, 2y ago)
@etnoy The main immich repo or the immich-cli repo?
etnoy (2y ago)
Main repo
Unrealmaster (OP, 2y ago)
Ok, going to give that a shot and report back @etnoy
> ts-node cli/src upload -n /media/import/Photos\ from\ 2016
████████████████████████████████████████ | 100% | ETA: 0s | 6.7 GB/6.7 GB: /media/import/Photos from 2016/...
All assets were already uploaded, nothing to do.
> ts-node cli/src upload -n /media/import/Photos\ from\ 2020
████████████████████████████████████████ | 100% | ETA: 0s | 9.1 GB/9.1 GB: /media/import/Photos from 2020/...
All assets were already uploaded, nothing to do.
Ok, looks like the beta CLI is working correctly. Thanks for the suggestion! One thing though: ts-node cli/src logout does not exist 😛
etnoy (2y ago)
We have some things to work on before officially releasing 😅
