Image upload not complete using CLI
The image upload does not complete when using the CLI to upload.
As you can see, the CLI stops after the 2903 images are uploaded, as expected. However, when starting the CLI again, it still finds the 2903 assets that don't match
Online I can see that I have 21445 assets uploaded; in the output folder there are 24346, which means exactly 2901 photos failed to upload. That still doesn't match the 2903 exactly, but gets me closer
I also now used -r as a CLI option instead of the deprecated -d, which yields the same results
These are the filetypes in the directory:
When monitoring the output of the CLI command, the files not being uploaded seem random to me.
I am having the same issue ...
anyone else having this issue?
@Alex Any ideas
What issue are you seeing?
there are 30477 local assets and it will upload 2831 because the others have been uploaded
It would be nice to also check the logs of the server to see what happened during the upload process
I wiped everything and will be uploading about 70K assets again, I will see if I can catch the logs this time
it's like it won't upload the last 2831 no matter how many times I run it
but no errors on the CLI side of things
Failed to upload 0 files []
@Alex which logs should I look at
it happened again
immich_server ?
there is nothing actively being logged in it as I am attempting the upload the second time.
this was the last log
Those errors can be ignored; it's the message logged when a video stream is terminated
Maybe those assets that are not accounted for in the CLI are not supported by Immich
some are jpg
How many have been uploaded to Immich, do you know?
for the second run
Indexing local assets...
Indexing complete, found 40523 local assets
Comparing local assets with those on the Immich instance...
A total of 18009 assets will be uploaded to the server

so it says that it will upload 18009 to the instance
yes the second time I ran it
I suspect the other ~2000 difference is from mobile upload?
I doubt i had 2000 before I started this import
What are other sources you have been using to upload to Immich?
mobile, but it was like 144
or somewhere around that
So what you are suspecting is it says it will upload 18000 files but somehow inflates to 20700 files?
well technically 22400 files including the videos
Indexing complete, found 40523 local assets
40523 - 20783 - 1749 = 17991
the first run was on the full 40523
on the second run in the same 40523-asset directory, it's attempting to upload 18000 that it "failed" on, or never told me it failed on, in the first run
how many assets would it report if you run the CLI again?
its almost finished
and ill try it again
do jobs need to be finished for those server counts to be updated and for the CLI API to know?
No, it doesn't. The files uploaded to the server are the raw files; the server just handles generating additional files and then running them through the ML pipeline
last 500 files apparently are a bunch of mp4's ...
taking a bit
console commands from run 2 and 3
@Alex so it ran through the whole 18009 the second time and it's still like it didn't upload

nothing new in docker logs immich_server
I can pull it from git and put a log into the response if you know which file and line I need to put it in
i put in a log

its returning duplicate: true ?
logging out the localAssets that are part of the 18000
one for example has id of id: 'photo0026-edited_2.jpg-445926',
Hitting the API at /api/asset/<device id> does not have that id in the array returned

@jrasm91 have you done any work on that asset api side to have any thoughts on this issue?
https://github.com/immich-app/immich/blob/6ce35d47f5582b89294578c691705727a184b996/server/apps/immich/src/api-v1/asset/asset.service.ts#LL137C12-L137C12
not sure if that method gets called when url: ${endpoint}/asset/upload gets called from the CLI script
Sorry, what is the problem exactly?
Two things going on here I can help clarify on.
1. If the device asset id is in the list it will skip it. This basically means same filename, modified timestamp, etc. This is actually being replaced with a checksum hash instead; there is an open PR for that change, already done. It will not try to upload it in this case.
2. A new asset is uploaded and a hash is calculated on the server. When it tries to save it to the database an error happens because the same hash already exists. We return duplicate: true, and the asset id that it matched with (along with a 200 status code instead of 201).
You have a filename that is supposedly a duplicate. You can run sha1sum <filename> locally. You can hit /api/asset/assetById/:assetId to get the details for the duplicate. You will see the same checksum (although it is base64url encoded) and the original filename, which you can probably also find locally and manually verify it is a duplicate.
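As a sketch of that manual check, something like the following Node.js snippet (not Immich's actual code; the file path is a placeholder) would produce the base64url form of a file's SHA-1 for comparison with the checksum the server reports:

```typescript
// Sketch: compute a local file's SHA-1 and base64url-encode it, so it can be
// compared against the checksum the server stores for the suspected duplicate.
import { createHash } from "crypto";
import * as fs from "fs";

function sha1Base64Url(filePath: string): string {
  const digest = createHash("sha1").update(fs.readFileSync(filePath)).digest("base64");
  // Translate standard base64 into the base64url variant (no '+', '/', or '=' padding).
  return digest.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}
```

If the value matches the checksum returned by /api/asset/assetById/:assetId, the two files really are byte-identical duplicates.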
On your #1, the non-PR version is not happening in my case. I hit the API endpoint -> https://photos.hidden.net/api/asset/MyDeviceUUIDhere
and I also logged out which new assets the CLI command is attempting to upload again, then searched the returned API results for a couple of those file names and did not find them, so that's why it's not skipping them on the CLI side.
It then tries to upload them, but the API is counting them as duplicates. I am assuming it's finding a file with the same hash; I am going to investigate this part to see why they have the same hash, whether it is a true duplicate or something else
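The CLI-side skip check being described amounts to a set membership test on ids the server already knows; a hypothetical sketch (names assumed, the real CLI logic differs in detail):

```typescript
// Hypothetical sketch of the CLI-side skip check: local assets whose device
// asset id already appears in the server's list are skipped; everything else
// is queued for upload and the server then deduplicates by content hash.
function assetsToUpload(localIds: string[], serverIds: string[]): string[] {
  const alreadyUploaded = new Set(serverIds);
  return localIds.filter((id) => !alreadyUploaded.has(id));
}
```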
Yup, that makes sense and that is exactly how it is configured to work.
This would imply the duplicates are two separate files with two different names. Or, the file was already uploaded by a different source (phone, etc.)
there is around 18K that its counting as duplicate
Have you verified any of them yet?
the console logged array looks a lot smaller than 18K, but a lot of them look like burst shots
nvm its truncating the array being logged
Did you look this one up?
is :assetId the filename+size ?
for the api you described above
No that's a guid
assetDeviceId is the filename
ok let me get the hash locally and then ill try hitting that api
What do you need the hash for?
I got to get a different example than what I screenshotted above as I am not sure what file name it was trying to upload there
I am assuming a bunch of those burst shots are either duplicates or may have the same hash
let me see if I can find some non burst shots
What is the filename it tried to upload and what is the filename of the already uploaded one on the server?
COVER.jpg vs COVER_1.jpg? Looks like a duplicate copy to me

This is from google takeout
I dont see this in photos.google.com
but when exported its doing this crap
In Google Photos you can edit photos and it saves over the original, so only the edited one shows up. I think that's unrelated to this duplicate problem though. Those jpgs are presumably different versions (sha1).
Is this the file it tried to upload?
What is the original filename for this asset?
689ca022-6047-4b13-8b94-48804eec3606
"originalFileName": "00000IMG_00000_BURST20190226115815827_COVER",
here are some names of the other ones that it's trying to upload
I bet those 123_1 / 123_2 are the same file, not modified
but somehow Google duplicated it for some reason
Do you have two files locally?
Where does
come from?
top box is original export, bottom is after I ran the exif script
and is what I am attempting to upload
Looks like they're duplicates.
I mean, it looks like everything is working as expected. CLI doesn't re-upload the same file (name + modified date) again, and it correctly detects an uploaded duplicate and doesn't re-upload it.
Well, it re-uploads, but doesn't re-add it.
I think you are correct, ill just have to run through and verify a bunch
It's a bit of a rabbit-hole trying to figure out why you have duplicates in the first place though 😛
Yea and google takeout is a pain
I need to build a google photos api exporter
I need to do that too haha
@Quadrubo See the fun conversation above, may be the same issue you are having
oh okay, I see, so I probably have duplicates then. Any plan to show that in the CLI as feedback, e.g. 2000 images were not uploaded because they are duplicates?
I could maybe do a PR for that
i'll look at it tonight, since the res already contains duplicate: true, should be pretty easy to add
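Since the response already carries the duplicate flag, the counting could be as simple as the sketch below (the response shape and names are assumptions, not the CLI's actual types):

```typescript
// Hypothetical shape of the per-file upload response the CLI inspects.
interface UploadResult {
  duplicate?: boolean;
}

// Count how many upload attempts the server reported as duplicates.
function countDuplicates(results: UploadResult[]): number {
  return results.filter((r) => r.duplicate === true).length;
}
```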

spelling error
but how does that look
actually I am going to add an option to write the duplicate file names to a file
apparently writing a file in Node is not straightforward....
ill look at it some more in the morning lol and do a PR
You should be able to do fs.writeFileSync(path to file, files.join('\n')) or similar
I tried writeFile / createWriteStream, and it finally worked with writeFileSync lol
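A minimal sketch of the writeFileSync approach, assuming `duplicates` holds the file names the server flagged (the sample data is hypothetical):

```typescript
import * as fs from "fs";

// Hypothetical input: file names the server reported back with duplicate: true.
const duplicates: string[] = ["COVER.jpg", "COVER_1.jpg"];

// writeFileSync blocks until the data is flushed, so it is safe even if the
// process exits immediately afterwards. The async writeFile/createWriteStream
// calls can be cut short when the CLI exits before their callbacks run, which
// may be why they appeared not to work here.
fs.writeFileSync("duplicates.txt", duplicates.join("\n"));
```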

Duplicate counts and write to file by adoreparler · Pull Request #9...
Added functionality to count how many files were reported as duplicates after attempting to upload to the server. Added functionality to write those duplicate file names to a file
Commit added; not sure if I need to do anything in this pull request to get them to be added. They show in the commits list.
What do you mean by this?
I was not sure if I had to do anything to the PR after i committed the second time
but I think I figured it out lol
Yeah you can continue to add commits and they are added to the pr
Hello there, I found this thread because I was having the same problem. What appears to be a large number of assets are not uploaded, no matter how many passes of the directory are made. I am assuming they are duplicates, but the number is quite high, so I would like to inspect the files. I am not quite sure how to get a list; this PR looks to fix that, but in the meantime, is there a method to find these files without this code being released?
Pull from my fork and use that?
The obvious choice hehe thanks ill give it a try.