CLI batch upload stats
I noticed that whenever I upload the same directory multiple times using the CLI tool, it always gives me a non-zero diff on subsequent runs. For example:
But then the upload progresses very quickly and the total asset count reported by the server doesn't change, leading me to believe that either certain uploads fail OR they all succeed but the reported diff is wrong.
The Postgres logs point to the latter explanation:
But then again, I'm not sure if these logs originate from the 493 assets or from the (4005-493) which are already on the server.
Does anyone have a better understanding of this? Thanks!
13 Replies
since the CLI client doesn't store any state, it doesn't know which assets already exist on the server, so it re-uploads duplicate assets and the server rejects them
then what does the reported difference represent?
it represents files that are not on the server OR files that are already on the server as duplicates in terms of file content
not sure I understand this
4005 is the total local asset count, which was already uploaded to the server in a previous run
so subsequent runs should report either 0 files to be uploaded, or 4005 again
0 if the CLI can compute diffs, 4005 if not
Correct, but within those 4005 files there are duplicated files (same file content), possibly with different filenames
and it seems like 493 out of the 4005 already have a duplicate version on the server
ah so
len(set(local_files)) == 493
? not sure I understand that line, but the CLI client first checks whether the file ID is present on the server; if it isn't, the file is put into the candidate list for upload
if two files have the same content but different names, their file IDs are different, which is why, after running the CLI once and then running it again, you see those 493 files in the candidate list
however, because the file content is identical to an asset that was uploaded previously, the server calculates the hash of the incoming file and rejects it
the CLI tool doesn't store the rejection information, so if you run the CLI again, it will put those files into the candidate list again
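To make the flow above concrete, here is a toy simulation in Python. It is only a sketch of the behaviour described in this thread, not the actual CLI or server code: `FakeServer`, `run_cli`, using the filename as the file ID, and SHA-1 as the content hash are all assumptions made for illustration.

```python
import hashlib


class FakeServer:
    """Toy stand-in for the server (hypothetical, not the real API)."""

    def __init__(self):
        self.ids = set()      # file IDs of accepted assets
        self.hashes = set()   # content hashes of accepted assets

    def has_id(self, file_id):
        return file_id in self.ids

    def upload(self, file_id, content):
        # The server hashes the incoming content and rejects duplicates.
        digest = hashlib.sha1(content).hexdigest()  # SHA-1 is an assumption
        if digest in self.hashes:
            return False  # rejected; the ID is never recorded
        self.hashes.add(digest)
        self.ids.add(file_id)
        return True


def run_cli(server, local_files):
    """Stateless client: candidates are files whose ID the server lacks."""
    candidates = [(fid, data) for fid, data in local_files.items()
                  if not server.has_id(fid)]
    accepted = sum(server.upload(fid, data) for fid, data in candidates)
    return len(candidates), accepted


# Filenames double as file IDs here (assumption for illustration).
local_files = {
    "a.jpg": b"cat",
    "b.jpg": b"dog",
    "copy-of-a.jpg": b"cat",  # same content as a.jpg, different ID
}

server = FakeServer()
first = run_cli(server, local_files)   # (3, 2): one duplicate rejected
second = run_cli(server, local_files)  # (1, 0): the duplicate is retried
print(first, second)
```

The second run still reports one candidate because the rejected file's ID never made it onto the server and the client remembers nothing between runs, which matches the constant non-zero diff seen in the question.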
does it make sense?
I think I got it
so 3512 files exist uniquely on the server
which the CLI doesn't even attempt to upload
The new CLI in the main repo has client-side checking so these things won't happen, and it has better messages as well. We just haven't gotten around to releasing it yet
and 493 are "new" from the CLI's perspective but they turn out to have existing duplicates on the server
Yes, correct
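If you want to sanity-check that split locally, a rough sketch like the one below counts how many files in a directory share their content with another file. It assumes duplicates are detected by a hash of the full file content; SHA-1 and the helper names `content_hash` and `duplicate_stats` are my own choices for illustration, not something the CLI or server exposes.

```python
import hashlib
from collections import Counter
from pathlib import Path


def content_hash(path, chunk_size=1 << 20):
    """Hash a file's bytes in chunks so large files don't fill memory."""
    h = hashlib.sha1()  # SHA-1 is an assumption, any content hash works here
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_stats(root):
    """Return (total files, unique contents, extra duplicate copies)."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    counts = Counter(content_hash(p) for p in files)
    total = len(files)
    unique = len(counts)
    return total, unique, total - unique
```

Calling `duplicate_stats("/path/to/your/upload/dir")` (placeholder path) should give something like (4005, 3512, 493) for the directory in this thread if the explanation above holds and all duplicates come from within that directory; note that 4005 - 493 = 3512.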
cool, thanks
makes me a bit less wary about accidentally losing parts of my collection during migration
Looks like a bug in the progress bar: it reports the local total instead of the uploadable count
yeah we haven't put much love into this CLI version since there was a new one underway
😄