✅ Getting duplicates out of a list
Hey, I have some C# code (a Windows Forms program) with a list of byte arrays, which are extracted from the images in a folder. I don't know how many there are, only that there are a lot. There are duplicates in the list, but I don't know how to get the duplicates out of the list dynamically. I couldn't find a really good solution that does what I want. Thanks for the help!
36 Replies
What are you considering a duplicate? If you have the exact same image visually, but at a different resolution, is that a duplicate? How about the same image in two different formats?
nonono, what I meant is the byte array itself. The byte arrays are sometimes the same, which means everything about those images is the same, doesn't it?
Sure, in that specific case. But if the goal is to detect duplicate images, then that kind of comparison will miss a lot of stuff.
Yes, I know, but that would make the code larger, and this converter should be simple. Do you know a solution?
There are a ton of different methods for finding duplicate images, they're not exactly simple though
Stack Overflow
Algorithm to compare two images
Given two different image files (in whatever format I choose), I need to write a program to predict the chance if one being the illegal copy of another. The author of the copy may do stuff like rot...
ok, if that is so, then I would have to do it the complicated way too, I think..
your approach would only really work if the images are genuinely identical
like 1:1 copy and paste
and even then i'm not sure (what about time stamps?)
They are, that is the point
that is why I need help. Is there an easy way to do that?
so you have a List<byte[]>?
Yes
that won't entirely be enough, you have the file names somewhere?
unless you only care about the indices
Yes I have them
Isn't the hash code normally enough?
ero
REPL Result: Success
Result: int
Compile: 293.479ms | Execution: 54.281ms
no :p
or, to be more clear:
ero
REPL Result: Success
Result: ValueTuple<int, int>
Compile: 364.339ms | Execution: 27.222ms
:ReallyMad:
i doubt you can compare arrays with their hashcode
ok sorry, I don't understand the topic that well, I am trying to understand it.
they're reference types, but i don't know their hashcode implementation
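A minimal sketch of the point being made here, assuming C# 9 or later for top-level statements (the variable names are made up for illustration). Arrays do not override Equals or GetHashCode, so both work on the array object rather than on its contents:

```csharp
using System;
using System.Linq;

byte[] a = { 1, 2, 3 };
byte[] b = { 1, 2, 3 };   // same contents, but a different array object

// Array inherits Equals from object, which compares references.
Console.WriteLine(a.Equals(b));          // False

// SequenceEqual compares the elements one by one.
Console.WriteLine(a.SequenceEqual(b));   // True

// GetHashCode is likewise tied to the object, not the bytes, so
// a.GetHashCode() and b.GetHashCode() say nothing about whether
// the two arrays hold the same data.
```

This is why a plain hash code comparison, or a plain HashSet<byte[]>, does not detect two arrays with identical bytes.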
k
how would you do it if it was simple?
I mean filter out the byte arrays (duplicates)
really good question honestly
i mean, it's not hard or anything
just hard to get right
the easiest and slowest way would of course be a nested loop, comparing each item one by one with all other items
you would use SequenceEqual on the arrays to make sure they're identical
But that is time-consuming and also power-consuming if there are many images
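The nested loop described above could look roughly like this. It is only a sketch, not code from the thread: NaiveDuplicates and FindDuplicateIndices are hypothetical names, and the sample data stands in for the real image bytes. It assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var images = new List<byte[]>
{
    new byte[] { 1, 2, 3 },
    new byte[] { 4, 5 },
    new byte[] { 1, 2, 3 },   // duplicate of index 0
    new byte[] { 4, 5 },      // duplicate of index 1
};

List<int> duplicates = NaiveDuplicates.FindDuplicateIndices(images);
Console.WriteLine(string.Join(", ", duplicates));   // 2, 3

static class NaiveDuplicates
{
    // Returns the index of every array that repeats an earlier one.
    // Needs up to n * (n - 1) / 2 array comparisons for n arrays.
    public static List<int> FindDuplicateIndices(IReadOnlyList<byte[]> items)
    {
        var result = new List<int>();
        for (int i = 1; i < items.Count; i++)
        {
            for (int j = 0; j < i; j++)
            {
                if (items[i].SequenceEqual(items[j]))
                {
                    result.Add(i);
                    break;   // one earlier match is enough
                }
            }
        }
        return result;
    }
}
```

Because the indices line up with the original list, they can be mapped back to file names if those are kept in a parallel list.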
are you doing this more than once?
No
so?
isn't HashSet the thing that does that better?
hashset still needs a comparer
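A sketch of what such a comparer might look like. ByteArrayComparer is a hypothetical name, and the FNV-1a style hash is just one simple choice for a content-based hash, not something the thread prescribes. It assumes C# 9 or later.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

var images = new List<byte[]>
{
    new byte[] { 1, 2, 3 },
    new byte[] { 4, 5 },
    new byte[] { 1, 2, 3 },   // same bytes as the first entry
};

// Without the comparer, the set would keep all three arrays.
var unique = new HashSet<byte[]>(images, new ByteArrayComparer());
Console.WriteLine(unique.Count);   // 2

sealed class ByteArrayComparer : IEqualityComparer<byte[]>
{
    public bool Equals(byte[]? x, byte[]? y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.SequenceEqual(y);
    }

    public int GetHashCode(byte[] data)
    {
        // Hash over the contents, so equal arrays get equal hashes.
        unchecked
        {
            uint hash = 2166136261;
            foreach (byte b in data)
                hash = (hash ^ b) * 16777619;
            return (int)hash;
        }
    }
}
```

The set calls GetHashCode first and only falls back to Equals for entries whose hashes match, so most comparisons never touch the full arrays.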
Ohhhhhhh, that is why it didn't work. Now I understand, thanks
I tested things before I wrote here
ok, what else is possible instead of SequenceEqual, or is there a way to dynamically compare two byte arrays from my list with SequenceEqual?
this should return an enumerable of duplicate indices
here's some fixed code
Ok thanks for help!
the problem with relying on the hashcode alone is that hashcodes can collide: two different arrays can end up with the same hashcode
it's meant as a pre-check before doing the actual equality check
👍🏻 k
if the hashcodes already don't line up, we don't need to check for equality
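The pre-check idea can be sketched like this. It is not the code that was posted in the thread; HashedDuplicates, FindDuplicateIndices and ContentHash are hypothetical names, and the hash function is an arbitrary simple one. It assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var images = new List<byte[]>
{
    new byte[] { 1, 2, 3 },
    new byte[] { 9 },
    new byte[] { 1, 2, 3 },   // duplicate of index 0
};

foreach (int index in HashedDuplicates.FindDuplicateIndices(images))
    Console.WriteLine(index);   // 2

static class HashedDuplicates
{
    public static IEnumerable<int> FindDuplicateIndices(IReadOnlyList<byte[]> items)
    {
        // Content hash -> indices of the distinct arrays seen so far with that hash.
        var buckets = new Dictionary<int, List<int>>();

        for (int i = 0; i < items.Count; i++)
        {
            int hash = ContentHash(items[i]);
            if (!buckets.TryGetValue(hash, out var candidates))
            {
                // No earlier array has this hash, so this cannot be a duplicate.
                buckets[hash] = new List<int> { i };
                continue;
            }

            // Same hash: either a real duplicate or a collision, so compare the bytes.
            if (candidates.Any(j => items[i].SequenceEqual(items[j])))
                yield return i;
            else
                candidates.Add(i);
        }
    }

    // Simple FNV-1a style hash over the bytes (an arbitrary choice for this sketch).
    static int ContentHash(byte[] data)
    {
        unchecked
        {
            uint hash = 2166136261;
            foreach (byte b in data)
                hash = (hash ^ b) * 16777619;
            return (int)hash;
        }
    }
}
```

Each array is hashed once, and the full byte comparison only runs against earlier arrays that share its hash.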
there's certainly many ways to optimize this, but that's a topic for #allow-unsafe-blocks