The Mist
✅ Dictionary<int, ...> lookup time seems a bit too slow
I was profiling a program, and I saw this:
Here is the code in question:
136 ms for 190,000 calls translates to about 1,400 lookups per millisecond. That seems a little slow to me. Is it, and if so, what could be causing it? I figured looking up an int in a hash map should be faster than that.
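For reference, here is the same arithmetic spelled out, including the per-lookup cost in nanoseconds. It uses only the two figures quoted above; the variable names are made up for illustration, and the snippet is in Python purely as a calculator.

```python
# Figures taken from the profiler reading quoted above.
calls = 190_000
elapsed_ms = 136

# Throughput: how many lookups complete per millisecond.
lookups_per_ms = calls / elapsed_ms

# Cost of a single lookup, converted from milliseconds to nanoseconds.
ns_per_lookup = elapsed_ms * 1_000_000 / calls

print(f"{lookups_per_ms:.0f} lookups/ms, {ns_per_lookup:.0f} ns per lookup")
# prints "1397 lookups/ms, 716 ns per lookup"
```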
17 replies
Difference algorithm that Git uses
Hello! Quick rundown:
I have a need to detect changes in custom bytecode sequences. Essentially, I need to be able to compare two bytecode files (just consider them string sequences, lines of text even), and find:
1) Unchanged lines.
2) Deleted lines.
3) Inserted lines.
4) Moved lines.
Modified lines can be considered deleted and inserted. The key is I NEED to detect moved lines.
DiffPlex and Python's difflib both satisfy requirements 1, 2 and 3, but neither of them seems to be able to detect MOVED lines.
Git's difference algorithm seems to be able to detect moved lines. I tested it using --color-moved and it correctly identified the moved bytecode blocks, unlike DiffPlex and difflib, which treat them as deleted and inserted.
I believe Git uses the Myers diff algorithm? I don't think the Myers algorithm itself recognizes moved lines, though, so that must be some extra logic on Git's side of things.
I could come up with a custom algorithm for detecting the moved lines, like taking the deleted lines from A and finding matching sequences in B with LCS or something like that, but surely someone has already done this? This doesn't seem like an uncommon problem, so I would be surprised if there isn't an open-source, polished and tested solution already, but I haven't been able to find any.
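To make the custom idea concrete, here is a minimal sketch of it in Python, built on difflib since that is one of the libraries mentioned above. It is not what Git does for --color-moved. It is a simple exact-match pairing: run a normal diff, then pair each deleted line with an identical inserted line and call that pair a move. The function name `classify_lines` and the shape of the returned dictionary are hypothetical, and the pairing is per line rather than per block, so repeated lines (common in bytecode) may be paired arbitrarily.

```python
from collections import defaultdict, deque
from difflib import SequenceMatcher


def classify_lines(old, new):
    """Hypothetical helper: split a diff into unchanged/moved/deleted/inserted.

    Indices refer to positions in `old` and `new`. "unchanged" and "moved"
    hold (old_index, new_index) pairs.
    """
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    unchanged, deleted, inserted = [], [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            unchanged.extend(zip(range(i1, i2), range(j1, j2)))
        else:  # "delete", "insert" or "replace"
            deleted.extend(range(i1, i2))
            inserted.extend(range(j1, j2))

    # Index inserted lines by content so deleted lines can be paired with them.
    candidates = defaultdict(deque)
    for j in inserted:
        candidates[new[j]].append(j)

    # A deleted line with an identical inserted counterpart counts as moved.
    moved, still_deleted = [], []
    for i in deleted:
        queue = candidates.get(old[i])
        if queue:
            moved.append((i, queue.popleft()))
        else:
            still_deleted.append(i)

    moved_targets = {j for _, j in moved}
    still_inserted = [j for j in inserted if j not in moved_targets]
    return {
        "unchanged": unchanged,
        "moved": moved,
        "deleted": still_deleted,
        "inserted": still_inserted,
    }


old = ["load x", "load y", "add", "store z", "ret"]
new = ["store z", "load x", "load y", "add", "halt"]
result = classify_lines(old, new)
print(result["moved"])     # [(3, 0)]: "store z" moved from old line 3 to new line 0
print(result["deleted"])   # [4]: "ret" is gone
print(result["inserted"])  # [4]: "halt" is new
```

Grouping adjacent moved pairs into blocks, and requiring a minimum block length to avoid pairing trivial lines, would be the obvious next refinements.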
C# is preferable, but it doesn't have to be C#, any language or tool will do. I would even consider parsing Git's diff output if I had no other options and if it was better structured.
Any input is appreciated!
10 replies