Difference algorithm that Git uses
Hello! Quick rundown:
I have a need to detect changes in custom bytecode sequences. Essentially, I need to be able to compare two bytecode files (just consider them string sequences, lines of text even), and find:
1) Unchanged lines.
2) Deleted lines.
3) Inserted lines.
4) Moved lines.
Modified lines can be considered deleted and inserted. The key is I NEED to detect moved lines.
DiffPlex and Python's difflib both satisfy requirements 1, 2 and 3, but neither of them seems to be able to detect MOVED lines.
Git's difference algorithm seems to be able to detect moved lines. I tested it using --color-moved and it correctly identified the moved bytecode blocks, unlike DiffPlex and difflib, which report them as deleted and inserted.
I believe Git uses the Myers diff algorithm? I don't think the Myers diff algorithm recognizes moved lines though, so that must be some extra logic on Git's side of things.
I could come up with a custom algorithm for detecting the moved lines, like taking the deleted lines from A and finding matching sequences in B with LCS or something like that, but surely someone has already done this? This doesn't seem like an uncommon problem, so I would be surprised if there isn't an open-source, polished and tested solution already, but I haven't been able to find any.
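For reference, here is a minimal sketch of that custom approach in Python, built on difflib. It is not how Git does it, just the idea described above: run a normal diff, then pair each leftover deleted line from A with an identical leftover inserted line in B and call the pair "moved". The function name diff_with_moves and the shape of the result are made up for illustration, and it pairs individual lines rather than contiguous blocks.

```python
from collections import defaultdict, deque
from difflib import SequenceMatcher


def diff_with_moves(a, b):
    """Classify lines of a and b as unchanged, deleted, inserted or moved.

    Hypothetical helper: returns index pairs (i, j) into a and b for
    "unchanged" and "moved", and plain indices for "deleted" (into a)
    and "inserted" (into b).
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    unchanged, deleted, inserted = [], [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            unchanged.extend(zip(range(i1, i2), range(j1, j2)))
        else:
            # "replace" is treated as a delete plus an insert.
            deleted.extend(range(i1, i2))
            inserted.extend(range(j1, j2))

    # Index the deleted lines by content so inserted lines can claim them.
    pending = defaultdict(deque)
    for i in deleted:
        pending[a[i]].append(i)

    moved, still_inserted = [], []
    for j in inserted:
        queue = pending.get(b[j])
        if queue:
            # Same content was deleted elsewhere: report it as a move.
            moved.append((queue.popleft(), j))
        else:
            still_inserted.append(j)

    moved_from = {i for i, _ in moved}
    still_deleted = [i for i in deleted if i not in moved_from]
    return {
        "unchanged": unchanged,
        "deleted": still_deleted,
        "inserted": still_inserted,
        "moved": moved,
    }
```

Calling diff_with_moves(old_lines, new_lines) with two lists of strings gives the four categories. Grouping consecutive moved pairs into blocks, or requiring a minimum block length to avoid false positives on short repeated lines, would be a follow-up step.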
C# is preferable, but it doesn't have to be C#, any language or tool will do. I would even consider parsing Git's diff output if I had no other options and if it was better structured.
Any input is appreciated!
3 Replies
doesn't git use an external tool for diffing?
afair it doesn't have anything built in
https://git-scm.com/docs/git-difftool
I think it just defaults to the gnu diff tool for it
you can use fc.exe /? with the /b parameter to make a binary comparison
it's preinstalled on windows
https://manual.winmerge.org/en/Compare_bin.html
you can use winmerge to compare with a gui
Didn't get a notif for this - thanks for the responses!