Testing Code Plagiarism: C#, C++, Java
So I'm making a website in ASP.NET that creates tests for coding questions, and I need to detect plagiarized submissions. I've made a good checker for C#, but I still need one for Java and C++, and I haven't managed to build it for a while.
Can you recommend any library or something I can use for the other languages? Mostly to tokenize the code.
I've also tried using ANTLR4, but I couldn't figure it out.
My work so far: https://github.com/Whiteboy92/TestPlagiarismCA
Try to extract an abstraction from your C# one that you can reuse for the other languages without rewriting too much of the same logic.
But the C# one uses Microsoft.CodeAnalysis (Roslyn), which is specific to C#, so I'm not sure how.
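To illustrate the abstraction idea: only the tokenizer needs to be language-specific, and everything after it can work on a plain list of strings. Below is a minimal sketch in Java (one of the target languages; the same shape maps directly to a C# interface). CodeTokenizer, SimpleCLikeTokenizer and TokenizerRegistry are hypothetical names, and the regex tokenizer is only a stand-in for a real lexer such as Roslyn for C# or an ANTLR-generated lexer for Java/C++.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical abstraction: the only language-specific part is the tokenizer.
interface CodeTokenizer {
    List<String> tokenize(String source);
}

// Placeholder tokenizer (assumption): identifiers/keywords, integer literals,
// or any single non-whitespace character. Good enough for C-like demos,
// NOT a real lexer (no strings, comments, or multi-char operators).
class SimpleCLikeTokenizer implements CodeTokenizer {
    private static final Pattern TOKEN =
        Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|\\d+|\\S");

    @Override
    public List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }
}

// Picks the tokenizer for a submission's language; the comparison code
// downstream never needs to know which language it came from.
class TokenizerRegistry {
    private final Map<String, CodeTokenizer> byLanguage;

    TokenizerRegistry(Map<String, CodeTokenizer> byLanguage) {
        this.byLanguage = byLanguage;
    }

    CodeTokenizer forLanguage(String language) {
        CodeTokenizer t = byLanguage.get(language);
        if (t == null) {
            throw new IllegalArgumentException("No tokenizer for " + language);
        }
        return t;
    }
}
```

Adding C++ later would then mean registering one more CodeTokenizer implementation, without touching the comparison logic.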
How are you determining whether code is plagiarized in the end? I'd expect there to be a common input to that logic regardless of the language.
I compare tokens.
But tokenizing code is different for each language.
You can still abstract a lot of the common tokenizing logic.
All that differs is the content of the tokens.
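Once every language produces a plain list of tokens, the comparison itself only has to be written once. A minimal sketch, assuming Jaccard similarity over token k-grams (contiguous windows of k tokens) as the metric; that is a common choice, not necessarily what TestPlagiarismCA does, and TokenSimilarity is a hypothetical name:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TokenSimilarity {
    // Builds the set of contiguous k-token windows ("k-grams").
    static Set<List<String>> kGrams(List<String> tokens, int k) {
        Set<List<String>> grams = new HashSet<>();
        for (int i = 0; i + k <= tokens.size(); i++) {
            grams.add(List.copyOf(tokens.subList(i, i + k)));
        }
        return grams;
    }

    // Jaccard similarity |A intersect B| / |A union B| of the k-gram sets, in [0, 1].
    static double jaccard(List<String> a, List<String> b, int k) {
        Set<List<String>> ga = kGrams(a, k);
        Set<List<String>> gb = kGrams(b, k);
        if (ga.isEmpty() && gb.isEmpty()) {
            return 1.0; // two empty (or too-short) inputs are trivially identical
        }
        Set<List<String>> intersection = new HashSet<>(ga);
        intersection.retainAll(gb);
        Set<List<String>> union = new HashSet<>(ga);
        union.addAll(gb);
        return (double) intersection.size() / union.size();
    }
}
```

With k = 3, comparing the raw tokens of int x = 1; against int y = 1; gives only 0.2, because a single renamed variable breaks two of the three k-grams. That is why identifiers are usually normalized before this step.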
If you don't need semantic information, you might even get away with just splitting on whitespace,
although that breaks whenever tokens aren't separated by spaces (e.g. x=a+b;).
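A quick Java sketch of why plain whitespace splitting is fragile: the same statement, formatted differently, yields completely different token lists. WhitespaceSplitDemo is just an illustrative name.

```java
import java.util.Arrays;
import java.util.List;

class WhitespaceSplitDemo {
    // Naive tokenization: split on runs of whitespace.
    static List<String> splitOnSpaces(String source) {
        String trimmed = source.trim();
        if (trimmed.isEmpty()) {
            return List.of();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        // Same statement, different formatting.
        System.out.println(splitOnSpaces("x = a + b;")); // [x, =, a, +, b;]
        System.out.println(splitOnSpaces("x=a+b;"));     // [x=a+b;]
    }
}
```

Even in the spaced version, b; stays glued together, so a regex or a real lexer is needed as soon as punctuation matters.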
All I need is for the score to stay close to 95% plagiarism when someone changes variable names, method names or the code structure.
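One common way to keep the score high despite renamed variables and methods is to normalize the tokens: keep keywords, operators and punctuation, but map every identifier to ID and every number to NUM. A minimal sketch, assuming a small hand-picked keyword subset (a real version needs each language's full keyword list and proper handling of strings and comments); NormalizingTokenizer is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NormalizingTokenizer {
    // Illustrative keyword subset shared by Java and C++ (assumption).
    private static final Set<String> KEYWORDS = Set.of(
        "int", "double", "char", "bool", "boolean", "void", "return",
        "if", "else", "for", "while", "class", "public", "static", "new");

    // Identifier/keyword, number (optionally with a decimal part), or one symbol.
    private static final Pattern TOKEN =
        Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|\\d+(\\.\\d+)?|\\S");

    static List<String> tokenize(String source) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            String t = m.group();
            char first = t.charAt(0);
            if (KEYWORDS.contains(t)) {
                out.add(t);        // keep keywords: they carry the structure
            } else if (Character.isLetter(first) || first == '_') {
                out.add("ID");     // any variable, method or type name
            } else if (Character.isDigit(first)) {
                out.add("NUM");    // any numeric literal
            } else {
                out.add(t);        // operators and punctuation
            }
        }
        return out;
    }
}
```

With this, int total = sum(a, b); and int result=add(x,y); produce the identical sequence int ID = ID ( ID , ID ) ;. It only handles renaming, though: restructured code (reordered methods, a for loop rewritten as a while loop) still changes the sequence and needs an order-tolerant comparison on top, such as comparing sets of token k-grams.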
If the project is more about detecting plagiarism than writing language tokenizers, I would search for tokenizer libraries.
Worst case, ANTLR has a repo of prewritten language grammars (antlr/grammars-v4) that you could potentially use.
Plagiarism detection is like the most important part of the project.
The rest is just a front-end web app.
Then I would research tokenizer libraries.
I will also have to compile the code in a Docker container, so I only test code for plagiarism once it has passed compilation.
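The "compile first, then check" flow can be kept independent of both Docker and the similarity metric. A sketch assuming the compile step is wrapped in a predicate (hypothetically, something that runs the compiler inside a Docker container and checks the exit code) and the similarity is any function returning a value in [0, 1]; SubmissionPipeline is a hypothetical name:

```java
import java.util.List;
import java.util.Optional;
import java.util.function.BiFunction;
import java.util.function.Predicate;

class SubmissionPipeline {
    // Hypothetical compile step, e.g. a wrapper that builds the code in Docker.
    private final Predicate<String> compiles;
    // Any language-independent similarity in [0, 1] between two submissions.
    private final BiFunction<String, String, Double> similarity;

    SubmissionPipeline(Predicate<String> compiles,
                       BiFunction<String, String, Double> similarity) {
        this.compiles = compiles;
        this.similarity = similarity;
    }

    // Returns the highest similarity to any earlier submission,
    // or Optional.empty() if the submission does not compile.
    Optional<Double> check(String submission, List<String> previous) {
        if (!compiles.test(submission)) {
            return Optional.empty();
        }
        double max = 0.0;
        for (String other : previous) {
            max = Math.max(max, similarity.apply(submission, other));
        }
        return Optional.of(max);
    }
}
```

Keeping the compile step behind an interface also makes the plagiarism logic testable without starting any containers.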
I would definitely look for libraries.
I would try using some sort of AI service. It would cover more than plain tokenization.