Plagiarism Detection (within an Introduction to Computer Programming Course)
It is relatively simple to detect code copying, in which two or more students submit the same or very similar code. Detecting other forms of plagiarism is much more complicated.
Detect code copying
There are a variety of code similarity analysis tools. We recommend compare50, an open-source tool that runs locally and offers functionality similar to the better-known MOSS (Measure of Software Similarity) system, which is closed-source, runs as a remote service, and frequently malfunctions.
Using compare50
To analyze the code similarity of the submissions for an assignment:
- Assuming Python is already installed, install compare50 by running the following command on the command line:
    pip install -U compare50
  See the compare50 documentation for further installation details.
- Place all student submissions for an assignment into a single parent directory (this is how GitHub Classroom Assistant, which we recommend using, already organizes student submissions).
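- Place any code given to students with the assignment (e.g. starter code) into a sub-directory named given-code within this same parent directory.
- Open the parent directory in Visual Studio Code and use search and replace across files, with regular expressions enabled, to replace all docstrings and comments with empty strings. The following regular expressions match docstrings and comments, respectively:
    \s?"""[\w\W]*?"""\s?
    #.*
  Replace the docstrings first, so that a # inside a docstring cannot cut off the docstring's closing quotes.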
- Open a command line tool (e.g. Terminal on Mac, or Git Bash or Windows Subsystem for Linux on Windows) and navigate into the parent directory.
- Run the analysis with
    python -m compare50 **/*.py -d given-code
  where *.py can be changed to any filename you want to compare across submissions, such as
    python -m compare50 **/solution.py -d given-code
  The -d option tells compare50 to treat the files in given-code as distribution code, so that matches on code provided with the assignment are ignored.
  compare50 will generate a results directory containing the results as HTML web pages. Open the index.html page in a web browser to view them.
- Click on any result to see the details of the analysis. Report any submissions that look unexpectedly similar.
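The stripping and comparison steps above can also be scripted. The following is a minimal sketch, assuming Python 3.9+, and it is not part of compare50 itself. It applies the same two regular expressions (docstrings first) to every .py file under the parent directory. It then invokes python -m compare50 with -d given-code.

The file name prepare_and_compare.py and all function names are hypothetical. The script expands **/*.py with pathlib, which always recurses into nested folders. Bash, by contrast, only does so when shopt -s globstar is set.

Like the Visual Studio Code step, the script rewrites files in place, so run it on a copy of the submissions. Because it is regex-based rather than a Python parser, it also removes # characters inside string literals and any other triple-double-quoted strings. It does not handle ''' docstrings.

```python
"""prepare_and_compare.py (hypothetical name): strip comments/docstrings, then run compare50."""
import re
import subprocess
import sys
from pathlib import Path

# Same patterns as the search-and-replace step; '.' never matches a newline.
DOCSTRING_RE = re.compile(r'\s?"""[\w\W]*?"""\s?')
COMMENT_RE = re.compile(r"#.*")


def strip_comments_and_docstrings(source: str) -> str:
    """Remove triple-double-quoted strings, then '#' comments (regex-based, not a parser)."""
    # Docstrings first, so a '#' inside one cannot cut off its closing quotes.
    return COMMENT_RE.sub("", DOCSTRING_RE.sub("", source))


def strip_directory(parent: Path) -> None:
    """Rewrite every .py file under parent in place (run this on a copy)."""
    for path in parent.glob("**/*.py"):
        text = path.read_text(encoding="utf-8")
        path.write_text(strip_comments_and_docstrings(text), encoding="utf-8")


def find_submission_files(parent: Path, pattern: str = "**/*.py") -> list[str]:
    """Return matching files relative to parent; pathlib's '**' always recurses."""
    return sorted(
        p.relative_to(parent).as_posix() for p in parent.glob(pattern) if p.is_file()
    )


def build_command(files: list[str], distro: str = "given-code") -> list[str]:
    """Equivalent of: python -m compare50 <files...> -d given-code"""
    return [sys.executable, "-m", "compare50", *files, "-d", distro]


if __name__ == "__main__" and len(sys.argv) == 2:
    parent_dir = Path(sys.argv[1])
    strip_directory(parent_dir)
    # Run from the parent directory so the results report is created there.
    subprocess.run(build_command(find_submission_files(parent_dir)), cwd=parent_dir, check=True)
```

Hypothetical usage: python prepare_and_compare.py path/to/parent-directory. Stripping docstrings before comments mirrors the manual ordering. Building the file list in Python keeps the behavior the same across the shells mentioned above.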
Jupyter Notebooks
For any assignments requiring students to write code in Jupyter Notebooks, it is easiest to first extract the Python code from the notebooks, e.g.:
    jupyter nbconvert --no-prompt --to script **/*.ipynb
This will output a new file ending in .py for each notebook (--no-prompt leaves out the cell prompt markers). Run these files through compare50 as described above.
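If Jupyter is not installed, a notebook's code cells can be pulled out with the standard library alone. An .ipynb file is JSON whose cells list holds each cell's cell_type and source. The sketch below assumes nbformat 4 notebooks. The file name notebooks_to_scripts.py and the function names are hypothetical.

The sketch differs from nbconvert in two ways. nbconvert rewrites IPython magics as get_ipython() calls, whereas this sketch simply drops lines starting with % or !. The sketch also skips the autosave copies that Jupyter keeps in .ipynb_checkpoints folders.

```python
"""notebooks_to_scripts.py (hypothetical name): write each notebook's code cells to a .py file."""
import json
import sys
from pathlib import Path


def notebook_to_script(notebook: dict) -> str:
    """Concatenate code cells, skipping IPython magics (%) and shell escapes (!)."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are prose, not code
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be one string or a list of lines
            source = "".join(source)
        lines = [
            line for line in source.splitlines()
            if not line.lstrip().startswith(("%", "!"))
        ]
        if lines:
            chunks.append("\n".join(lines))
    return "\n\n".join(chunks) + "\n" if chunks else ""


def convert_directory(parent: Path) -> list[Path]:
    """Write name.py next to every name.ipynb under parent; return the new paths."""
    written = []
    for nb_path in sorted(parent.glob("**/*.ipynb")):
        if ".ipynb_checkpoints" in nb_path.parts:
            continue  # autosave copies would show up as duplicate submissions
        notebook = json.loads(nb_path.read_text(encoding="utf-8"))
        out_path = nb_path.with_suffix(".py")
        out_path.write_text(notebook_to_script(notebook), encoding="utf-8")
        written.append(out_path)
    return written


if __name__ == "__main__" and len(sys.argv) == 2:
    for path in convert_directory(Path(sys.argv[1])):
        print(path)
```

Hypothetical usage: python notebooks_to_scripts.py path/to/parent-directory. Then run compare50 on the generated .py files as described above.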