Plagiarism Detection (within an Introduction to Computer Programming Course)

It is relatively simple to detect code copying, where two students have shared the same or similar code. Detecting other forms of plagiarism is much more complicated.

Detect code copying

Many code similarity analysis tools are available. We recommend compare50, an open-source project that runs locally and is similar in functionality to the better-known, closed-source, and frequently malfunctioning MOSS system.

Using Compare50

To analyze code similarity of any assignment:

  1. Assuming Python is already installed, install compare50 by running pip install -U compare50 on the command line. See the compare50 documentation for further installation details.
  2. Place all student submissions for an assignment into a single parent directory. GitHub Classroom Assistant, which we recommend, already organizes student submissions this way.
  3. Place any given code into a sub-directory named given-code within this same parent directory.
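  4. Open the parent directory in Visual Studio Code and use a regular-expression search and replace to replace all comments and docstrings with empty strings. The regular expressions #.* and \s?"""[\w\W]*?"""\s? match comments and docstrings, respectively. Both are approximate: the first also matches a # inside a string literal, and the second matches any triple-quoted string, not only docstrings.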
  5. Open a command-line tool (e.g., Terminal on Mac, or Git Bash or Windows Subsystem for Linux on Windows) and navigate into the parent directory.
  6. Run the analysis with python -m compare50 **/*.py -d given-code. The -d option tells compare50 to treat the files in given-code as distribution code, so code that every student received is not reported as a match. To compare only one file across submissions, replace *.py with that filename, such as python -m compare50 **/ -d given-code
  7. compare50 generates a results directory of HTML pages. Open its index.html page in a web browser to view the results.
  8. Click any result to see the details of the analysis, and report any submissions that look unexpectedly similar.
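Step 4 can also be scripted. The following is a minimal sketch that uses Python's standard tokenize module instead of regular expressions, so a # inside a string literal is not mistaken for a comment and every line keeps its newline. The file name strip_comments.py and the function names are hypothetical. Like the docstring pattern, it treats any string that stands alone as a statement as a docstring, and removing a docstring can leave a function body empty; that is harmless here because compare50 analyzes the files without running them. It rewrites files in place, so run it on a copy of the submissions.

```python
import io
import sys
import tokenize
from pathlib import Path

# Token types after which a new statement begins (None = start of file).
STATEMENT_BOUNDARIES = (None, tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT)


def strip_comments_and_docstrings(source: str) -> str:
    """Remove comments and stand-alone string statements from Python source."""
    # Character offset at which each line starts, split the same way
    # tokenize sees the lines through readline.
    lines = io.StringIO(source).readlines()
    line_starts = [0]
    for line in lines:
        line_starts.append(line_starts[-1] + len(line))

    def offset(position: tuple[int, int]) -> int:
        row, col = position  # rows are 1-based, columns 0-based
        return line_starts[row - 1] + col

    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    spans = []  # (start, end) character ranges to delete, in source order
    previous = None  # type of the last token that was not a comment or NL
    for i, token in enumerate(tokens):
        if token.type == tokenize.COMMENT:
            spans.append((offset(token.start), offset(token.end)))
        elif token.type == tokenize.STRING and previous in STATEMENT_BOUNDARIES:
            # A docstring is a string forming the whole statement, so the next
            # token (ignoring comments) must end the logical line.
            j = i + 1
            while tokens[j].type == tokenize.COMMENT:
                j += 1
            if tokens[j].type in (tokenize.NEWLINE, tokenize.ENDMARKER):
                spans.append((offset(token.start), offset(token.end)))
        if token.type not in (tokenize.COMMENT, tokenize.NL):
            previous = token.type

    # Rebuild the source, skipping the deleted ranges.
    pieces, cursor = [], 0
    for start, end in spans:
        pieces.append(source[cursor:start])
        cursor = end
    pieces.append(source[cursor:])
    return "".join(pieces)


def strip_directory(root: Path) -> None:
    """Strip every .py file under root in place, skipping untokenizable files."""
    for path in sorted(root.rglob("*.py")):
        try:
            with tokenize.open(path) as f:  # honors any coding declaration
                encoding = f.encoding
                source = f.read()
            cleaned = strip_comments_and_docstrings(source)
        except (SyntaxError, tokenize.TokenError, UnicodeDecodeError) as err:
            print(f"skipped {path}: {err}", file=sys.stderr)
            continue
        path.write_text(cleaned, encoding=encoding)


# Only touch files when a directory is given explicitly, since files are rewritten.
if __name__ == "__main__" and len(sys.argv) == 2:
    strip_directory(Path(sys.argv[1]))
```

Usage, from the command line: python strip_comments.py path/to/parent-directory. Submissions that cannot be tokenized (for example, with broken indentation) are reported and left unchanged rather than stopping the whole run.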

Jupyter Notebooks

For any assignments requiring students to write code in Jupyter Notebooks, it is easiest to first extract the Python code out of the notebook, e.g.

jupyter nbconvert --no-prompt --to script **/*.ipynb

This writes a new file ending in .py next to each notebook. Run these files through compare50 as described above.
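If Jupyter is not installed, the code cells can also be pulled out with only the standard library, since an .ipynb file is JSON. This is a minimal sketch, assuming notebooks use the nbformat 4 layout, where each cell has a cell_type and a source that is either one string or a list of lines; the function names are hypothetical. IPython magics such as %matplotlib inline are copied through unchanged, which matters little for similarity analysis. It skips Jupyter's .ipynb_checkpoints folders and writes a .py file next to each notebook.

```python
import json
import sys
from pathlib import Path


def notebook_to_script(notebook_text: str) -> str:
    """Concatenate the non-empty code cells of an nbformat 4 notebook."""
    notebook = json.loads(notebook_text)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # nbformat stores a cell's source either as one string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():
            chunks.append(source.rstrip("\n") + "\n")
    # Separate consecutive cells with a blank line.
    return "\n".join(chunks)


def convert_all(root: Path) -> None:
    """Write a .py file next to every notebook found under root."""
    for path in sorted(root.rglob("*.ipynb")):
        if ".ipynb_checkpoints" in path.parts:
            continue  # autosaved copies would look like duplicate submissions
        script = notebook_to_script(path.read_text(encoding="utf-8"))
        path.with_suffix(".py").write_text(script, encoding="utf-8")


if __name__ == "__main__" and len(sys.argv) == 2:
    convert_all(Path(sys.argv[1]))
```

Skipping the checkpoint folders matters because they hold autosaved copies of each student's own notebook, which would otherwise be compared against the original and reported as near-identical.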