Plagiarism Detection in Source Code Using Machine Learning
S. Priya¹, Anukul Dixit², Krishanu Das³, Ronak Harish Patil⁴

¹ S. Priya, Department of Computer Science & Engineering, SRM Institute of Science & Technology, Chennai (Tamil Nadu), India.
² Anukul Dixit, Department of Computer Science & Engineering, SRM Institute of Science & Technology, Chennai (Tamil Nadu), India.
³ Krishanu Das, Department of Computer Science & Engineering, SRM Institute of Science & Technology, Chennai (Tamil Nadu), India.
⁴ Ronak Harish Patil, Department of Computer Science & Engineering, SRM Institute of Science & Technology, Chennai (Tamil Nadu), India.

Manuscript received on 18 April 2019 | Revised Manuscript received on 25 April 2019 | Manuscript published on 30 April 2019 | PP: 897-900 | Volume-8 Issue-4, April 2019 | Retrieval Number: D6359048419/19©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Source code plagiarism has become a major problem in education. Advances in technology have driven the growth of the IT sector and the software industry, so a person's intellectual property is no less important than any other valuable property. Once plagiarism goes beyond phase 6, it becomes almost impossible to detect with tools designed for structural analysis. We therefore designed a new model for source code plagiarism detection that uses machine learning to counter the higher phases of plagiarism. Conventional approaches such as structural methods, attribute counting, and graph-based analysis do not produce accurate results. Machine learning algorithms produce the most accurate results by learning continuously from the training data. The three algorithms used are Naïve Bayes, k-Nearest Neighbor, and the AdaBoost meta-learning algorithm. Because no single algorithm produces accurate results on its own, combining the algorithms yields more accurate results.
Keywords: Machine Learning, K-Nearest Neighbor, Naïve Bayes Classifier, Source Code, Plagiarism Detection.
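
The abstract does not say how the three algorithms are combined or which features they are trained on. As a rough illustration only, the sketch below assumes each pair of submissions has already been reduced to two similarity scores and combines the three classifiers by simple majority vote. The feature values, the training set, the names (`TRAIN_X`, `TRAIN_Y`, `vote`, and the helper functions) and the voting scheme are all hypothetical and are not taken from the paper.

```python
import math

# Hypothetical toy data: each row is [token_overlap, structure_overlap] for a
# pair of submissions; label 1 = plagiarised pair, 0 = independent pair.
TRAIN_X = [[0.95, 0.90], [0.88, 0.80], [0.91, 0.85],
           [0.20, 0.10], [0.35, 0.15], [0.25, 0.30]]
TRAIN_Y = [1, 1, 1, 0, 0, 0]

def knn_predict(X, y, q, k=3):
    """k-Nearest Neighbor: majority label among the k closest training rows."""
    nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], q))[:k]
    return 1 if 2 * sum(y[i] for i in nearest) > k else 0

def nb_fit(X, y):
    """Gaussian Naive Bayes: per-class prior plus per-feature mean and variance.
    Assumes both classes occur in the training data."""
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        stats = []
        for col in zip(*rows):
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-3  # smoothing
            stats.append((mean, var))
        model[c] = (len(rows) / len(X), stats)
    return model

def nb_predict(model, q):
    def log_posterior(c):
        prior, stats = model[c]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
            for v, (mean, var) in zip(q, stats))
    return max((0, 1), key=log_posterior)

def stump_predict(stump, x):
    """Decision stump: returns +sign if feature >= threshold, else -sign."""
    feature, threshold, sign = stump
    return sign if x[feature] >= threshold else -sign

def ada_fit(X, y, rounds=5):
    """AdaBoost over decision stumps, using labels in {-1, +1} internally."""
    ys = [1 if label else -1 for label in y]
    w = [1 / len(X)] * len(X)
    candidates = [(f, x[f], s) for x in X for f in range(len(x)) for s in (1, -1)]
    ensemble = []
    for _ in range(rounds):
        def weighted_error(stump):
            return sum(wi for wi, xi, yi in zip(w, X, ys)
                       if stump_predict(stump, xi) != yi)
        best = min(candidates, key=weighted_error)
        err = min(max(weighted_error(best), 1e-9), 1 - 1e-9)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, best))
        # Re-weight: misclassified rows gain weight, correct rows lose weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(best, xi))
             for wi, xi, yi in zip(w, X, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def ada_predict(ensemble, q):
    score = sum(alpha * stump_predict(stump, q) for alpha, stump in ensemble)
    return 1 if score >= 0 else 0

def vote(X, y, q):
    """Majority vote of the three classifiers (an assumed combination rule)."""
    votes = [knn_predict(X, y, q),
             nb_predict(nb_fit(X, y), q),
             ada_predict(ada_fit(X, y), q)]
    return 1 if sum(votes) >= 2 else 0

if __name__ == "__main__":
    print(vote(TRAIN_X, TRAIN_Y, [0.90, 0.82]))
    print(vote(TRAIN_X, TRAIN_Y, [0.30, 0.20]))
```

Majority voting is only one way to combine classifiers; weighted voting or stacking would fit the abstract's description equally well. In practice, the feature extraction step would matter far more than this toy setup suggests.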

Scope of the Article: Machine Learning