Development of Cross Language Clone Detector for C, C++ & Java Repositories using Natural Language Processing
Sanjay B. Ankali¹, Latha Parthiban²

¹Sanjay Ankali*, Department of Computer Science & Engineering, KLECET Chikodi, India.
²Dr. Latha Parthiban, Department of Computer Science, Pondicherry University, India.
Manuscript received on November 22, 2019. | Revised Manuscript received on December 08, 2019. | Manuscript published on December 30, 2019. | PP: 2289-2293 | Volume-9 Issue-2, December, 2019. | Retrieval Number: B3612129219/2019©BEIESP | DOI: 10.35940/ijeat.B3612.129219
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license.

Abstract: Reusing code, with or without modification, is a common practice in building large system-software codebases such as Linux, GCC, and the JDK. This practice is referred to as software cloning or forking. During software porting, developers often find it difficult to carry bug fixes over when a large codebase is moved from one language to another. Many existing approaches identify software clones within a single language, but these may not help developers involved in porting; hence there is a need for a cross-language clone detector. This paper takes a Natural Language Processing (NLP) approach, using a latent semantic analysis (LSA) algorithm based on singular value decomposition (SVD), to find cross-language clones in the neighboring languages, covering all four clone types. It takes code (C, C++, or Java) as input and matches all neighboring code clones in a static repository in terms of the frequency of matched lines.
Keywords: Cross-Language Clones, Porting, Natural Language Processing
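The abstract names LSA with SVD plus a line-frequency match but gives no implementation details, so the Python sketch below is only one possible reading, not the authors' implementation. Everything concrete in it is an assumption: the identifier tokenizer, the hypothetical `SYNONYMS`/`STOPWORDS` normalization tables, raw-count weighting, the latent dimension `k = 2`, the helper names, and the toy repository. It builds a term-document matrix over the repository, computes a rank-k truncated SVD by power iteration with deflation on AᵀA (standard library only, to stay self-contained), projects the query into the latent space, ranks files by cosine similarity, and reports the fraction of the query's normalized lines found in each file.

```python
import math
import random
import re
from collections import Counter

# Hypothetical normalization tables (assumptions, not the paper's rules): map each
# language's print idiom to one token and drop boilerplate words so that C, C++
# and Java snippets share a vocabulary.
SYNONYMS = {"printf": "print", "cout": "print", "println": "print"}
STOPWORDS = {"include", "iostream", "stdio", "h", "std", "system", "out", "public",
             "class", "static", "void", "int", "main", "string", "args"}

def tokenize(code):
    """Lower-cased identifier tokens of `code` after normalization."""
    tokens = []
    for word in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code):
        word = SYNONYMS.get(word.lower(), word.lower())
        if word not in STOPWORDS:
            tokens.append(word)
    return tokens

def term_vector(code, vocab):
    """Raw term counts over a fixed vocabulary; unknown terms are ignored."""
    counts = Counter(tokenize(code))
    return [counts[t] for t in vocab]

def lsa_basis(A, k, iters=300, seed=0):
    """Top-k left singular vectors of A (terms x docs) via power iteration on A^T A."""
    m, n = len(A), len(A[0])
    G = [[sum(A[t][i] * A[t][j] for t in range(m)) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    basis = []
    for _ in range(min(k, n)):
        v = [rng.random() + 0.1 for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * G[i][j] * v[j] for i in range(n) for j in range(n))
        if lam < 1e-9:  # remaining singular values are numerically zero
            break
        sigma = math.sqrt(lam)  # eigenvalue of A^T A = squared singular value
        basis.append([sum(A[t][j] * v[j] for j in range(n)) / sigma for t in range(m)])
        for i in range(n):  # deflate so the next pass finds the next component
            for j in range(n):
                G[i][j] -= lam * v[i] * v[j]
    return basis

def project(basis, vec):
    """Latent-space coordinates U_k^T d of a term vector d."""
    return [sum(u[t] * vec[t] for t in range(len(vec))) for u in basis]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def line_match_ratio(query, candidate):
    """Fraction of the query's non-empty normalized lines that occur in candidate."""
    q_lines = [ln for ln in (" ".join(tokenize(raw)) for raw in query.splitlines()) if ln]
    c_lines = {" ".join(tokenize(raw)) for raw in candidate.splitlines()}
    return sum(ln in c_lines for ln in q_lines) / len(q_lines) if q_lines else 0.0

def rank_clones(query, repo, k=2):
    """Return (file, LSA cosine, line-match ratio) tuples, best LSA match first."""
    names = list(repo)
    vocab = sorted({t for code in repo.values() for t in tokenize(code)})
    cols = [term_vector(repo[name], vocab) for name in names]
    A = [[col[t] for col in cols] for t in range(len(vocab))]
    basis = lsa_basis(A, k)
    q = project(basis, term_vector(query, vocab))
    results = [(name, cosine(q, project(basis, col)), line_match_ratio(query, repo[name]))
               for name, col in zip(names, cols)]
    return sorted(results, key=lambda r: r[1], reverse=True)

# Hypothetical toy repository (C, Java, unrelated C++) and a C++ query.
REPO = {
    "sum.c": "int sum(int a[], int n) {\n  int total = 0;\n"
             "  for (int i = 0; i < n; i++) total += a[i];\n  return total;\n}\n",
    "Sum.java": "public class Sum {\n  static int sum(int[] a, int n) {\n    int total = 0;\n"
                "    for (int i = 0; i < n; i++) total += a[i];\n    return total;\n  }\n}\n",
    "hello.cpp": '#include <iostream>\nint main() {\n  std::cout << "hello";\n  return 0;\n}\n',
}
QUERY = ("int sum(const int* a, int n) {\n  int total = 0;\n"
         "  for (int i = 0; i < n; i++) total += a[i];\n  return total;\n}\n")

if __name__ == "__main__":
    for name, sim, lines in rank_clones(QUERY, REPO):
        print(f"{name:10s} lsa={sim:.3f} lines={lines:.2f}")
```

For this toy query, both summation files score far above `hello.cpp` on the LSA cosine, and 3 of the 4 normalized query lines (all but the signature, which contains `const`) match in each of them. A fuller version would more likely use tf-idf weighting and a library routine such as `numpy.linalg.svd` instead of hand-rolled power iteration.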