Khasi to English Neural Machine Translation: an Implementation Perspective
N. Donald Jefferson Thabah1, Bipul Syam Purkayastha2
1N. Donald Jefferson Thabah*, Department of Computer Science, Assam University, Silchar, India.
2Bipul Syam Purkayastha, Department of Computer Science, Assam University, Silchar, India.
Manuscript received on November 22, 2019. | Revised Manuscript received on December 15, 2019. | Manuscript published on December 30, 2019. | PP: 4330-4336 | Volume-9 Issue-2, December, 2019. | Retrieval Number: B4528129219/2019©BEIESP | DOI: 10.35940/ijeat.B4528.129219
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Translating and communicating consistently from one language to another has long been an ultimate goal of intelligent systems. Recent advances in Neural Machine Translation (NMT) have made it a promising solution to the problem of machine translation. NMT generally requires large parallel corpora to obtain good translation accuracy. In this paper, we explore a translation system from Khasi to English using both supervised and unsupervised techniques. The unsupervised approach is intended to help attain better translation accuracy for a low-resource language; it is influenced by recent advances in unsupervised neural machine translation, which relies primarily on monolingual corpora. We also implement a supervised NMT technique and compare it with the standard OpenNMT toolkit. In addition, we use a Statistical Machine Translation (SMT) tool, Moses, as a standard benchmark for comparing translation accuracy. With the monolingual corpus alone, we obtain an accuracy of 0.23%. Given the small size of the monolingual corpus, this result is poor but leaves promising room for improvement. We obtain much better accuracies of 35.35% and 41.87% when we use the parallel corpus with supervised NMT and OpenNMT respectively. Compared with the SMT system, which has a BLEU score of 43.76%, the supervised NMT system is on par in performance. Lastly, with a larger corpus and better adaptation of the preprocessing steps to the source language (Khasi), the results can be tuned toward a better outcome.
Keywords: Khasi to English NMT, machine translation, supervised NMT, unsupervised NMT.