A Simple Tamil Speech Recognition System Based on Cmusphinx
Arvind Madaboosi Mukund1, Priyanka Balaji Ramanathan2, T. Sujithra3

1Arvind Madaboosi Mukund, SRM Institute of Science and Technology, Chennai (Tamil Nadu), India.
2Priyanka Balaji Ramanathan, SRM Institute of Science and Technology, Chennai (Tamil Nadu), India.
3Dr. T. Sujithra, SRM Institute of Science and Technology, Chennai (Tamil Nadu), India.

Manuscript received on 18 April 2019 | Revised Manuscript received on 25 April 2019 | Manuscript published on 30 April 2019 | PP: 655-658 | Volume-8 Issue-4, April 2019 | Retrieval Number: D6733048419/19©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: This paper studies phonemes and their use in Tamil speech recognition with the CMUSphinx API [4]. Tamil has lacked a robust speech recognition application, especially for the Tamil that is spoken on a daily basis. The paper is the outcome of the authors building a new dictionary of 418 words (as of 27 April 2019). Its sole aim is to present the result of mapping Tamil graphemes to phonemes, in the hope that it will help similar-sounding languages and the development of the Tamil language itself. The model uses recorded speech to map the graphemes in the built dictionary to the phonemes in the recorded words. The accuracy of the model is reported as the Word Error Rate generated by the language model.
Keywords: Language Models, Word Error Rate, Speech Recognition, Phonemes, Graphemes.
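
The Word Error Rate named in the abstract is conventionally defined as the word-level edit distance (substitutions, deletions and insertions) between the recognizer output and a reference transcript, divided by the number of reference words. The sketch below illustrates that standard definition only; it is not the authors' evaluation code, and the function name and the placeholder tokens are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Hypothetical helper showing the usual WER definition; it is not
    taken from the paper or from the CMUSphinx tools.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens, not words from the paper's dictionary:
# one substitution and one deletion against four reference words.
print(word_error_rate("w1 w2 w3 w4", "w1 wX w3"))
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.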

Scope of the Article: Pattern Recognition