Text Normalization and Its Role in Speech Synthesis
Pooja Manisha Rahate1, M. B. Chandak2
1Pooja Manisha Rahate, Department of Computer Science & Engineering, Shri Ramdeobaba College of Engineering & Management, Nagpur (Maharashtra), India.
2Manoj Chandak, HOD, Department of Computer Science & Engineering, Shri Ramdeobaba College of Engineering & Management, Nagpur (Maharashtra), India.
Manuscript received on 25 August 2019 | Revised Manuscript received on 01 September 2019 | Manuscript Published on 14 September 2019 | PP: 115-122 | Volume-8 Issue-5S3, July 2019 | Retrieval Number: E10290785S319/19©BEIESP | DOI: 10.35940/ijeat.E1029.0785S319
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: As technology advances and machines take over more human work, systems need to read informal text correctly even when its written form does not match standard English words. Such informal text includes dates, currencies, abbreviations and acronyms of standard words, measurements, URLs, phone numbers, etc. This paper focuses on text normalization, the conversion of such informal text into its equivalent standard form. Producing the equivalent spoken form of these non-standard words is a requirement of today's systems. Text normalization is a pre-processing step of a natural language processing system. The paper discusses various techniques and methods for converting non-standard words into standard words. Tokens are classified using regular expressions for simple pattern matching, Naïve Bayes classification for number sense disambiguation, and Stochastic Gradient Descent for resolving acronym and class ambiguity. The results and analysis are reported as the error rate of the system, which indicates where there is scope for further improvement.
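To illustrate the regular-expression stage mentioned in the abstract, the following is a minimal Python sketch of pattern-based token classification. The class labels, the patterns, and the `classify_token` helper are all assumptions made for illustration; the abstract does not give the authors' actual rules, and ambiguous tokens would still need the statistical classifiers it names.

```python
import re

# Hypothetical patterns (not the authors' rules), checked in order.
# The first pattern that matches the whole token decides its class.
PATTERNS = [
    ("URL", re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)),
    ("DATE", re.compile(r"\d{1,2}[/-]\d{1,2}[/-]\d{2,4}")),
    ("CURRENCY", re.compile(r"[$£€₹]\d+(?:,\d{3})*(?:\.\d+)?")),
    ("PHONE", re.compile(r"\d{3}-\d{3}-\d{4}")),
    ("MEASURE", re.compile(r"\d+(?:\.\d+)?(?:km|kg|cm|mm|ml|m|g|l)")),
    ("NUMBER", re.compile(r"\d+(?:\.\d+)?")),
    ("ABBREVIATION", re.compile(r"(?:[A-Za-z]\.){2,}|[A-Z]{2,}")),
]


def classify_token(token):
    """Return the label of the first pattern matching the entire token."""
    for label, pattern in PATTERNS:
        if pattern.fullmatch(token):
            return label
    # Anything unmatched is treated as an ordinary (standard) word.
    return "WORD"


if __name__ == "__main__":
    for tok in ["25/08/2019", "$1,500.50", "5km", "U.S.A.", "hello"]:
        print(tok, classify_token(tok))
```

Ordering the patterns from most to least specific keeps a date or phone number from being labelled as a plain number. A token such as "1990", which could be a year or a quantity, falls through to NUMBER here; that kind of case is where a sense-disambiguation classifier would take over.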
Keywords: Naive Bayes, Stochastic Gradient Descent, Text Normalization, Translation Memory.
Scope of the Article: Soft computing Signal and Speech Processing