A Hybrid Technique using CNN+LSTM for Speech Emotion Recognition
Hafsa Qazi1, Baij Nath Kaushik2

1Hafsa Qazi*, Department of Computer Science, Shri Mata Vaishno Devi University, Katra, J&K, India.
2Baij Nath Kaushik, Associate Professor, Department of Computer Science & Engineering, SMVDU, Katra, J&K, India.

Manuscript received on June 01, 2020. | Revised Manuscript received on June 08, 2020. | Manuscript published on June 30, 2020. | PP: 1126-1130 | Volume-9 Issue-5, June 2020. | Retrieval Number: E1027069520/2020©BEIESP | DOI: 10.35940/ijeat.E1027.069520
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Automatic speech emotion recognition is essential for effective human-computer interaction. This paper uses spectrograms as inputs to a hybrid deep convolutional LSTM network for speech emotion recognition. The proposed model uses four convolutional layers to extract high-level features from the input spectrograms, an LSTM layer to capture long-term dependencies, and finally two dense layers. Experimental results on the SAVEE database show promising performance: the proposed model obtained an accuracy of 94.26%.
Keywords: CNN, LSTM, RNN, SER, Spectrograms.
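
The layer ordering described in the abstract (four convolutional layers, one LSTM layer, two dense layers) can be illustrated by tracing tensor shapes through such a stack. The following is a minimal sketch and not the authors' implementation: the input size, channel counts, pooling factor, LSTM width, dense width, and the choice of seven output classes are all assumptions made for illustration, since the abstract does not state them. The helper names `conv_pool` and `trace_shapes` are hypothetical.

```python
# Illustrative shape trace for a CNN + LSTM + dense stack.
# Every size used here is an assumption, not a setting taken from the paper.


def conv_pool(shape, out_channels, pool=2):
    """Same-padded convolution followed by max pooling (assumed design).

    shape is (channels, freq_bins, time_frames). The convolution changes only
    the channel count; pooling floor-divides both spatial axes by `pool`.
    """
    _, freq, time = shape
    return (out_channels, freq // pool, time // pool)


def trace_shapes(spectrogram_shape, conv_channels=(16, 32, 64, 128),
                 lstm_units=128, dense_units=64, num_classes=7):
    """Return a list of (stage name, output shape) pairs for one input."""
    if len(conv_channels) != 4:
        raise ValueError("the abstract describes four convolutional layers")

    stages = [("input", spectrogram_shape)]
    shape = spectrogram_shape
    for i, channels in enumerate(conv_channels, start=1):
        shape = conv_pool(shape, channels)
        stages.append((f"conv{i}", shape))

    channels, freq, time = shape
    # Treat each remaining time frame as one LSTM step whose feature vector
    # concatenates all channels and frequency bins (one common choice).
    stages.append(("to_sequence", (time, channels * freq)))
    # Assume only the LSTM's final hidden state is passed on.
    stages.append(("lstm", (lstm_units,)))
    stages.append(("dense1", (dense_units,)))
    stages.append(("dense2_softmax", (num_classes,)))
    return stages


if __name__ == "__main__":
    # Hypothetical single-channel 128x128 spectrogram.
    for name, shape in trace_shapes((1, 128, 128)):
        print(f"{name:>15}: {shape}")
```

Under these assumed sizes, the four convolution and pooling stages reduce a 128x128 spectrogram to 8 time steps of 1024 features each, which the LSTM then summarizes into a single vector for the dense layers. Reshaping along the time axis is one common way to join a CNN to an LSTM; the paper itself may do this differently.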