Bimodal Emotion Recognition using Machine Learning
Manisha S1, Nafisa Saida H2, Nandita Gopal3, Roshni P Anand4

1Manisha S*, Department of CSE, Sri Sivasubramaniya Nadar College of Engineering, Chennai, India.
2Nafisa H Saida, Department of CSE, Sri Sivasubramaniya Nadar College of Engineering, Chennai, India.
3Nandita Gopal, Department of CSE, Sri Sivasubramaniya Nadar College of Engineering, Chennai, India.
4Roshni P Anand, Department of CSE, Sri Sivasubramaniya Nadar College of Engineering, Chennai, India.

Manuscript received on April 12, 2021. | Revised Manuscript received on April 19, 2021. | Manuscript published on April 30, 2021. | PP: 189-194 | Volume-10 Issue-4, April 2021. | Retrieval Number: 100.1/ijeat.D24510410421 | DOI: 10.35940/ijeat.D2451.0410421
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: The emotions embedded in our communication are the predominant channel for conveying relevant, high-impact information. In recent years, researchers have tried to exploit these emotions in human-robot interaction (HRI) and human-computer interaction (HCI). Emotion recognition from speech alone or from facial expressions alone is termed single-mode emotion recognition. The proposed bimodal method improves on the accuracy of single-mode recognition by combining the speech and facial-expression modalities and recognizing emotions with a Convolutional Neural Network (CNN) model. In this paper, the proposed bimodal emotion recognition system consists of three major parts: audio processing, video processing, and data fusion to detect a person's emotion. Fusing visual and audio information obtained from two different channels enhances the emotion recognition rate because the channels provide complementary data. The proposed method aims to classify 7 basic emotions (anger, disgust, fear, happy, neutral, sad, surprise) from an input video: the audio and an image frame are taken from the video and used to predict the person's final emotion. The dataset used is RAVDESS, an audio-visual dataset uniquely suited to the study of multimodal emotion expression and perception, which is provided in audio-visual, visual-only, and audio-only versions. For bimodal emotion detection, the audio-visual version is used.
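The abstract does not state how the audio and visual predictions are combined. As a minimal sketch, assuming decision-level (late) fusion in which each modality's classifier outputs a probability for each of the 7 emotions, the final label can be taken from a weighted average of the two distributions. The function name `fuse_predictions`, the equal default weight, and the example scores are hypothetical illustrations, not the authors' method.

```python
# Hypothetical decision-level (late) fusion of per-modality emotion scores.
# Assumption: the audio model and the image-frame model each output a
# probability distribution over the same 7 classes; the weight is illustrative.

EMOTIONS = ["anger", "disgust", "fear", "happy", "neutral", "sad", "surprise"]


def fuse_predictions(audio_probs, video_probs, audio_weight=0.5):
    """Return (label, fused_probs) from a weighted average of two distributions."""
    if len(audio_probs) != len(EMOTIONS) or len(video_probs) != len(EMOTIONS):
        raise ValueError("each modality must score all 7 emotions")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    video_weight = 1.0 - audio_weight
    # Element-wise weighted average keeps the result a probability distribution.
    fused = [audio_weight * a + video_weight * v
             for a, v in zip(audio_probs, video_probs)]
    # The index of the highest fused score gives the final emotion.
    best = max(range(len(EMOTIONS)), key=lambda i: fused[i])
    return EMOTIONS[best], fused


if __name__ == "__main__":
    audio = [0.10, 0.05, 0.05, 0.40, 0.20, 0.10, 0.10]  # made-up audio scores
    video = [0.05, 0.05, 0.10, 0.30, 0.35, 0.05, 0.10]  # made-up image scores
    label, fused = fuse_predictions(audio, video)
    print(label)  # prints "happy"
```

With these made-up scores the audio model alone favours happy and the image model alone favours neutral, while the fused distribution picks happy, illustrating how one channel can supply complementary evidence to the other. Feature-level fusion, which concatenates audio and image features before a single classifier, is the common alternative design.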
Keywords: Emotion recognition, Bimodal analysis, Machine learning, Ensemble learning, k-fold cross-validation, MFCC.