Text Classification of Cornell Movie Data using Data Mining with Feature Selection
A. K. Shrivas¹*, S. M. Ghosh², Amit Kumar Dewangan³

¹ A. K. Shrivas*, Department of Information Technology, Dr. C.V. Raman University, Bilaspur (C.G.), India.
² S. M. Ghosh, Department of Computer Science Engineering, Dr. C.V. Raman University, Bilaspur (C.G.), India.
³ Amit Kumar Dewangan, Department of Computer Science Engineering, Dr. C.V. Raman University, Bilaspur (C.G.), India.
Manuscript received on November 22, 2019. | Revised Manuscript received on December 15, 2019. | Manuscript published on December 30, 2019. | PP: 2950-2955 | Volume-9 Issue-2, December 2019. | Retrieval Number: B2329129219/2019©BEIESP | DOI: 10.35940/ijeat.B2329.129219
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Text classification is a branch of text mining through which we can analyze the sentiment of movie data. In this paper, we apply different preprocessing techniques to reduce the number of features in the Cornell movie data set. We also apply correlation-based feature subset selection and chi-square feature selection to gather the most valuable words of each category. A new Cornell movie data set is formed after the preprocessing steps and feature selection techniques are applied. We classify the Cornell movie data as positive or negative using several classifiers: Support Vector Machine (SVM), Multilayer Perceptron (MLP), Naive Bayes (NB), Bayes Net (BN), and Random Forest (RF). We also compare classification accuracy across the classifiers; the SVM classifier achieved the best accuracy, 87%, with a reduced number of features. The suggested classifier can be useful for opinion mining of movie reviews and for analyzing blogs and other documents.
Keywords: Classification, Cornell Movie Dataset, Feature Selection, WEKA Tool
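
The chi-square feature selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' WEKA workflow; it is a plain-Python illustration that assumes a binary presence/absence representation of words and whitespace tokenization. The function names (`chi_square_scores`, `top_k_terms`) and the toy reviews are hypothetical and are not taken from the Cornell movie data set.

```python
from collections import Counter


def chi_square_scores(docs, labels, target="pos"):
    """Score each word by the chi-square statistic of its 2x2 table
    (word present/absent versus target class/other class)."""
    n = len(docs)
    n_target = sum(1 for y in labels if y == target)
    n_other = n - n_target

    # Document frequencies of each word inside and outside the target class.
    in_target = Counter()
    in_other = Counter()
    for text, y in zip(docs, labels):
        terms = set(text.lower().split())  # assumption: whitespace tokens
        (in_target if y == target else in_other).update(terms)

    scores = {}
    for term in set(in_target) | set(in_other):
        a = in_target[term]   # target docs containing the word
        b = in_other[term]    # other docs containing the word
        c = n_target - a      # target docs without the word
        d = n_other - b       # other docs without the word
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        # A word present in every document (or in none) carries no signal.
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def top_k_terms(scores, k):
    """Return the k highest-scoring words, breaking ties alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy reviews, invented for illustration only.
toy_docs = [
    "a great and moving film",
    "great acting and a great script",
    "a dull and boring film",
    "boring plot and dull acting",
]
toy_labels = ["pos", "pos", "neg", "neg"]

scores = chi_square_scores(toy_docs, toy_labels)
selected = top_k_terms(scores, 3)
print(selected)
```

On this toy input, words that occur in only one class ("great", "dull", "boring") receive the highest scores, while words that appear in every review ("and") score zero, which is the behavior that lets the technique shrink the vocabulary before classification.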