Classifying Books by Genre Based on Cover
Rajasree Jayaram1, Harshitha Mallappa2, Pavithra S3, Munshira Noor B4, Bhanushree K J5

1Rajasree Jayaram*, Department of Computer Science and Engineering, Bangalore Institute of Technology, Bengaluru, India.
2Harshitha Mallappa, Department of Computer Science and Engineering, Bangalore Institute of Technology, Bengaluru, India.
3Pavithra S, Department of Computer Science and Engineering, Bangalore Institute of Technology, Bengaluru, India.
4Munshira Noor B, Department of Computer Science and Engineering, Bangalore Institute of Technology, Bengaluru, India.
5Bhanushree K J, Department of Computer Science and Engineering, Bangalore Institute of Technology, Bengaluru, India.

Manuscript received on May 25, 2020. | Revised Manuscript received on June 05, 2020. | Manuscript published on June 30, 2020. | PP: 530-535 | Volume-9 Issue-5, June 2020. | Retrieval Number: E9561069520/2020©BEIESP | DOI: 10.35940/ijeat.E9561.069520
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: A book cover can convey a lot about the content of the book. Despite the adage not to evaluate something by its outward appearance, we apply machine learning to see if we can, in fact, judge a book by its cover, or more specifically by its cover art and text. Classification was performed in three settings: cover image only, cover text only, and both image and text combined in a multimodal approach. Image classification was done using transfer learning with Inception-v3. For text detection, cover images were first converted to greyscale and several thresholds were applied to extract as much text as possible. The extracted text was then vectorized and used to train a Multinomial Naïve Bayes model. We also trained custom CNNs for the image and text modalities. For multimodal classification, we examine a late fusion model, where the modalities are combined at the decision level, and an early fusion model, where the modalities are combined at the feature level. Our results show that the late fusion model performs best in our setting. We also observe that text is more informative than imagery for genre prediction, and that significant effort must still be devoted to solving this image-based classification task to a satisfactory level. This research can aid the product design process by revealing underlying information. It could also be used in recommender systems and to support promotion and sales processes through automatic genre suggestion.
Keywords: Text classification, Image classification, Multimodal classification, Deep Learning, Genre Prediction.
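As a minimal sketch of the decision-level combination described in the abstract (not the authors' exact implementation), late fusion can be realized as a weighted average of the per-genre probability vectors produced independently by the image and text classifiers. The weights and probability values below are illustrative assumptions, not figures from the paper:

```python
def late_fusion(image_probs, text_probs, w_image=0.4, w_text=0.6):
    """Decision-level (late) fusion: weighted average of per-genre
    probability vectors from two modality classifiers.
    Weights here are illustrative, not the paper's values."""
    fused = [w_image * p_img + w_text * p_txt
             for p_img, p_txt in zip(image_probs, text_probs)]
    total = sum(fused)
    # Renormalize so the result is again a probability distribution
    return [p / total for p in fused]

# Hypothetical per-genre probabilities over three genres
image_probs = [0.2, 0.5, 0.3]   # e.g. from the cover-image CNN
text_probs = [0.1, 0.7, 0.2]    # e.g. from the Naive Bayes text model
fused = late_fusion(image_probs, text_probs)
predicted_genre = max(range(len(fused)), key=fused.__getitem__)
```

An early fusion model would instead concatenate the feature representations of the two modalities before a single classifier; late fusion, as above, keeps the classifiers separate and merges only their outputs.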