Spam Detection in Social Media Networking Sites using Ensemble Methodology with Cross Validation
K Subba Reddy1, E. Srinivasa Reddy2
1K Subba Reddy, Research Scholar, Anucet, ANU, Guntur, AP, India.
2Dr. E. Srinivasa Reddy, Dean, Anucet, ANU, Guntur, AP, India
Manuscript received on May 06, 2020. | Revised Manuscript received on May 15, 2020. | Manuscript published on June 30, 2020. | PP: 1942-1948 | Volume-9 Issue-5, June 2020. | Retrieval Number: C5558029320/2020©BEIESP | DOI: 10.35940/ijeat.C5558.029320
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Social media networking sites are widely popular on the Internet. Internet users spend a large amount of time on social media sites such as Twitter, Facebook, Instagram, and LinkedIn. Users of these sites share ideas, opinions, and information, and make new friends. Social networking sites provide a large amount of valuable information to their users. This large volume of information attracts spammers who misuse it. Spammers create fake accounts and spread irrelevant information to genuine users. Spam messages may contain advertisements or malicious links that disturb legitimate users. Spam on social media is a serious problem, and detecting it is a critical task. Researchers have developed various spam detection methodologies to identify spam messages on social media. In this paper we propose an ensemble methodology for identifying spam on the Twitter social media network. The methodology uses a decision tree induction algorithm, the Naïve Bayes algorithm, and the k-nearest neighbors (KNN) algorithm to construct a model. We first compare the classification results of any two of the three classifiers; if both predict the same class, we take that class as the final label of the tweet under investigation. If their predicted classes differ, we use the prediction of the third algorithm as the final class label of the tweet. To measure the performance of our model we use precision, recall, and F-measure.
Keywords: Social media, Twitter, Naïve Bayes, Decision Tree, KNN algorithm
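The combination rule and the evaluation measures described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `ensemble_predict` and `precision_recall_f1`, the label strings "spam" and "ham", and the choice of which two classifiers are compared first are all assumptions made for illustration. Each classifier is represented as a plain callable that maps a tweet to a label, so any trained decision tree, Naïve Bayes, or KNN model could be wrapped this way.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical types: a classifier is any callable mapping tweet text to a label.
Label = str
Classifier = Callable[[str], Label]


def ensemble_predict(tweet: str,
                     first: Classifier,
                     second: Classifier,
                     tie_breaker: Classifier) -> Label:
    """Return the agreed label of two classifiers, else defer to the third."""
    label_a = first(tweet)
    label_b = second(tweet)
    if label_a == label_b:
        # Both classifiers agree: accept their prediction as final.
        return label_a
    # Disagreement: the third classifier decides the final label.
    return tie_breaker(tweet)


def precision_recall_f1(y_true: Sequence[Label],
                        y_pred: Sequence[Label],
                        positive: Label = "spam") -> Tuple[float, float, float]:
    """Compute precision, recall, and F-measure for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy stand-ins for trained models (assumptions, not real classifiers).
    def tree(tweet: str) -> Label:
        return "spam" if "http" in tweet else "ham"

    def bayes(tweet: str) -> Label:
        return "spam" if "free" in tweet.lower() else "ham"

    def knn(tweet: str) -> Label:
        return "spam" if tweet.isupper() else "ham"

    print(ensemble_predict("Free prizes at http://example.com", tree, bayes, knn))
```

With only two classes, this rule gives the same result as a majority vote over the three classifiers, because the third classifier necessarily agrees with one of the first two whenever they disagree.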