Detecting and Classifying Toxic Language in Twitter using Machine Learning
Nischal Lakhotia¹, Omprakash Harod², T. Manoranjitham³
¹Nischal Lakhotia*, Department of Computer Science and Engineering, SRMIST, Kattankulathur, India.
²Omprakash Harod, Department of Computer Science and Engineering, SRMIST, Kattankulathur, India.
³T. Manoranjitham, Department of Computer Science and Engineering, SRMIST, Kattankulathur, India.
Manuscript received on May 29, 2020. | Revised Manuscript received on June 22, 2020. | Manuscript published on June 30, 2020. | PP: 566-571 | Volume-9 Issue-5, June 2020. | Retrieval Number: E9714069520/2020©BEIESP | DOI: 10.35940/ijeat.E9714.069520
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Online content has become a major part of global communication owing to the growth in Internet use. People from different societies and educational backgrounds can communicate through this platform, which also exposes them to toxic content. Automatic detection of such content therefore needs to distinguish between hate speech and offensive language. We propose a method to automatically detect tweets on Twitter and classify them into three classes: hateful, offensive, and clean. We consider n-grams as features, pass their term frequency-inverse document frequency (TF-IDF) values to various machine learning models trained on a Twitter dataset, and perform a comparative evaluation of the models. Combining the best features from each type of feature extraction, we compare different classifiers and determine which model works best for classifying tweets as hate speech, offensive language, or neither.
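The pipeline outlined in the abstract (n-gram features weighted by TF-IDF, fed to a classifier that assigns one of three labels) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes whitespace tokenisation, unigrams and bigrams only, and scikit-learn-style smoothed IDF with L2 normalisation, and it swaps in a simple nearest-centroid classifier in place of the machine learning models the paper compares. All function names and the placeholder training tweets are hypothetical.

```python
import math
from collections import Counter

def ngrams(text, n_min=1, n_max=2):
    """Word n-grams (lengths n_min..n_max) from lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def l2_normalise(vec):
    """Scale a sparse vector (dict) to unit Euclidean length; an empty vector stays empty."""
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {g: v / norm for g, v in vec.items()}

def fit_idf(docs):
    """Smoothed IDF, idf(g) = ln((1 + N) / (1 + df(g))) + 1, as in scikit-learn's default."""
    df = Counter()
    for doc in docs:
        df.update(set(ngrams(doc)))  # document frequency: each n-gram counted once per doc
    n_docs = len(docs)
    return {g: math.log((1 + n_docs) / (1 + c)) + 1 for g, c in df.items()}

def tfidf(doc, idf):
    """Raw n-gram count times IDF, L2-normalised; n-grams unseen during fitting are ignored."""
    counts = Counter(g for g in ngrams(doc) if g in idf)
    return l2_normalise({g: c * idf[g] for g, c in counts.items()})

def cosine(a, b):
    """Dot product of two unit-length sparse vectors, i.e. their cosine similarity."""
    return sum(v * b.get(g, 0.0) for g, v in a.items())

def fit_centroids(docs, labels, idf):
    """Hypothetical stand-in model: one unit-length mean TF-IDF vector per class."""
    sums = {}
    for doc, label in zip(docs, labels):
        acc = sums.setdefault(label, Counter())
        for g, v in tfidf(doc, idf).items():
            acc[g] += v
    return {label: l2_normalise(dict(acc)) for label, acc in sums.items()}

def predict(doc, centroids, idf):
    """Return the class whose centroid is most cosine-similar to the tweet's TF-IDF vector."""
    vec = tfidf(doc, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

def accuracy(pairs, centroids, idf):
    """Fraction of (tweet, label) pairs classified correctly."""
    return sum(predict(d, centroids, idf) == y for d, y in pairs) / len(pairs)

# Hypothetical toy tweets; placeholder tokens stand in for real slurs and insults.
TRAIN = [
    ("slur_a group_x", "hateful"),
    ("group_x slur_b", "hateful"),
    ("you insult_a", "offensive"),
    ("insult_b you", "offensive"),
    ("nice day", "clean"),
    ("good day", "clean"),
]
docs, labels = zip(*TRAIN)
IDF = fit_idf(docs)
CENTROIDS = fit_centroids(docs, labels, IDF)

TEST = [("slur_a group_x again", "hateful"),
        ("insult_a you", "offensive"),
        ("nice day today", "clean")]
print(accuracy(TEST, CENTROIDS, IDF))  # prints 1.0 on this toy data
```

A comparative evaluation of the kind the abstract describes would train several different classifiers on the same TF-IDF vectors and score each with a shared metric on the same held-out tweets; the `accuracy` helper shows the shape of that step for a single model.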
Keywords: Toxic Language, hate speech, offensive language, n-gram, tf-idf, machine learning