Detection of Phishing Websites using an Efficient Feature-Based Machine Learning Framework
V. V. Ramalingam1, Paras Yadav2, Prakhar Srivastava3

1V. V. Ramalingam, Department of Computer Applications from Bharathidasan University (2000), M.Phil Degree in Computer Science from Periyar University (2007).
2Paras Yadav, Department of Computer Science and Engineering, B.Tech student in SRM Institute of Science and Technology.
3Prakhar Srivastava, Department of Computer Science and Engineering, B.Tech student in SRM Institute of Science and Technology.
Manuscript received on January 26, 2020. | Revised Manuscript received on February 05, 2020. | Manuscript published on February 30, 2020. | PP: 2857-2862 | Volume-9 Issue-3, February 2020. | Retrieval Number: C5909029320/2020©BEIESP | DOI: 10.35940/ijeat.C5909.029320
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Phishing is a socially engineered cyber-attack that tricks unsuspecting online users into revealing sensitive information such as user data, login credentials, social security numbers, and banking information. Attackers deceive Internet users by presenting a webpage that poses as a legitimate one in order to collect personal information, or by sending emails that impersonate reputable companies or businesses. Phishing exploits several vulnerabilities effectively, and no single solution protects users from all of them. To address the drawbacks of existing anti-phishing techniques, we design a classification/prediction model based on heuristic features extracted from the website's domain, URL, web protocol, and source code. The model combines existing solutions such as blacklisting and whitelisting, heuristics, and visual-based similarity to provide a higher level of security. We use the model with different machine learning algorithms, namely Logistic Regression, Decision Trees, K-Nearest Neighbours, and Random Forests, and compare the results to find the most efficient machine learning framework.
Keywords: Machine Learning, Blacklist, Whitelist, Cyberattacks, Logistic Regression, K-Nearest Neighbour
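
The abstract describes a pipeline that checks blacklists and whitelists and extracts heuristic features from the URL, domain, and protocol before classification. The following is a minimal Python sketch of that idea, not the authors' implementation. The list contents, the function names (`list_verdict`, `extract_features`, `host_is_ip`), and the specific features chosen are all assumptions made for illustration; the abstract does not enumerate the features actually used.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical example lists (assumption): the abstract does not publish its lists.
WHITELIST = {"example.com"}
BLACKLIST = {"phish.example.net"}


def host_is_ip(host: str) -> bool:
    """Return True when the host is a literal IP address rather than a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def list_verdict(url: str):
    """Return 'legitimate' or 'phishing' on a list hit, otherwise None."""
    host = (urlparse(url).hostname or "").lower()
    if host in WHITELIST:
        return "legitimate"
    if host in BLACKLIST:
        return "phishing"
    return None


def extract_features(url: str) -> dict:
    """Compute a few illustrative heuristic features (assumed, not the paper's set)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    return {
        "url_length": len(url),                        # long URLs can hide the real host
        "has_at_symbol": int("@" in url),              # '@' can obscure the true destination
        "uses_https": int(parsed.scheme == "https"),   # web protocol feature
        "host_is_ip": int(host_is_ip(host)),           # raw IP used instead of a domain
        "dots_in_host": host.count("."),               # rough proxy for subdomain depth
        "has_hyphen_in_host": int("-" in host),        # hyphenated look-alike domains
    }
```

In a setup like the one the abstract outlines, URLs with no list hit would have their feature dictionaries converted to numeric vectors and passed to each of the four classifiers for comparison.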