A New Hybrid Strategy for Malware Detection Classification with Multiple Feature Selection Methods and Ensemble Learning Methods
P. Harsha Latha1, R. Mohanasundaram2

1P. Harsha Latha*, Research Scholar, School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, (Tamil Nadu), India.
2R. Mohanasundaram, Associate Professor, School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, (Tamil Nadu), India.
Manuscript received on November 26, 2019. | Revised Manuscript received on December 15, 2019. | Manuscript published on December 30, 2019. | PP: 4013-4018  | Volume-9 Issue-2, December, 2019. | Retrieval Number: B4666129219/2019©BEIESP | DOI: 10.35940/ijeat.B4666.129219
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: The dramatic increase in malware encountered in day-to-day life poses a significant problem for cyber security. Traditional approaches and signature-based models are not sufficient to defend against new malware, and they are poorly suited to handling zero-day attacks. Machine learning approaches are highly effective at strengthening the classification of new malware. However, classifying new malware from high-dimensional data reduces output quality and leads to low-performance results. In this paper, we propose a new hybrid strategy that combines feature selection methods with ensemble learning methods to improve accuracy on high-dimensional data. The hybrid approach has three stages: preprocessing, feature selection, and classification. Three feature selection methods (Extra Trees Classifier, Percentile, and K Best) are used to select the best features (dimensionality reduction), and four ensemble classifiers (AdaBoost, Gradient Boosting, Random Forest, and Bagging) are used for classification. The hybrid model increases the accuracy of the ensemble classifiers and produces better classification results, with 91.50% accuracy. The approach is therefore very effective for dealing with high-dimensional data.
Keywords: Hybrid Model, Dimensionality Reduction, Machine Learning, Feature Selection, Classification, Malware Detection, Ensemble Learning.
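
The three-stage structure described in the abstract (preprocessing, feature selection, classification) can be illustrated with a minimal sketch. It assumes scikit-learn as the toolkit, which the abstract does not state, and it uses a synthetic dataset in place of real malware features. The scaler, the scoring function `f_classif`, and all hyperparameters (`k=10`, `percentile=20`, `n_estimators=50`) are illustrative assumptions rather than the authors' settings, so the accuracies it prints are not comparable to the reported 91.50%.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.feature_selection import (
    SelectFromModel,
    SelectKBest,
    SelectPercentile,
    f_classif,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a high-dimensional malware feature matrix (assumption).
X, y = make_classification(
    n_samples=400, n_features=60, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Stage 2 candidates: the three feature selection methods named in the abstract.
# Thresholds and scoring function are assumed values.
selectors = {
    "extra_trees": SelectFromModel(
        ExtraTreesClassifier(n_estimators=50, random_state=0)
    ),
    "percentile": SelectPercentile(f_classif, percentile=20),
    "k_best": SelectKBest(f_classif, k=10),
}

# Stage 3 candidates: the four ensemble classifiers named in the abstract.
classifiers = {
    "ada_boost": AdaBoostClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "bagging": BaggingClassifier(random_state=0),
}

# Evaluate every selector/classifier pairing on the held-out split.
results = {}
for sel_name, selector in selectors.items():
    for clf_name, classifier in classifiers.items():
        pipeline = make_pipeline(
            StandardScaler(),   # stage 1: preprocessing (assumed to be scaling)
            clone(selector),    # stage 2: feature selection
            clone(classifier),  # stage 3: ensemble classification
        )
        pipeline.fit(X_train, y_train)
        results[(sel_name, clf_name)] = pipeline.score(X_test, y_test)

for (sel_name, clf_name), accuracy in sorted(results.items()):
    print(f"{sel_name:12s} + {clf_name:18s} accuracy = {accuracy:.3f}")
```

Wrapping each selector and classifier in a single pipeline means the selector is fitted on the training split only, which avoids leaking test data into the choice of features.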