Hepatitis Patient Classification using Random Forest Algorithms with Cost-Sensitive Method
Arifin Nugroho1, Ricky Risnantoyo2, Saifurrachman Chohan3, Nuraeni Herlinawati4, Sfenrianto5
1Arifin Nugroho*, Department of Computer Science – Postgraduate Programs STMIK Nusa Mandiri, Jakarta, Indonesia.
2Ricky Risnantoyo, Department of Computer Science – Postgraduate Programs STMIK Nusa Mandiri, Jakarta, Indonesia.
3Saifurrachman Chohan, Department of Computer Science – Postgraduate Programs STMIK Nusa Mandiri, Jakarta, Indonesia.
4Nuraeni Herlinawati, Department of Computer Science – Postgraduate Programs STMIK Nusa Mandiri, Jakarta, Indonesia.
5Sfenrianto, Department of Information Systems Management, BINUS Graduate Program – Master of Information Systems Management, Bina Nusantara University, Jakarta, Indonesia.
Manuscript received on January 26, 2020. | Revised Manuscript received on February 05, 2020. | Manuscript published on February 30, 2020. | PP: 2528-2532 | Volume-9 Issue-3, February 2020. | Retrieval Number: C5903029320 /2020©BEIESP | DOI: 10.35940/ijeat.C5903.029320
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Hepatitis is a worldwide public health problem that affects populations in nearly every country. Machine learning has been widely used to classify various diseases, including hepatitis. In this research, the Random Forest algorithm is applied to a dataset of hepatitis patients to classify whether a patient will live or die. The dataset contains missing values and imbalanced classes, as often occurs in disease datasets where healthy and sick patients are unevenly represented. We replace missing values with the mean and median, and to deal with the class imbalance we use cost-sensitive methods that penalize misclassification. A manual feature selection process is also carried out to find features that can be removed while maintaining classification accuracy and quality. Validation uses 10-fold cross-validation, and the Random Forest parameters are tuned to find the best classification result. This research prioritizes classification results that account for the small amount of data and the class imbalance, so that hepatitis patients can be classified more successfully and accurately. The accuracy obtained is 85.80%.
Keywords: Machine learning, random forest, hepatitis, imbalance
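The two preprocessing ideas named in the abstract, mean/median imputation and a cost-sensitive penalty, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `impute` and `min_cost_class`, the sample values, and the cost matrix are all hypothetical, and the sketch assumes that "die" is the rarer class whose misclassification is penalized more heavily. The class probabilities stand in for the vote shares a Random Forest might return.

```python
from statistics import mean, median


def impute(column, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in column]


def min_cost_class(probabilities, cost):
    """Return the class with the lowest expected misclassification cost.

    probabilities: {class: P(class | x)}, e.g. the vote shares of a forest.
    cost[true][predicted]: penalty for predicting `predicted` when the
    actual class is `true`.
    """
    def expected_cost(predicted):
        return sum(p * cost[true][predicted] for true, p in probabilities.items())

    return min(probabilities, key=expected_cost)


# Hypothetical lab values for one feature; None marks a missing entry.
feature = [1.0, None, 0.5, 4.5, None, 1.0]
mean_filled = impute(feature, "mean")      # gaps filled with 1.75
median_filled = impute(feature, "median")  # gaps filled with 1.0

# Hypothetical cost matrix: missing a "die" case costs 5, a false alarm costs 1.
cost_sensitive = {"die": {"die": 0, "live": 5}, "live": {"die": 1, "live": 0}}
uniform = {"die": {"die": 0, "live": 1}, "live": {"die": 1, "live": 0}}

probs = {"die": 0.3, "live": 0.7}  # assumed classifier output for one patient
print(min_cost_class(probs, uniform))         # live
print(min_cost_class(probs, cost_sensitive))  # die
```

With uniform costs the more probable class wins, while the asymmetric penalty shifts the decision toward the minority class. The median is less affected than the mean by the single large value in the sample feature, which is one reason to choose between the two strategies per feature.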