A Novel Integrated Type 2 Diabetes Prediction Model for Indian Population using Data Mining Techniques
Omprakash S. Chandrakar1, Jatinderkumar R. Saini2

1Omprakash S. Chandrakar, Associate Professor, Uka Tarsadia University, Bardoli, Gujarat, India.
2Jatinderkumar R. Saini*, Professor, Symbiosis International Deemed University, Pune, Maharashtra, India.
Manuscript received on November 23, 2019. | Revised Manuscript received on December 21, 2019. | Manuscript published on December 30, 2019. | PP: 3067-3072 | Volume-9 Issue-2, December, 2019. | Retrieval Number:  B4220129219/2019©BEIESP | DOI: 10.35940/ijeat.B4220.129219
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Late diagnosis and undiagnosed type 2 diabetes are two major concerns for India, which is expected to become a diabetes capital shortly. Several diabetes risk score (DRS) tools have been proposed and deployed to detect persons at high risk. These DRS tools have been developed using the multiple logistic regression model, which is both imperfect and subject to misuse. Another major issue with the DRS tools developed for the Indian population is that they are based on very limited urban samples that do not represent the population of India. The objective of the current research work is to develop a classification model for type 2 diabetes prediction. In addition, the construction of a novel integrated model for type 2 diabetes risk prediction is discussed; it consists of the aggregate classification model and the Indian weighted diabetes risk score model. The datasets used to develop and validate the model are obtained from the Annual Health Survey and comprise nearly 0.7 million and nearly 75 thousand adult participants, respectively, from around 400 districts of India. The proposed integrated diabetes risk prediction model predicts diabetes with 69.89% sensitivity and 56.58% specificity. The positive predictive value of the proposed integrated model is 15.88%, which is a significant improvement given that the prevalence of diabetes is only 3.68% in the study population. In developing countries such as India, where undiagnosed diabetes and limited financial resources are significant concerns, the proposed integrated model can serve as an inexpensive mass-screening tool that can save up to 30% of the total screening cost.
Keywords: Indian Weighted Diabetes Risk Score; Aggregate Classification Model; Feature Selection; Semantic Discretization; Diabetes Mass Screening Test.
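
The screening metrics quoted in the abstract (sensitivity, specificity, positive predictive value, prevalence) all derive from the four cells of a confusion matrix. The following is a minimal Python sketch of those standard definitions, not the authors' implementation. The function name `screening_metrics` and the counts passed to it are hypothetical and chosen only for illustration; the last lines use the two percentages reported in the abstract to express how much the positive predictive value exceeds the baseline prevalence.

```python
def screening_metrics(tp, fp, fn, tn):
    """Standard screening-test metrics from confusion-matrix counts.

    tp: diabetic and flagged by the test      fp: non-diabetic but flagged
    fn: diabetic but not flagged              tn: non-diabetic and not flagged
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),    # share of true cases that are detected
        "specificity": tn / (tn + fp),    # share of non-cases correctly cleared
        "ppv": tp / (tp + fp),            # share of flagged persons who are true cases
        "prevalence": (tp + fn) / total,  # share of true cases in the whole sample
    }


# Hypothetical counts for illustration only; these are NOT the study's data.
example = screening_metrics(tp=70, fp=400, fn=30, tn=500)
for name, value in example.items():
    print(f"{name}: {value:.2%}")

# Figures reported in the abstract, in percent.
reported_ppv = 15.88
reported_prevalence = 3.68

# Ratio of PPV to prevalence: how much more likely a flagged person is to
# have diabetes than a person drawn at random from the study population.
enrichment = reported_ppv / reported_prevalence
print(f"PPV is about {enrichment:.1f} times the baseline prevalence")
```

Read this way, a flagged participant is roughly four times as likely to have diabetes as a randomly chosen participant, which is the sense in which a positive predictive value of 15.88% improves on a prevalence of 3.68%.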