Predicting & Visualizing the Cluster Assignments in Health Care Dataset for Disease Prediction
Neeraj Bhargava1, Ritu Bhargava2, Abhishek Kumar3, Shikha Bhardwaj4

1Prof Neeraj Bhargava*, Department of Computer Science, MDS University, Ajmer, India.
2Dr. Ritu Bhargava, Lecturer, Computer Science, Sofia Girls’ College, Ajmer, India.
3Abhishek Kumar, Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India.
4Shikha Bhardwaj, Research Scholar, Mjrp, Jaipur, India.
Manuscript received on November 23, 2019. | Revised Manuscript received on December 15, 2019. | Manuscript published on December 30, 2019. | PP: 3124-3129 | Volume-9 Issue-2, December, 2019. | Retrieval Number:  B3977129219/2019©BEIESP | DOI: 10.35940/ijeat.B3977.129219
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (

Abstract: Data mining (DM) is the process of analyzing hidden patterns in data from several perspectives and categorizing them into usable information. The data are collected and assembled in common areas, such as data warehouses, for efficient analysis by DM algorithms. In this paper we use health care records to find the major attributes that play an important role in disease prediction. To do so, we first apply the Naive Bayes algorithm, which assumes that the features being classified are independent of one another given the class. Once we have the Naive Bayes result, we apply a clustering technique to the same dataset: Simple K-Means clustering is used to group the data into clusters. K-means is an iterative algorithm that tries to partition the dataset into K predefined, distinct, non-overlapping subgroups (clusters), where each data point belongs to only one group. We then visualize the cluster assignments for each attribute against the resultant (prediction) attribute; these visualizations give a better understanding of how the prediction variable depends on each attribute. After a final analysis of the results of both techniques, we found two attributes that carry the maximum weight compared with the others. These two attributes, Glucose and Insulin, must be considered in diabetes prediction.
Keywords: Weka, Data Mining, DM, Decision tree, SVM.
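
The iterative partitioning that the abstract attributes to K-means can be illustrated with a minimal sketch in plain Python. This is not the Weka Simple K-Means implementation used in the paper; the function name kmeans, its parameters, and the six (glucose, insulin) pairs below are hypothetical values chosen only to show the assignment and update steps.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Partition points (tuples of floats) into k non-overlapping clusters.

    Illustrative sketch only; the name and parameters are assumptions,
    not part of the paper or of Weka.
    """
    rng = random.Random(seed)
    # Start from k data points picked at random as the initial centroids.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster whose centroid is
        # nearest (squared Euclidean distance), so it belongs to one group only.
        new_assignments = [
            min(
                range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
            )
            for point in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the partition is stable
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [pt for pt, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignments, centroids


# Hypothetical (glucose, insulin) pairs, made up for illustration only.
points = [
    (85.0, 40.0), (90.0, 55.0), (95.0, 60.0),
    (160.0, 180.0), (170.0, 200.0), (180.0, 220.0),
]
assignments, centroids = kmeans(points, k=2)
print(assignments)
print(centroids)
```

Plotting the returned assignments for one attribute against the prediction attribute would give a view similar in spirit to the cluster-assignment visualizations the abstract describes.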