GANs and VAEs As Methods of Synthetic Data Generation and Augmentation to Enhance Heart Disease Prediction
Rohit Sahoo¹, Vedang Naik², Saurabh Singh³, Shaveta Malik⁴

¹Rohit Sahoo*, Department of Computer Engineering, Terna Engineering College, Navi-Mumbai, India.
²Vedang Naik, Department of Computer Engineering, Terna Engineering College, Navi-Mumbai, India.
³Saurabh Singh, Department of Computer Engineering, Terna Engineering College, Navi-Mumbai, India.
⁴Dr. Shaveta Malik, Associate Professor, Department of Computer Engineering, Terna Engineering College, Navi-Mumbai, India.
Manuscript received on November 15, 2021. Revised manuscript received on November 17, 2021. Manuscript published on December 30, 2021. | PP: 17-23 | Volume-11 Issue-2, December 2021. | Retrieval Number: 100.1/ijeat.B32631211221 | DOI: 10.35940/ijeat.B3263.1211221
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Instances of heart disease are rising at an alarming rate, making it essential to predict such ailments in advance. This diagnosis is challenging and must be made both accurately and swiftly. In many areas of research, the lack of relevant data is often the limiting factor. Data augmentation, which can be accomplished in a variety of ways, is a strategy for improving the training of discriminative models. Recent advances in deep generative models provide new approaches to enriching existing datasets. Generative models such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) are frequently used to generate the high-quality, realistic synthetic data that machine learning algorithms need, given the critical role such data plays in many classification problems. In our case, we were provided with 304 rows of heart disease data with which to build a robust model for predicting the presence of the ailment in a patient. However, such a small amount of training data limits how reliably heart disease can be identified. To tackle this problem, we used a GAN, a Conditional GAN (CGAN), and a VAE to generate synthetic data and thereby augment the original dataset. The additional data is intended to increase the accuracy of models trained on the resulting datasets. We applied classification-based machine learning models, namely Logistic Regression, Decision Trees, K-Nearest Neighbors (KNN), and Random Forest, and compared the accuracy of each model when trained on the original dataset and on each of the datasets augmented with the generation techniques above. Our results suggest that these data generation techniques significantly boost the accuracy of machine learning models trained on the augmented data.
Keywords: Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), Conditional Generative Adversarial Networks (CGANs), Synthetic Data Generation.
Scope of the Article: Regression and Prediction.
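The augment-then-compare protocol described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes pure Python, a toy two-class dataset generated on the fly (not the 304-row heart disease data), a per-class Gaussian sampler as a deliberately simple stand-in for the GAN, CGAN, and VAE generators, and a hand-written KNN classifier in place of the four models named above. All function names (`make_toy_data`, `fit_class_gaussians`, `sample_synthetic`, `knn_predict`, `accuracy`, `run_comparison`) are illustrative assumptions. The design choice worth noting is that the generator is fitted only on the training split and accuracy is always measured on held-out real rows, so synthetic data never leaks into the evaluation.

```python
import math
import random
from statistics import mean, stdev

# Hypothetical sketch: compare a classifier trained on real data alone with the
# same classifier trained on real + synthetic data, always testing on real rows.

def make_toy_data(n, rng):
    """Toy two-class data with 3 numeric features (NOT the paper's dataset)."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.0 if label else -1.0
        X.append([rng.gauss(center, 1.0) for _ in range(3)])
        y.append(label)
    return X, y

def fit_class_gaussians(X, y):
    """Per-class, per-feature (mean, std): a crude stand-in for a conditional generator."""
    params = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        cols = list(zip(*rows))  # transpose rows -> feature columns
        params[label] = [(mean(c), stdev(c) if len(c) > 1 else 0.0) for c in cols]
    return params

def sample_synthetic(params, n_per_class, rng):
    """Draw n_per_class synthetic rows per label from the fitted Gaussians."""
    X_syn, y_syn = [], []
    for label, stats in params.items():
        for _ in range(n_per_class):
            X_syn.append([rng.gauss(m, s) for m, s in stats])
            y_syn.append(label)
    return X_syn, y_syn

def knn_predict(X_train, y_train, x, k=5):
    """Majority vote among the k training rows closest to x (Euclidean distance)."""
    nearest = sorted((math.dist(x, xt), t) for xt, t in zip(X_train, y_train))[:k]
    votes = [t for _, t in nearest]
    return max(set(votes), key=votes.count)

def accuracy(X_train, y_train, X_test, y_test, k=5):
    """Fraction of test rows whose KNN prediction matches the true label."""
    hits = sum(knn_predict(X_train, y_train, x, k) == t for x, t in zip(X_test, y_test))
    return hits / len(y_test)

def run_comparison(seed=0, n_real=120, n_synthetic_per_class=100, k=5):
    rng = random.Random(seed)
    X, y = make_toy_data(n_real, rng)
    split = int(0.75 * n_real)
    X_train, y_train = X[:split], y[:split]
    X_test, y_test = X[split:], y[split:]  # held-out REAL rows only
    # Fit the generator on the training split only, so no test rows leak into it.
    params = fit_class_gaussians(X_train, y_train)
    X_syn, y_syn = sample_synthetic(params, n_synthetic_per_class, rng)
    return {
        "baseline": accuracy(X_train, y_train, X_test, y_test, k),
        "augmented": accuracy(X_train + X_syn, y_train + y_syn, X_test, y_test, k),
        "n_train": len(y_train),
        "n_augmented": len(y_train) + len(y_syn),
        "n_test": len(y_test),
    }

print(run_comparison())
```

Because the stand-in generator is so simple, the printed accuracies say nothing about whether augmentation helps on real clinical data. Substituting a trained GAN, CGAN, or VAE sampler would only change `fit_class_gaussians` and `sample_synthetic`, and substituting another classifier would only change `knn_predict`; the split-then-augment structure stays the same.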