SVM and Cross-Validation using R Studio
Nainika Kaushik1, Manjot Kaur Bhatia2, Sonali Rastogi3

1Nainika Kaushik*, Assistant Professor, Department of Information Technology, Jagan Institute of Management Studies – JIMS Rohini, Delhi, India.
2Dr. Manjot Bhatia, Professor, Jagan Institute of Management Studies – JIMS Rohini, Delhi, India.
3Sonali Rastogi, Computer Science Engineering Graduate, Jagannath University Bahadurgarh, Haryana, India.

Manuscript received on September 02, 2020. | Revised Manuscript received on September 15, 2020. | Manuscript published on October 30, 2020. | PP: 46-54 | Volume-10 Issue-1, October 2020. | Retrieval Number: 100.1/ijeat.A16731010120 | DOI: 10.35940/ijeat.A1673.1010120
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Data volumes grow with each passing day, and extracting useful information from such big data is difficult. Data mining addresses this problem and is used in most fields, including healthcare, marketing, and social media platforms. In this paper, a dataset from Airbnb, a platform for the lodging and tourism industry, is loaded and preprocessed by handling its missing values. The data is analyzed by plotting correlations computed with the Spearman method, after which PCA and the Support Vector Machine (SVM) classification technique are applied. SVM has many applications, including face detection, text and hypertext categorization, image classification, and bioinformatics. It is an appropriate choice because it copes with high-dimensional input spaces and sparse document vectors and provides regularization parameters. Cross-validation is used to obtain a more reliable estimate of accuracy: the dataset is divided into folds, each fold is held out in turn as a test set, and together the test folds cover the full dataset. The confusion matrix is evaluated following a grid approach over various seeds and kernels (RBF, polynomial). The aim of this research is to determine which kernel is best for the dataset.
Keywords: Big Data, Data mining, Machine learning, Rattle, RStudio, Support Vector Machine.
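
The evaluation procedure outlined in the abstract (splitting the data into folds, holding each fold out in turn, and pooling the predictions into a confusion matrix for several seeds) can be illustrated with a minimal sketch. This is not the authors' R code: it is written in Python using only the standard library, the toy data and the labels "low" and "high" are invented for illustration, and a simple nearest-centroid rule stands in for the SVM, so kernels are not modeled here. All function names are hypothetical.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed):
    """Shuffle the indices 0..n-1 with a fixed seed and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Stand-in classifier (NOT an SVM): compute one mean vector per class."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(model, row):
    """Return the class whose centroid is closest to the row."""
    def squared_distance(c):
        return sum((a - b) ** 2 for a, b in zip(row, model[c]))
    return min(model, key=squared_distance)


def cross_validated_confusion(X, y, k=5, seed=1):
    """Pool held-out predictions from all k folds into one confusion matrix.

    The matrix is a Counter keyed by (actual, predicted).
    """
    matrix = Counter()
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train_idx],
                              [y[i] for i in train_idx])
        for i in test_idx:
            matrix[(y[i], predict(model, X[i]))] += 1
    return matrix


def accuracy(matrix):
    """Fraction of pooled predictions where actual and predicted agree."""
    correct = sum(n for (actual, pred), n in matrix.items() if actual == pred)
    return correct / sum(matrix.values())


# Invented toy data: two well-separated clusters of ten points each.
X = [[i * 0.1, i * 0.1] for i in range(10)] + \
    [[5 + i * 0.1, 5 + i * 0.1] for i in range(10)]
y = ["low"] * 10 + ["high"] * 10

# Grid over seeds: each seed gives a different fold assignment.
for seed in (1, 2, 3):
    result = cross_validated_confusion(X, y, k=5, seed=seed)
    print(seed, dict(result), accuracy(result))
```

Because every observation lands in exactly one test fold, the pooled confusion matrix counts each row of the dataset once. Extending the grid to kernels would mean replacing the stand-in classifier with an SVM implementation and adding the kernel as a second loop variable.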