Data Mining Techniques for Analysing Employment Data
Anatoli Nachev

Dr. Anatoli Nachev, Lecturer, NUI, Galway, Ireland.
Manuscript received on November 25, 2019. | Revised Manuscript received on December 08, 2019. | Manuscript published on December 30, 2019. | PP: 555-566 | Volume-9 Issue-2, December, 2019. | Retrieval Number: B3311129219/2019©BEIESP | DOI: 10.35940/ijeat.B3311.129219
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: This paper proposes a methodology for exploring which factors affect employment, and how, using a large-scale employment dataset. The methodology combines predictive modelling, variable significance analysis, and VEC analysis. Modelling is based on logistic regression, linear discriminant analysis, neural networks, classification trees, and support vector machines. Following the CRISP-DM standard process model, we train binary classifiers, optimise their hyper-parameters, and measure their performance by prediction accuracy, ROC analysis, and AUC. Using sensitivity analysis, we rank the variables by significance in order to identify and measure the factors of employment. Using VEC analysis, we further explore how the values of those factors affect employment. The findings show that the best-performing models are neural networks and support vector machines, with a preference for the latter because of the quality of their VEC. Experiments also suggest that education and age are the primary contributors to correct classification, with specific value distributions discussed in the paper. All results were validated using a rigorous testing procedure that involves training, validation, and test data partitions and combines multiple runs with three-fold cross-validation. This study addresses gaps in previous research publications, which lack quantification of their conclusions.
Keywords: Classification, data mining, employment data, machine learning.
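
The abstract names two evaluation ingredients, AUC and three-fold cross-validation. The sketch below shows one possible way to compute them in plain Python. It is an illustration only and is not the implementation used in the paper: the function names `auc` and `three_fold_indices` are hypothetical, the AUC is computed by the pairwise-comparison (rank) formulation, and the folds are assigned round-robin without shuffling or stratification, which the paper may or may not do.

```python
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def three_fold_indices(n: int) -> List[Tuple[List[int], List[int]]]:
    """Return (train, test) index lists for three folds.

    Assumption: round-robin assignment, no shuffling or stratification.
    """
    folds = [list(range(k, n, 3)) for k in range(3)]
    return [
        ([i for j, fold in enumerate(folds) if j != k for i in fold], folds[k])
        for k in range(3)
    ]
```

In use, a classifier would be fitted on each `train` index list and scored on the matching `test` list, with `auc` applied to the held-out predictions and the three values averaged. Repeating the procedure over several runs with different data orderings would approximate the "multiple runs" mentioned in the abstract.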