A Survey and Comparative Analysis of Expectation Maximization Based Semi-Supervised Text Classification
Purvi Rekh1, Amit Thakkar2, Amit Ganatra3
1Purvi K. Rekh, U & PU Patel Department of Computer Engineering, Chandubhai S Patel Institute of Technology, Changa, Petlad, India.
2Amit R. Thakkar, Department of Information and Technology, Chandubhai S Patel Institute of Technology, Changa, Petlad, India.
3Amit Ganatra, U & PU Patel Department of Computer Engineering, Chandubhai S Patel Institute of Technology, Changa, Petlad, India.
Manuscript received on January 17, 2012. | Revised Manuscript received on February 05, 2012. | Manuscript published on February 29, 2012. | PP: 141-146 | Volume-1 Issue-3, February 2012. | Retrieval Number: C0210021312/2011©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Semi-supervised learning (SSL) based on Naïve Bayesian (NB) classification and Expectation Maximization (EM) combines a small, limited set of labeled data with a large amount of unlabeled data to help train a classifier and increase classification accuracy. Each iteration of standard EM-based semi-supervised learning consists of two steps: first, the classifier constructed in the previous iteration classifies all unlabeled samples; then, a new classifier is trained on the reconstructed training set, which is composed of the labeled samples together with all newly labeled unlabeled samples. Standard EM-based semi-supervised learning has several limitations: in the process of reconstructing the training set, some unlabeled samples are misclassified by the current classifier; the classifier is prone to over-training; and the running time increases significantly as the number of documents grows. With the aim of improving the efficiency of the standard EM algorithm, many authors have proposed alternative approaches. This paper describes these approaches, compares them, and discusses their limitations, along with open research challenges in this area.
Keywords: Expectation Maximization, Naïve Bayesian, Semi-supervised learning, Text Classification.
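The two-step iterative procedure summarized in the abstract can be sketched in a few lines of Python. The snippet below is a minimal, illustrative implementation of standard EM-based semi-supervised Naïve Bayesian text classification over word-count vectors; it uses hard class assignments in the E-step, and all data, function names, and hyperparameters (e.g. the Laplace smoothing constant `alpha`) are assumptions for demonstration, not the authors' exact method.

```python
# Sketch of standard EM-based semi-supervised Naive Bayes text classification:
# each iteration (E-step) labels all unlabeled documents with the current
# classifier, then (M-step) retrains on labeled + newly labeled documents.
# Toy data and hyperparameters below are illustrative assumptions.
import numpy as np

def train_nb(X, y, n_classes, alpha=1.0):
    """Fit a multinomial Naive Bayes model (Laplace-smoothed) on counts X."""
    log_priors = np.zeros(n_classes)
    log_like = np.zeros((n_classes, X.shape[1]))
    for c in range(n_classes):
        Xc = X[y == c]
        log_priors[c] = np.log((len(Xc) + 1) / (len(X) + n_classes))
        counts = Xc.sum(axis=0) + alpha          # smoothed word counts per class
        log_like[c] = np.log(counts / counts.sum())
    return log_priors, log_like

def predict(X, log_priors, log_like):
    """Assign each document to the class with the highest log-posterior."""
    return np.argmax(X @ log_like.T + log_priors, axis=1)

def em_nb(X_lab, y_lab, X_unlab, n_classes, max_iter=10):
    # Initial classifier from labeled data only.
    log_p, log_l = train_nb(X_lab, y_lab, n_classes)
    for _ in range(max_iter):
        # E-step: classify all unlabeled samples with the current classifier.
        y_unlab = predict(X_unlab, log_p, log_l)
        # M-step: retrain on the reconstructed training set
        # (labeled samples plus all newly labeled unlabeled samples).
        X_all = np.vstack([X_lab, X_unlab])
        y_all = np.concatenate([y_lab, y_unlab])
        log_p, log_l = train_nb(X_all, y_all, n_classes)
    return log_p, log_l

# Toy corpus: 4-word vocabulary; class 0 favors words 0-1, class 1 words 2-3.
X_lab = np.array([[3, 2, 0, 0], [0, 0, 2, 3]])
y_lab = np.array([0, 1])
X_unlab = np.array([[2, 1, 0, 0], [4, 0, 0, 1], [0, 0, 3, 2], [1, 0, 2, 4]])

log_p, log_l = em_nb(X_lab, y_lab, X_unlab, n_classes=2)
print(predict(X_unlab, log_p, log_l))
```

The sketch also makes the abstract's limitations concrete: any document mislabeled in an early E-step is fed back into every subsequent M-step, and each iteration re-scans the full unlabeled pool, so cost grows with the number of documents.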