Comparative Study of K-means and Bisecting K-means Techniques in WordNet-Based Document Clustering
B. S. Vamsi Krishna¹, P. Satheesh², Suneel Kumar R.³
¹B. S. Vamsi Krishna, Sr. Assistant Professor, CSE Department, MVGR College of Engineering, Chintalavalasa, Vizianagaram, Andhra Pradesh, India.
²P. Satheesh, Associate Professor, CSE Department, MVGR College of Engineering, Chintalavalasa, Vizianagaram, Andhra Pradesh, India.
³Suneel Kumar R., CSE Department, MVGR College of Engineering, Chintalavalasa, Vizianagaram, Andhra Pradesh, India.
Manuscript received on July 17, 2012. | Revised Manuscript received on August 25, 2012. | Manuscript published on August 30, 2012. | PP: 229-234 | Volume-1 Issue-6, August 2012. | Retrieval Number: F0673081612/2012©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Document clustering plays a major role in coping with the rapidly growing explosion of information and is considered a tool for performing information-based operations. It is the process of grouping text documents into categories, with clusters generated automatically from the whole document collection. Document clustering is used in many fields and has found applications in various domains of information retrieval and web information systems. Ontology-based computing is considered a natural evolution of existing technologies for coping with this information onslaught. In the current paper, background knowledge derived from WordNet, used as an ontology, is applied during the preprocessing of documents for document clustering. Document vectors constructed from WordNet synsets are used as input for clustering. A comparative analysis is made between clustering with standard k-means and clustering with bisecting k-means. The results indicate that the bisecting k-means technique performs better than the standard k-means technique. These results are based on an analysis of the specifics of the clustering algorithms and the nature of document data.
Keywords: Bisecting k-means, document clustering, standard k-means, WordNet.
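The abstract does not include the authors' preprocessing or clustering code, so the following is only a minimal sketch under stated assumptions. A tiny hand-made synonym table (`TOY_SYNSETS`, hypothetical, with made-up concept IDs) stands in for WordNet synsets, so that each document becomes an L2-normalised vector of concept counts rather than raw word counts. Both clustering techniques are then written in plain Python: standard k-means (Lloyd's iterations from random seeds) and bisecting k-means, which starts from one cluster and repeatedly splits the cluster with the largest sum of squared errors (SSE), keeping the best of several 2-means trials. That selection rule is one common variant, assumed here rather than taken from the paper, and all function names are illustrative.

```python
import math
import random

# HYPOTHETICAL stand-in for WordNet: each word maps to a shared concept ID,
# the way WordNet groups synonyms into one synset. IDs are made up.
TOY_SYNSETS = {
    "car": "vehicle.n", "automobile": "vehicle.n", "auto": "vehicle.n",
    "engine": "motor.n", "motor": "motor.n",
    "doctor": "physician.n", "physician": "physician.n", "medic": "physician.n",
    "hospital": "hospital.n", "clinic": "hospital.n", "infirmary": "hospital.n",
}

def concept_vector(text, vocab):
    """Count concepts instead of raw words, then L2-normalise the counts."""
    counts = [0.0] * len(vocab)
    for word in text.lower().split():
        concept = TOY_SYNSETS.get(word)
        if concept is not None:  # words with no known concept are dropped
            counts[vocab.index(concept)] += 1.0
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

def sse(points):
    """Sum of squared distances to the centroid (0.0 for an empty cluster)."""
    if not points:
        return 0.0
    c = mean(points)
    return sum(sq_dist(p, c) for p in points)

def kmeans(points, k, rng, iters=20):
    """Standard k-means (Lloyd's iterations) seeded with k random points."""
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = mean(members)
    return labels

def groups(labels):
    """Turn a label list into lists of document indices."""
    out = {}
    for doc, lab in enumerate(labels):
        out.setdefault(lab, []).append(doc)
    return list(out.values())

def bisecting_kmeans(vectors, k, rng, trials=5):
    """Start from one cluster; repeatedly bisect the cluster with the largest SSE."""
    clusters = [list(range(len(vectors)))]
    while len(clusters) < k:
        idx = max(range(len(clusters)),
                  key=lambda i: sse([vectors[d] for d in clusters[i]]))
        target = clusters[idx]
        if len(target) < 2:
            break  # nothing left that can be split
        best = None
        for _ in range(trials):  # keep the best of several 2-means runs
            labels = kmeans([vectors[d] for d in target], 2, rng)
            halves = [[d for d, lab in zip(target, labels) if lab == j] for j in (0, 1)]
            if not halves[0] or not halves[1]:
                continue
            score = sum(sse([vectors[d] for d in h]) for h in halves)
            if best is None or score < best[0]:
                best = (score, halves)
        if best is None:
            break  # every trial left one half empty
        clusters[idx:idx + 1] = best[1]
    return clusters

vocab = sorted(set(TOY_SYNSETS.values()))
docs = ["car engine", "automobile motor", "auto engine",
        "doctor hospital", "physician clinic", "medic infirmary"]
vectors = [concept_vector(d, vocab) for d in docs]

print(sorted(groups(kmeans(vectors, 2, random.Random(0)))))    # [[0, 1, 2], [3, 4, 5]]
print(sorted(bisecting_kmeans(vectors, 2, random.Random(0))))  # [[0, 1, 2], [3, 4, 5]]
```

Documents 0 and 1 ("car engine", "automobile motor") share no word, yet receive identical concept vectors, which is the effect synset-based preprocessing aims for. On this toy data both algorithms recover the same two groups; the sketch is not meant to reproduce the comparative results reported in the paper.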