Review of Clustering Algorithm for Categorical Data
Poonam M. Bhagat1, Prasad S. Halgaonkar2, Vijay M. Wadhai3
1Poonam M. Bhagat,  Department of Computer Engg. MITCOE, Pune University, India.
2Prasad S. Halgaonkar,  Department of Computer Engg. MITCOE, Pune University, India.
3Vijay M. Wadhai,  Principal MITCOE, Pune University, India.
Manuscript received on November 27, 2013. | Revised Manuscript received on December 13, 2013. | Manuscript published on December 30, 2013. | PP: 341-345 | Volume-3, Issue-2, December 2013. | Retrieval Number:  B2484123213/2013©BEIESP

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Clustering partitions data into groups, called clusters, such that data points within a cluster are similar to one another and dissimilar to points in other clusters. It is a form of unsupervised learning, with no predefined class labels for the data points. Clustering is an important tool in data mining, with applications in pattern recognition, image processing, market analysis, the World Wide Web, and many other areas. Categorical data take values from a set of categories, with each value representing one category. The cluster ensemble approach has been applied to the problem of clustering categorical data, but it generates the final data partition from incomplete information. The underlying ensemble-information matrix, the binary cluster association matrix, records only cluster-to-data-point relations and leaves many entries unknown, which degrades the quality of the final data partition. To avoid this degradation, a new link-based approach is presented that uses a refined cluster association matrix. It captures cluster-to-cluster relations and improves the quality of the final partition by estimating the unknown entries from the similarity between clusters in the ensemble. A cluster ensemble combines multiple data partitions from different clustering algorithms into a single clustering solution in order to improve the robustness, accuracy, and quality of the clustering result.
Keywords: Clustering, Categorical, Link-based, Ensemble.
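
Illustrative sketch: the idea of a binary cluster association matrix and its refinement can be made concrete with a small Python example. This is only a sketch under stated assumptions, not the method of the reviewed work. The function names, the `decay` parameter, and the similarity measure (Jaccard overlap of cluster members, combined through shared neighbouring clusters) are all hypothetical choices made for illustration. Each base clustering is given as a list of labels, one label per data point.

```python
from itertools import combinations


def binary_association_matrix(partitions):
    """Build the binary data-point x cluster matrix for an ensemble.

    partitions: list of label lists, one list per base clustering.
    Returns (matrix, clusters); each cluster is (partition_index, label, members).
    """
    n = len(partitions[0])
    clusters = []
    for p, labels in enumerate(partitions):
        for label in sorted(set(labels)):
            members = frozenset(i for i, l in enumerate(labels) if l == label)
            clusters.append((p, label, members))
    matrix = [[1.0 if i in members else 0.0 for (_, _, members) in clusters]
              for i in range(n)]
    return matrix, clusters


def jaccard(a, b):
    """Jaccard overlap of two member sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_similarity(clusters):
    """Assumed stand-in similarity: two clusters are similar when they
    overlap with the same third clusters. Scaled so the maximum is 1."""
    k = len(clusters)
    w = [[jaccard(clusters[a][2], clusters[b][2]) for b in range(k)]
         for a in range(k)]
    sim = [[0.0] * k for _ in range(k)]
    for a, b in combinations(range(k), 2):
        shared = sum(min(w[a][c], w[b][c]) for c in range(k) if c not in (a, b))
        sim[a][b] = sim[b][a] = shared
    top = max(max(row) for row in sim)
    if top > 0:
        sim = [[value / top for value in row] for row in sim]
    return sim


def refine(matrix, clusters, sim, decay=0.9):
    """Replace each 0 entry with decay * similarity between that cluster and
    the cluster the data point belongs to in the same base clustering."""
    refined = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for c, (p, _, _) in enumerate(clusters):
            if row[c] == 0.0:
                own = next(d for d, (q, _, members) in enumerate(clusters)
                           if q == p and i in members)
                refined[i][c] = decay * sim[own][c]
    return refined


# Two base clusterings of four data points (toy data, for illustration only).
partitions = [["a", "a", "b", "b"], ["x", "x", "x", "y"]]
matrix, clusters = binary_association_matrix(partitions)
refined = refine(matrix, clusters, cluster_similarity(clusters))
```

In the binary matrix every row has exactly one 1 per base clustering and 0 elsewhere. The refinement step leaves the 1 entries untouched and fills the remaining entries with values in [0, 1], so a consensus clustering applied to the refined matrix has more information to work with.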