Comparison Between Subspace and Conventional Clustering for High Dimensional Data Analysis
Kahkashan Kouser1, Amrita Priyam2
1Kahkashan Kouser, Research Scholar, Department of Computer Science, Birla Institute of Technology, Ranchi 834001, India, and Assistant Professor, Department of Computer Science & Engineering, Gaya College of Engineering, Gaya (Bihar), India.
2Amrita Priyam, Associate Professor, Department of Computer Science, Birla Institute of Technology, Ranchi (Jharkhand), India.
Manuscript received on 18 February 2019 | Revised Manuscript received on 27 February 2019 | Manuscript published on 28 February 2019 | PP: 87-90 | Volume-8 Issue-3, February 2019 | Retrieval Number: C5721028319/19©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Clustering high-dimensional data is an active research area. Clustering a multi-dimensional dataset is a difficult task because data objects are widely dispersed in the multi-dimensional space. Most conventional clustering algorithms use all dimensions of the feature space to compute clusters, although only a few attributes are relevant, so their results are not very precise. This paper proposes a modified subspace clustering algorithm that does not use all attributes of the high-dimensional feature space simultaneously; instead, it determines a subspace of attributes that are important for each individual cluster. This subspace of attributes may be the same or different for different clusters. The conventional K-means and modified subspace K-means clustering algorithms are compared using several validation metrics: SSE (sum of squared error), WGAD-BGD (within-group average distance minus between-group distance), and DBI (Davies-Bouldin index) or validity index. Artificial datasets were used for all experiments. The results indicate better efficiency and feasibility of the modified subspace clustering algorithm compared with conventional clustering methods.
Keywords: Clustering, High-Dimensional Data, Subspace Clustering, COSA, Clustering On Subset Of Attribute
Scope of the Article: Data Analysis
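Two of the validation metrics named in the abstract, SSE and DBI, have standard textbook definitions, and the following minimal sketch shows how they could be computed for a given partition. It is an illustration only, not the authors' implementation: the function names (`centroid`, `sse`, `davies_bouldin`) and the four-point toy dataset are assumptions made for this example, and WGAD-BGD is omitted because its exact formula is not given in the abstract.

```python
import math


def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sse(clusters):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    total = 0.0
    for pts in clusters:
        c = centroid(pts)
        total += sum(math.dist(p, c) ** 2 for p in pts)
    return total


def davies_bouldin(clusters):
    """Davies-Bouldin index: mean over clusters of the worst
    (scatter_i + scatter_j) / centroid_distance(i, j) ratio. Lower is better."""
    cents = [centroid(pts) for pts in clusters]
    # Scatter = average distance of a cluster's points to its centroid.
    scatter = [
        sum(math.dist(p, c) for p in pts) / len(pts)
        for pts, c in zip(clusters, cents)
    ]
    k = len(clusters)
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    return sum(worst) / k


# Hypothetical toy data: the same four points partitioned two ways.
good = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
bad = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]

print("good partition: SSE =", sse(good), "DBI =", davies_bouldin(good))
print("bad partition:  SSE =", sse(bad), "DBI =", davies_bouldin(bad))
```

For both metrics a lower value indicates a more compact, better separated partition, so the `good` grouping scores lower than the `bad` one on this toy data.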