Repartitioned Optimized K-Mean Centroid Based Partitioned Clustering using MapReduce in Analyzing High Dimensional Big Data
N. Sree Ram1, M.H.M. Krishna Prasad2, K. Satya Prasad3

1N. Sree Ram*, Department of CSE, Koneru Lakshmaiah Education Foundation, Vaddeswaram, (A.P), India.
2Dr.M.H.M. Krishna Prasad, Professor of CSE, University College of Engineering Kakinada(A) J.N.T.U. KAK India.
3Dr. K. Satya Prasad, Professor of ECE, University College of Engineering Kakinada(A) J.N.T.U. KAK India.
Manuscript received on September 17, 2019. | Revised Manuscript received on October 05, 2019. | Manuscript published on October 30, 2019. | PP: 616-620 | Volume-9 Issue-1, October 2019 | Retrieval Number: F8069088619/2019©BEIESP | DOI: 10.35940/ijeat.F8069.109119
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: With the advent of the Internet of Things (IoT), a large number of IoT devices are deployed across cities to acquire data. These devices generate enormous volumes of data, and analyzing it conventionally requires configuring new hardware to scale up the existing servers and developing an application on a precise framework. This work recommends an adapted scale-out approach in which huge multi-dimensional datasets can be processed on existing commodity hardware. In this approach, the Hadoop Distributed File System (HDFS) stores the multi-dimensional data, which is then processed and analyzed using the MapReduce (MR) framework. We implemented an optimized repartitioned K-Means centroid-based partitioning clustering algorithm on the MR framework for a Smart City dataset that contains 10 million objects, each with six attributes. The results show that the proposed approach is scalable and computes intra-cluster density and inter-cluster density effectively.
Keywords: Distributed computing, Distributed File System, Hadoop MapReduce Framework, Inter-cluster density, Repartitioned K-Means.
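
As an illustration of the general idea the abstract describes, the following is a minimal single-machine sketch of how one K-Means iteration can be split into a map step (assign each object to its nearest centroid) and a reduce step (recompute each centroid as the mean of its assigned objects). It is plain textbook K-Means and does not reproduce the repartitioning or optimization proposed in the paper. The function names `map_phase`, `reduce_phase`, and `kmeans`, the `max_iter` parameter, and the use of Euclidean distance are assumptions made for this example, and no Hadoop API is used.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_phase(points, centroids):
    """Map step (illustrative): emit (nearest-centroid index, point)."""
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield idx, p


def reduce_phase(pairs, centroids):
    """Reduce step (illustrative): average the points grouped under each key."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    new_centroids = list(centroids)  # a centroid with no points stays unchanged
    for idx, members in groups.items():
        new_centroids[idx] = tuple(sum(c) / len(members) for c in zip(*members))
    return new_centroids


def kmeans(points, centroids, max_iter=20):
    """Repeat map and reduce until the centroids stop changing."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


if __name__ == "__main__":
    # Tiny two-attribute toy input; the paper's dataset has six attributes.
    toy_points = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(kmeans(toy_points, [(0, 0), (10, 10)]))
```

In an actual MapReduce job the map step would run in parallel over HDFS blocks and the grouping by centroid index would be done by the framework's shuffle phase; here a dictionary stands in for that grouping.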