Scaling Up for the Streaming Data
Shabia Shabir Khan¹, Mushtaq Ahmed Peer², S.M.K. Quadri³
¹Shabia Shabir Khan, Computer Science Department, University of Kashmir, Srinagar, India.
²Dr. Mushtaq Ahmed Peer, Computer Science Department, University of Kashmir, Srinagar, India.
³Dr. S.M.K. Quadri, Computer Science Department, University of Kashmir, Srinagar, India.
Manuscript received on November 23, 2012. | Revised Manuscript received on December 10, 2012. | Manuscript published on December 30, 2012. | PP: 362-368 | Volume-2, Issue-2, December 2012. | Retrieval Number: B0968112212 /2012©BEIESP

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Knowledge has always been a success factor for any organization, whether business or technical. A 2012 survey reports that about 2.5 quintillion (2.5×10¹⁸) bytes of data were created every day. As a result, we face the challenge of handling voluminous, potentially infinite, fast-changing, temporally ordered data streams properly and in a timely manner so as to extract useful knowledge from them. Because of its tremendous volume, streaming data cannot be stored in full in finite storage, and because of its continuous flow it must be processed in a single pass, in contrast to warehoused data, which can be scanned in multiple passes. In addition, processing must finish within a limited amount of time. Time and space are therefore the key constraints in handling data streams. This paper discusses and compares these issues in the light of some sketching and counting algorithms and provides an application-oriented data-flow architecture for processing streaming data, along with a granularity-based approach that takes into account resource awareness and adaptation for data stream mining algorithms. Further, since analysts are mostly interested either in recent data or in a broader view of the data, the paper discusses a dynamic H-cube that facilitates multi-resolution analysis of streaming data, in which partial materialization is performed and computations are done on the fly using a tilted time frame.
Keywords: Frequency as an Interestingness Criterion, Partial Materialization, Streaming Data, Time Granularity.
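
To make the single-pass, bounded-memory constraint described in the abstract concrete, the following is a minimal Python sketch of one well-known counting algorithm, the Misra-Gries frequent-items summary. It is an illustrative assumption: the abstract does not name which sketching and counting algorithms the paper compares, and the function name `misra_gries` and the parameter `k` are hypothetical. The summary reads each item once and keeps at most k - 1 counters, so memory stays fixed however long the stream runs.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Single pass over `stream` using at most k - 1 counters.

    Every item whose true frequency exceeds n / k (n = stream length)
    is guaranteed to be among the returned keys. The returned counts
    are lower bounds that undercount by at most n / k.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            # Item is already tracked: count this occurrence.
            counters[item] += 1
        elif len(counters) < k - 1:
            # A free counter is available: start tracking the item.
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop the
            # ones that reach zero. The new item itself is not stored.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Usage: the stream is consumed as an iterator, so it is never stored.
summary = misra_gries(iter(["a", "b", "a", "c", "a", "d", "a"]), k=3)
print(summary)
```

The counts returned are approximate. When exact frequencies are needed, a second pass over stored data is the usual remedy, which is the multi-pass luxury that warehoused data allows and streaming data does not.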