Indian Premier League Dataset Analytics using Hadoop-Hive
Sapna S.1, Sandhya S.2

1Sapna S.*, Department of Information Science and Engineering, NMAM Institute of Technology, Nitte, Karkala (Karnataka) India.
2Sandhya S., Department of Information Science and Engineering, NMAM Institute of Technology, Nitte, Karkala (Karnataka) India.
Manuscript received on November 20, 2019. | Revised Manuscript received on December 15, 2019. | Manuscript published on December 30, 2019. | PP: 3999-4004  | Volume-9 Issue-2, December, 2019. | Retrieval Number: B4579129219/2019©BEIESP | DOI: 10.35940/ijeat.B4579.129219
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Big Data is a term for huge volumes of structured and unstructured data that cannot be processed by traditional data processing techniques. Such data is too large, grows exponentially, and does not fit the structure of traditional database systems, which makes analyzing it a challenging task. As an industry or business grows, the data related to it also tends to grow on a larger scale. Capable data analysis tools are required to analyze this data in order to gain value from it. Hadoop is a sought-after open-source framework that uses MapReduce techniques to store and process huge datasets. However, programs written using MapReduce techniques are not flexible and also require maintenance. This problem is overcome by using Hive QL. Hive QL queries run on Hive, an open-source data warehousing system built on Hadoop, and are compiled into MapReduce jobs that are executed by Hadoop. In this paper we analyze the Indian Premier League dataset using Hive QL and compare its execution time with that of traditional SQL queries. We found that Hive QL performed better on larger datasets, while SQL performed better on smaller datasets.
Keywords: Big Data, Hadoop, Hive, IPL.
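
As a rough illustration of how a grouped aggregate query, such as counting wins per team, can decompose into the map, shuffle, and reduce phases mentioned in the abstract, a minimal Python sketch follows. The sample rows, the `winner` field, and the function names are hypothetical and are not taken from the paper's dataset or from Hive's actual execution plan.

```python
from collections import defaultdict

# Hypothetical rows standing in for match records; the schema is an assumption.
matches = [
    {"season": 1, "winner": "Team A"},
    {"season": 1, "winner": "Team B"},
    {"season": 2, "winner": "Team A"},
    {"season": 2, "winner": "Team C"},
    {"season": 3, "winner": "Team A"},
]


def map_phase(rows):
    """Emit one (key, 1) pair per row, as a mapper would for a grouped count."""
    for row in rows:
        yield row["winner"], 1


def shuffle_phase(pairs):
    """Group all emitted values by key, mimicking the shuffle between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key to produce the final count per group."""
    return {key: sum(values) for key, values in grouped.items()}


wins = reduce_phase(shuffle_phase(map_phase(matches)))
print(wins)
```

In a query language such as Hive QL the same result would be requested declaratively with a grouped count, leaving the decomposition into phases to the engine rather than to hand-written code.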