File Formats for Big Data Storage Systems
Samiya Khan¹, Mansaf Alam²

¹Samiya Khan, Department of Computer Science, Jamia Millia Islamia, New Delhi, India.
²Mansaf Alam*, Department of Computer Science, Jamia Millia Islamia, New Delhi, India.
Manuscript received on September 22, 2019. | Revised Manuscript received on October 20, 2019. | Manuscript published on October 30, 2019. | PP: 2906-2912 | Volume-9 Issue-1, October 2019 | Retrieval Number: A1196109119/2019©BEIESP | DOI: 10.35940/ijeat.A1196.109119
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Big data is one of the most influential technologies of the modern era. However, for big data systems to mature, heterogeneous environments must be developed and sustained, which in turn requires the integration of technologies as well as concepts. Computing and storage are the two core components of any big data system. Big data storage must communicate with the execution engine and with other processing and visualization technologies to form a comprehensive solution, which brings big data file formats into the picture. This paper classifies the available big data file formats into five categories, namely text-based, row-based, column-based, in-memory, and data storage services. It also compares the advantages, shortcomings, and possible use cases of the available big data file formats for Hadoop, which is the foundation for most big data computing technologies. Lastly, it discusses the tradeoffs that must be considered when choosing a file format for a big data system, providing a framework for creating file format selection criteria.
Keywords: Big Data Storage, File Formats, Hadoop, Storage Systems.