Extracting and Transforming Heterogeneous Data from XML files for Big Data
Tanuja Das1, Ramesh Saha2, Goutam Saha3

1Tanuja Das*, Department of Information Technology, Gauhati University, Guwahati, India.
2Ramesh Saha, Department of Information Technology, Gauhati University, Guwahati, India.
3Goutam Saha, Department of Information Technology, North Eastern Hill University, Shillong, India.
Manuscript received on November 22, 2019. | Revised Manuscript received on December 15, 2019. | Manuscript published on December 30, 2019. | PP: 4276-4280 | Volume-9 Issue-2, December, 2019. | Retrieval Number: B3438129219/2019©BEIESP | DOI: 10.35940/ijeat.B3438.129219
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Digital technology has changed rapidly in recent years, and with this change the number of data systems, sources, and formats has grown just as quickly. As a result, extracting data from multiple source systems and transforming it to suit various analytics processes is becoming increasingly important. For Big Data, the transformation step is particularly challenging because data is generated continuously. In this paper, we extract data from various heterogeneous sources on the web and transform it into a form that is widely used in data warehousing, so that it serves the analytical needs of the machine learning community.
Keywords: Big data, data transformation, data warehousing, ETL.
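
As an illustration of the extract-and-transform step summarized in the abstract, the following is a minimal sketch in Python and not the authors' implementation. It assumes two hypothetical XML sources with different tag names (`SOURCE_A`, `SOURCE_B`), flattens each record into a row, and aligns all rows to a shared column set of the kind a warehouse staging table might use. The function names `extract_rows` and `to_table` and the sample data are assumptions introduced only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feeds: two sources describing similar records
# with different tag names and different sets of fields.
SOURCE_A = """<catalog>
  <item id="1"><name>Widget</name><price>9.50</price></item>
  <item id="2"><name>Gadget</name></item>
</catalog>"""

SOURCE_B = """<products>
  <product><name>Gizmo</name><vendor>Acme</vendor></product>
</products>"""


def extract_rows(xml_text, record_tag, source):
    """Flatten every <record_tag> element into one dict (one row)."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter(record_tag):
        row = {"source": source}
        row.update(record.attrib)  # XML attributes become columns
        for child in record:       # direct child elements become columns
            row[child.tag] = (child.text or "").strip()
        rows.append(row)
    return rows


def to_table(rows):
    """Align rows to a shared column set, filling missing fields with None."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    return columns, [[row.get(c) for c in columns] for row in rows]


rows = extract_rows(SOURCE_A, "item", "A") + extract_rows(SOURCE_B, "product", "B")
columns, table = to_table(rows)
```

Keeping a `source` column and filling absent fields with `None` is one simple design choice for heterogeneous inputs: no record is dropped, and later loading or cleaning steps can decide how to treat the gaps.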