An Aggregator Framework for Transforming Big Data in Real-Time using PT-INDRNN
Sowmya R1, Suneetha K R2

1Sowmya R, Research Scholar, Department of Computer & Engineering, Bangalore Institute of Technology, Bengaluru, Visvesvaraya Technological University, Belagavi (Karnataka), India.
2Dr. Suneetha K R, Professor, Department of Computer & Engineering, Bangalore Institute of Technology, Bengaluru, Visvesvaraya Technological University, Belagavi (Karnataka), India.
Manuscript received on 02 May 2023 | Revised Manuscript received on 09 May 2023 | Manuscript Accepted on 15 June 2023 | Manuscript published on 30 June 2023 | PP: 12-24 | Volume-12 Issue-5, June 2023 | Retrieval Number: 100.1/ijeat.E41500612523 | DOI: 10.35940/ijeat.E4150.0612523

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Predicting stock market prices from the sentiment of financial text using Machine Learning (ML) and Deep Learning (DL) models is becoming popular among researchers in the era of Big Data (BD). However, owing to a lack of extensive analysis, most existing ML and DL models have failed to achieve good classification results. To address this problem, this work proposes a Probability Tanh-Independently Recurrent Neural Network (PT-IndRNN) that classifies the sentiment of financial text from Twitter for real-time prediction of stock price polarity. First, real-time financial data and Twitter data are extracted through the corresponding APIs and stored in a MongoDB database using Apache Flume. The stored data, together with historical big datasets, are then pre-processed. Next, using the proposed Hadoop Distributed File System (HDFS) clustering, the pre-processed real-time stock market and Twitter data, as well as the historical dataset, are grouped separately. Features are then extracted from the clustered sentences. The sentences selected by the Linear Scaling-Dwarf Mongoose Optimization Algorithm (LS-DMOA) are converted to negative and positive scores using SentiWordNet. Finally, the PT-IndRNN classifies the sentiment of the financial texts, and the results obtained indicate that the classification is reliable.
Keywords: Hadoop Distributed File System (HDFS), Probability Tanh-Independently RNN (PTh-IndRNN), Application Programming Interface (API), Linear Scaling-Dwarf Mongoose Optimization Algorithm (LS-DMOA), financial text sentiment analysis.
Scope of the Article: Machine Learning
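
As an illustration of the final classification stage described in the abstract, the sketch below runs a standard Independently Recurrent Neural Network (IndRNN) step, h_t = tanh(W x_t + u * h_(t-1) + b), over a sentence whose tokens carry a positive and a negative score. It is a minimal sketch under stated assumptions, not the authors' PT-IndRNN: the abstract does not define the "Probability Tanh" activation, so plain tanh is used instead, and the function names, the sigmoid read-out, and all weights are hypothetical values chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple


def indrnn_step(x: Sequence[float], h_prev: Sequence[float],
                W: Sequence[Sequence[float]], u: Sequence[float],
                b: Sequence[float]) -> List[float]:
    """One IndRNN step. The recurrent weight u is a vector, so each hidden
    unit only sees its own previous state (element-wise product)."""
    h = []
    for n in range(len(u)):
        pre = sum(W[n][j] * x[j] for j in range(len(x)))
        pre += u[n] * h_prev[n] + b[n]
        # Assumption: plain tanh stands in for the paper's activation.
        h.append(math.tanh(pre))
    return h


def classify_sentence(tokens: Sequence[Sequence[float]],
                      W: Sequence[Sequence[float]], u: Sequence[float],
                      b: Sequence[float], v: Sequence[float],
                      c: float) -> Tuple[str, float]:
    """Run the IndRNN over per-token [positive, negative] scores and map the
    last hidden state to a polarity with a (hypothetical) sigmoid read-out."""
    h = [0.0] * len(u)
    for x in tokens:
        h = indrnn_step(x, h, W, u, b)
    logit = sum(v[n] * h[n] for n in range(len(h))) + c
    p_positive = 1.0 / (1.0 + math.exp(-logit))
    label = "positive" if p_positive >= 0.5 else "negative"
    return label, p_positive


# Hypothetical toy parameters: two inputs (pos, neg score), two hidden units.
W = [[1.0, -1.0], [0.5, 0.5]]
u = [0.5, 0.5]
b = [0.0, 0.0]
v = [2.0, 0.0]
c = 0.0

if __name__ == "__main__":
    bullish = [[0.8, 0.1], [0.6, 0.0]]   # tokens with mostly positive scores
    bearish = [[0.1, 0.8], [0.0, 0.6]]   # tokens with mostly negative scores
    print(classify_sentence(bullish, W, u, b, v, c))
    print(classify_sentence(bearish, W, u, b, v, c))
```

In a real pipeline the per-token scores would come from a lexicon such as SentiWordNet and the weights would be learned from labelled data; here they are fixed by hand so the recurrence is easy to follow.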