Pre-Order Post-Order Coded Aggregate Tree based Algorithm for Mining Sequential Patterns
Vemulapalli Saritha1, Mogalla Shashi2
1Vemulapalli Saritha, Department of Information Technology, VNR Vignana Hyderabad India.
2Dr. Mogalla Shashi, Department of Computer Science and Systems Engineering, Andhra India.
Manuscript received on September 23, 2019. | Revised Manuscript received on October 15, 2019. | Manuscript published on October 30, 2019. | PP: 7597-7600 | Volume-9 Issue-1, October 2019 | Retrieval Number: A2030109119/2019©BEIESP | DOI: 10.35940/ijeat.A2030.109119
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Sequential pattern mining is a data mining approach that aims to discover common, interesting patterns in sequence datasets. It has attracted significant research interest because of its real-world applications in fields such as web clickstream mining, retail business, stock markets and bioinformatics. Each sequence in a sequence dataset is composed of time-ordered events, and each event is an itemset. Sequential pattern mining discovers all frequent subsequences, that is, subsequences whose frequency is greater than a given minimum support threshold. Discovering sequential patterns is expensive in both mining time and memory, because the search space grows aggressively: the number of frequent subsequences explodes with the sequence length, the number of distinct items and the volume of the sequence dataset. Research in this domain therefore aims at developing effective data structures that address frequency counting and the large search space, as well as scalable algorithms that reduce execution time and memory usage. We propose two efficient data structures: the Pre-order Post-order Coded Aggregate Tree (PPCA-Tree), for compact representation of the sequence dataset, and the Root-node List of First-Occurrence Sub Trees Map (RLFOST-Map), for efficient representation of projected databases. We also developed an efficient Partially ordered Sequential PAttern Mining algorithm, called PSPAM, and a parallel implementation of it, called PAPSPAM. Both are based on the PPCA-Tree and use the RLFOST-Map, which eliminates reconstruction of the projected databases. Experimental analysis on various synthetic datasets shows that PSPAM and PAPSPAM outperform PrefixSpan and other conventional and state-of-the-art algorithms on dense datasets, with better scalability.
Keywords: Data mining, Pattern-growth, Sequential patterns, WAP-Tree.
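To make the abstract's problem definition concrete, the following minimal Python sketch counts the support of a candidate pattern in a small sequence dataset, where each sequence is a time-ordered list of itemsets. It is a brute-force illustration of the definition only, not the PPCA-Tree, RLFOST-Map, PSPAM or PAPSPAM method. The function names (`contains`, `support`, `is_frequent`), the toy dataset and the use of `>=` for the threshold test are assumptions made for this example.

```python
from typing import FrozenSet, List

Event = FrozenSet[str]   # one event is an itemset
Seq = List[Event]        # one sequence is a time-ordered list of events


def contains(sequence: Seq, pattern: Seq) -> bool:
    """Return True if `pattern` is a subsequence of `sequence`.

    Each pattern event must be a subset of some sequence event, and the
    matched sequence events must appear in the same order as the pattern.
    Greedy left-to-right matching is sufficient for this test.
    """
    pos = 0  # index of the next pattern event still to be matched
    for event in sequence:
        if pos < len(pattern) and pattern[pos] <= event:
            pos += 1
    return pos == len(pattern)


def support(dataset: List[Seq], pattern: Seq) -> int:
    """Number of sequences in `dataset` that contain `pattern`."""
    return sum(1 for sequence in dataset if contains(sequence, pattern))


def is_frequent(dataset: List[Seq], pattern: Seq, min_support: int) -> bool:
    # Assumption: a pattern counts as frequent when support >= min_support.
    # Use a strict > here if the threshold is meant to be exclusive.
    return support(dataset, pattern) >= min_support


def seq(*events: str) -> Seq:
    """Hypothetical helper: seq("ab", "c") builds [{a, b}, {c}]."""
    return [frozenset(e) for e in events]


# Toy dataset of three sequences (illustrative, not from the paper).
dataset = [
    seq("a", "bc", "d"),
    seq("ab", "c"),
    seq("b", "a", "cd"),
]

print(support(dataset, seq("a", "c")))  # 3
print(support(dataset, seq("b", "d")))  # 2
print(support(dataset, seq("ab")))      # 1
```

Scanning every sequence for every candidate pattern, as this sketch does, is exactly the cost that compact tree representations and projected databases are designed to avoid.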