UDD based Procedure for Record Deduplication over Digital Storage Systems
Shaik Anjumun Jabeen1, Y Prasanth2, Gudapati Syam Prasad3

1Shaik Anjumun Jabeen, Department of CSE, Koneru Lakshmaiah Education Foundation, Vaddeswaram (A.P), India.
2Dr. Y Prasanth, Department of CSE, Koneru Lakshmaiah Education Foundation, Vaddeswaram (A.P), India.
3Dr. Gudapati Syam Prasad, Department of CSE, Koneru Lakshmaiah Education Foundation, Vaddeswaram (A.P), India.

Manuscript received on 18 April 2019 | Revised Manuscript received on 25 April 2019 | Manuscript published on 30 April 2019 | PP: 1850-1856 | Volume-8 Issue-4, April 2019 | Retrieval Number: D7005048419/19©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Digital libraries, e-commerce brokers, and similar large information-oriented systems rely on consistent data to offer high-quality services. The presence of duplicates, quasi-replicas, or near-duplicate entries (dirty data) in their repositories burdens storage resources directly and causes delivery issues indirectly. Significant investment in this field from interested parties has created a need for effective methods of removing replicas from data repositories. Prior approaches used SVM classifiers to handle such dirty data. Newer distributed deduplication systems offer higher reliability by distributing data chunks across multiple cloud servers. They also meet the security requirements of data confidentiality and tag consistency by introducing a deterministic secret sharing scheme in distributed storage systems, instead of the convergent encryption used in previous deduplication systems. We therefore propose the Unsupervised Duplicate Detection (UDD) mechanism, a query-dependent record matching method that requires no pre-trained data set. UDD uses two cooperating classifiers, a weighted component similarity summing (WCSS) classifier and an SVM classifier, which iteratively identify duplicates in the query results from data sources. It achieves the same deduplication results as genetic programming (GP) systems but in significantly less time. A practical implementation of the proposed approach validates the claim.
Keywords: Cloud Computing Environment, Genetic Programming, Active Learning, Deduplication and Cloud Security.

Scope of the Article: Cloud Computing
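
The WCSS idea named in the abstract, scoring a record pair by a weighted sum of per-field similarities, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The field names, the weights, the 0.85 threshold, and the use of `difflib.SequenceMatcher` as the field similarity measure are all assumptions made for illustration, and the cooperating SVM classifier and the iterative refinement are omitted.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two field values.

    difflib's ratio is a stand-in; the abstract does not name a measure.
    """
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def wcss_similarity(rec1: dict, rec2: dict, weights: dict) -> float:
    """Weighted sum of per-field similarities (weights assumed to sum to 1)."""
    return sum(w * field_similarity(rec1[f], rec2[f]) for f, w in weights.items())


def find_duplicates(records: list, weights: dict, threshold: float = 0.85) -> list:
    """Return index pairs whose weighted similarity meets the threshold."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if wcss_similarity(records[i], records[j], weights) >= threshold:
                pairs.append((i, j))
    return pairs


# Hypothetical query results; the first two records describe the same item.
records = [
    {"title": "Data Mining Concepts", "author": "J. Han"},
    {"title": "Data Mining: Concepts", "author": "J Han"},
    {"title": "Operating Systems", "author": "A. Silberschatz"},
]
weights = {"title": 0.6, "author": 0.4}  # illustrative weights, not from the paper

duplicate_pairs = find_duplicates(records, weights)
print(duplicate_pairs)
```

In a two-classifier scheme like the one the abstract describes, pairs flagged by such a similarity score would plausibly serve as positive examples for the SVM classifier. That hand-off is not shown here.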