Pioneering Methods for Enhancing PPI and Phenotype Networks for Candidate Disease Prioritization
M. Renuka Devi1, J. Maria Shyla2

1 Dr. M. Renuka Devi, MCA, HOD, Department of BCA, Sri Krishna Arts & Science College, Coimbatore, India.
2 J. Maria Shyla*, MCA, Ph.D. Scholar, Bharathiar University, Coimbatore, India.
Manuscript received on July 02, 2019. | Revised Manuscript received on July 22, 2019. | Manuscript published on December 30, 2019. | PP: 4005-4012  | Volume-9 Issue-2, December, 2019. | Retrieval Number: B4655129219/2019©BEIESP | DOI: 10.35940/ijeat.B4655.129219
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Protein-Protein Interactions (PPIs) are highly specific physical contacts between two or more protein molecules. PPI networks are modeled as graphs in which nodes denote proteins and edges denote interactions between them. PPI networks play an important role in identifying candidate disease genes, but they usually contain false interactions. Many techniques have been proposed to reconstruct the PPI network so as to remove false interactions and improve the ranking of candidate disease genes. Random Walk with Restart on Diffusion profile (RWRDP) and Random Walk on a Reliable Heterogeneous Network (RWRHN) are two of them. In these methods, gene topological similarity is incorporated into the original PPI network to reconstruct a new PPI network, and a phenotype network is constructed by calculating the similarity between gene phenotypes. The reconstructed PPI network and the phenotype network are then combined to rank candidate disease genes. However, the quality of the reconstruction depends entirely on the quality of the protein interaction data. To enhance the reconstruction of the PPI network, a Piecewise Linear Regression (PLR) based protein sequence similarity measure and a Bat Algorithm based gene expression similarity measure were previously proposed for use with the Reliable Heterogeneous Network (RHN). In this paper, an additional measure called the Interaction Level Subcellular Localization Score (ILSLS) is proposed to further reduce false interactions in the reconstructed PPI network. ILSLS combines the Normalized Subcellular Localization score (NSL) and the Protein Multiple Location Prediction score (PMLP). The proposed method is named Random Walker on Optimized Trustworthy Heterogeneous Sub Cellular localization aware Network (RWOTHSN). To enhance the ranking produced by RWOTHSN, phenotype structure is also considered while constructing the phenotype network used to rank candidate disease genes. The phenotype structure is characterized using the h*-sequence model, which identifies highly discriminative signatures with only a small number of genes. This second method is named Random Walker on Optimized Trustworthy Heterogeneous Sub Cellular localization and Phenotype structure aware Network (RWOTHSPN). The efficiency of the proposed methods is evaluated on a PPI network database in terms of average degree and relative frequency for PPI reconstruction, and in terms of number of successful predictions, precision, and recall for candidate disease gene ranking.
Keywords: Candidate disease gene prediction, Candidate disease gene prioritization, Phenotype structure, Random walk, sub-cellular information.
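
Illustrative sketch: the methods named in the abstract all rank candidate genes with a random walk on a network. The abstract does not give the update rule or parameters, so the following is only a minimal sketch of a generic random walk with restart on a single toy graph, not the authors' heterogeneous PPI and phenotype network. The function names, the restart probability of 0.7, and the four-node adjacency matrix are assumptions made purely for illustration.

```python
def column_normalize(adj):
    """Scale each column of a square adjacency matrix to sum to 1."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until the L1 change < tol.

    adj     : symmetric adjacency matrix (list of lists), hypothetical toy input
    seeds   : indices of known disease genes; the walk restarts at these nodes
    restart : restart probability (0.7 is an assumed value, not from the paper)
    """
    n = len(adj)
    w = column_normalize(adj)
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path graph 0 - 1 - 2 - 3; node 0 plays the role of a known disease gene.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(adj, seeds=[0])
# Rank nodes by steady-state probability, highest first.
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
print(ranking, [round(s, 4) for s in scores])
```

In this sketch, nodes closer to the seed receive higher steady-state probability, which is the intuition behind using such scores to prioritize candidate genes. Removing false edges from the network, as the proposed reconstruction aims to do, would change the transition matrix and therefore the ranking.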