Research Papers:
Sequence-based predictive modeling to identify cancerlectins
Hong-Yan Lai1, Xin-Xin Chen1, Wei Chen1,2, Hua Tang3, Hao Lin1
1Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
2Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, China
3Department of Pathophysiology, Southwest Medical University, Luzhou, China
Correspondence to:
Hua Tang, email: [email protected]
Hao Lin, email: [email protected]
Keywords: cancerlectins, binomial distribution, optimal tripeptides, SVM
Received: January 18, 2017 Accepted: February 24, 2017 Published: March 07, 2017
ABSTRACT
Lectins are a diverse group of glycoproteins or carbohydrate-binding proteins that are widely distributed across species. They specifically recognize and exclusively bind to particular saccharide groups. Cancerlectins are lectins that are closely related to cancer and play a major role in the initiation, survival, growth, metastasis and spread of tumors. Several computational methods have emerged to discriminate cancerlectins from non-cancerlectins, which promotes the study of the pathogenic mechanisms and clinical treatment of cancer. However, the predictive accuracy of most of these techniques is very limited. In this work, we constructed a benchmark dataset based on the CancerLectinDB database, developed a new amino acid sequence-based strategy for feature description, and then applied the binomial distribution to screen the optimal feature set. Finally, an SVM-based predictor was built to distinguish cancerlectins from non-cancerlectins; it achieved an accuracy of 77.48% with an AUC of 85.52% in jackknife cross-validation. The results show that our prediction model performs better than published predictive tools.
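The feature-description and screening steps summarized above can be illustrated with a minimal sketch. It assumes that the sequence-based features are overlapping tripeptide counts (suggested by the keyword "optimal tripeptides") and that the binomial screen scores each tripeptide by one minus the upper-tail probability of its observed count in a class. The function names, the scoring formula and the toy sequences are illustrative assumptions, not the authors' published implementation, and the SVM training step is omitted.

```python
from collections import Counter
from math import comb


def tripeptide_counts(seq):
    """Count overlapping tripeptides in one amino acid sequence."""
    return Counter(seq[i:i + 3] for i in range(len(seq) - 2))


def binomial_tail(n, k, q):
    """Return P(X >= k) for X ~ Binomial(n, q)."""
    return sum(comb(n, m) * q**m * (1 - q)**(n - m) for m in range(k, n + 1))


def score_tripeptides(pos_seqs, neg_seqs):
    """Score each tripeptide by how unlikely its class bias is under chance.

    Assumed scoring rule: confidence = 1 - min(tail probability in the
    positive class, tail probability in the negative class).
    """
    pos, neg = Counter(), Counter()
    for s in pos_seqs:
        pos.update(tripeptide_counts(s))
    for s in neg_seqs:
        neg.update(tripeptide_counts(s))

    # Prior probability that a random tripeptide occurrence is in the positive class.
    total_pos, total_neg = sum(pos.values()), sum(neg.values())
    q_pos = total_pos / (total_pos + total_neg)

    scores = {}
    for tri in set(pos) | set(neg):
        n = pos[tri] + neg[tri]  # occurrences of this tripeptide in both classes
        p_pos = binomial_tail(n, pos[tri], q_pos)
        p_neg = binomial_tail(n, neg[tri], 1 - q_pos)
        scores[tri] = 1 - min(p_pos, p_neg)
    return scores


# Toy, made-up sequences purely for illustration.
scores = score_tripeptides(["AAAAAA", "AAAAAA"], ["GGGGGG", "GGGGGG"])
ranked = sorted(scores, key=scores.get, reverse=True)
```

Under these assumptions, the highest-scoring tripeptides would be kept as the feature set, and their per-sequence frequencies would serve as the input vectors for an SVM classifier evaluated by jackknife (leave-one-out) cross-validation.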
All site content, except where otherwise noted, is licensed under a Creative Commons Attribution 4.0 License.
PII: 15963