Prediction algorithm of phosphokinase based on PU-learning

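Authors: Wang Yiqi, Wang Mingju, Zhang Jin, Peng Zhicai, Wei Sen, Xie Duoshuang
Affiliation: Taihe Hospital (Shiyan, Hubei 442000)
Keywords: protein phosphorylation; bioinformatics; semi-supervised learning; PU-learning; phosphokinase prediction
Classification number: R318
Year·Volume·Issue (pages): 2019·38·4 (360-368)
Abstract: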

Objective Protein phosphorylation is the process by which a kinase catalyzes the transfer of a phosphate group to an amino acid residue at a specific site of a substrate protein, and it is an important mechanism for studying protein activity and function. Most of the thousands of phosphorylation sites identified so far lack kinase information. To address this, a phosphokinase prediction algorithm based on PU-learning is proposed, which iteratively labels phosphorylation sites in order to predict the kinase that catalyzes a given phosphopeptide. Methods The algorithm uses PU-learning as its framework and applies the maximum entropy variance to select an optimal threshold automatically for each type of phosphokinase, so as to extract the potential phosphorylation sites on each phosphopeptide. The kinase corresponding to each phosphorylation site is then determined by statistical analysis. Finally, prediction performance is evaluated by five-fold cross-validation on the Phospho.ELM database and compared with existing algorithms. Results In cross-validation, the specificity and sensitivity of the proposed algorithm, SLKSL, are up to 4% and 10% higher, respectively, than those of the best existing algorithm on a single data set, and its prediction accuracy on Phospho.ELM data reaches 79.52%. Conclusions The PU-learning-based phosphokinase prediction algorithm significantly outperforms existing algorithms and can accurately predict kinases for phosphopeptides in the Phospho.ELM database that lack kinase information, which makes it a useful guide for phosphorylation experiments.
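
The abstract describes the workflow only in outline, so the following is a minimal sketch of the general idea and not the authors' implementation. It assumes a simple position-specific frequency scorer and a fixed, hand-chosen threshold as a stand-in for the maximum-entropy-variance threshold selection, which the abstract does not specify. All function names (train_position_table, pu_iterative_label, k_fold_indices, sensitivity_specificity) and the toy peptides are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def train_position_table(peptides, pseudocount=1.0):
    """Per-position log-frequency table from equal-length peptides (assumed scorer)."""
    total = len(peptides) + pseudocount * len(ALPHABET)
    table = []
    for pos in range(len(peptides[0])):
        counts = Counter(p[pos] for p in peptides)
        table.append({a: math.log((counts[a] + pseudocount) / total) for a in ALPHABET})
    return table


def score(table, peptide):
    """Mean log-likelihood ratio of a peptide against a uniform background."""
    background = math.log(1.0 / len(ALPHABET))
    return sum(col[a] - background for col, a in zip(table, peptide)) / len(peptide)


def pu_iterative_label(positives, unlabeled, threshold, max_iter=10):
    """Iteratively promote unlabeled peptides that score above the threshold.

    The threshold is a fixed input here (an assumption); the paper selects it
    automatically per kinase type.
    """
    labeled = list(positives)
    remaining = list(unlabeled)
    for _ in range(max_iter):
        table = train_position_table(labeled)
        scored = [(p, score(table, p)) for p in remaining]
        promoted = [p for p, s in scored if s > threshold]
        if not promoted:
            break  # no new positives: the labeling has converged
        labeled.extend(promoted)
        remaining = [p for p, s in scored if s <= threshold]
    return labeled[len(positives):], remaining


def k_fold_indices(n, k=5):
    """Yield (train, test) index lists for k-fold cross-validation."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy peptides, invented for illustration only.
    known = ["RRASL", "RRGSL", "KRASV"]
    unknown = ["RRSSL", "GGGGG", "DEDED"]
    promoted, rest = pu_iterative_label(known, unknown, threshold=0.5)
    print("promoted:", promoted, "still unlabeled:", rest)
```

In this sketch each round retrains the scorer on the enlarged positive set, which is the "iterative labeling" step; a real evaluation would run the labeling inside each fold from k_fold_indices and report sensitivity and specificity on the held-out peptides.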

Address: Editorial Office of Beijing Biomedical Engineering, Anzhen Hospital, Andingmenwai, Beijing