Prediction of ncRNA-protein interactions based on machine learning methods

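Authors: Cheng Shuping, Tan Jianjun, Men Jingrui
Affiliation: College of Life Science and Bioengineering, Beijing University of Technology; Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation (Beijing 100124)
Keywords: noncoding RNA-protein interaction; LightGBM; random forest; extreme gradient boosting; convolutional autoencoder
Classification: R318.01; Q51
Year·Volume·Issue (pages): 2019·38·4 (353-359)
Abstract: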

Objective Noncoding RNA-protein interactions (ncRPI) are of considerable biological significance, and predicting them has become one of the main ways of studying the functions of noncoding RNA (ncRNA) and proteins. Methods We extracted features from the sequence information of ncRNAs and proteins, preprocessed the raw data with a convolutional autoencoder (CAE), and trained three machine learning models, LightGBM (LBM), random forest (RF) and extreme gradient boosting (XGB), to predict ncRPI. Results The three models were evaluated by 5-fold cross-validation (CV) on the RPI369 and RPI488 datasets. On RPI369 the accuracies were 0.757 (LBM), 0.791 (RF) and 0.791 (XGB); on RPI488 they were 0.918 (LBM), 0.908 (RF) and 0.918 (XGB). The three models also achieved high area under the curve (AUC) values on the larger datasets RPI1807, RPI2241 and RPI13254: all three reached an AUC of 0.99 on RPI1807, and the lowest AUC among the three models was 0.87 on RPI2241 and 0.81 on RPI13254. Conclusions Machine learning methods can predict whether an ncRNA and a protein interact.
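The evaluation protocol named in the abstract (5-fold cross-validation scored by accuracy and AUC) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the function names, the one-dimensional toy features, and the midpoint classifier are all hypothetical stand-ins for the sequence-derived features and the LightGBM, RF and XGB models, and the stratified split is an assumption, since the abstract does not say how folds were drawn.

```python
import math
import random


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validate(features, labels, fit, predict_score, k=5, seed=0):
    """Return mean accuracy and mean AUC over k held-out folds."""
    accs, aucs = [], []
    for held_out in stratified_folds(labels, k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        y_true = [labels[i] for i in held_out]
        scores = [predict_score(model, features[i]) for i in held_out]
        y_pred = [1 if s >= 0.5 else 0 for s in scores]
        accs.append(accuracy(y_true, y_pred))
        aucs.append(auc(y_true, scores))
    return sum(accs) / k, sum(aucs) / k


# Hypothetical stand-in classifier (NOT one of the paper's models):
# the "model" is the midpoint between the two class means of a 1-D feature.
def fit_midpoint(xs, ys):
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def score_midpoint(midpoint, x):
    """Squash the signed distance from the midpoint into a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-10.0 * (x - midpoint)))


# Made-up toy data: 10 non-interacting pairs (label 0), 10 interacting pairs (label 1).
toy_features = [0.10 + 0.02 * i for i in range(10)] + [0.70 + 0.02 * i for i in range(10)]
toy_labels = [0] * 10 + [1] * 10

mean_acc, mean_auc = cross_validate(toy_features, toy_labels, fit_midpoint, score_midpoint)
print(f"mean accuracy = {mean_acc:.3f}, mean AUC = {mean_auc:.3f}")
```

Any classifier that exposes a fit step and a per-sample score can be passed in place of `fit_midpoint` and `score_midpoint`. The toy data are deliberately well separated, so the scores they produce say nothing about the results reported above.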

Address: Editorial Office of Beijing Biomedical Engineering, Anzhen Hospital, Andingmenwai, Beijing
Tel: 010-64456508  Fax: 010-64456661
Email: llbl910219@126.com