A proteome analysis method based on deep learning

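Authors: Liu Koulong (刘扣龙), Zheng Haoran (郑浩然)
Affiliation: School of Computer Science and Technology, University of Science and Technology of China (Hefei 230027). Corresponding author: Zheng Haoran, Associate Professor. E-mail: hrzheng@ustc.edu.cn
Keywords: proteomics; deep learning; data-independent acquisition; relative quantification; mass spectrometry
Classification number: R318
Year·Volume·Issue (pages): 2022·41·6 (569-575)
Abstract: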

Objective Data-independent acquisition (DIA) based on liquid chromatography-tandem mass spectrometry is one of the main approaches to proteomic data acquisition. Each acquired MS/MS spectrum is a mixture produced by the simultaneous fragmentation of multiple peptides, which complicates peptide identification and quantification. Current mainstream methods based on ion chromatograms require preprocessing, construction of chromatographic peaks, and extraction of chromatographic peak features. These pipelines are complex and error-prone, and their identification and quantification accuracy varies with chromatogram complexity and chromatographic time. To address these shortcomings, we propose a deep learning method that identifies and quantifies peptides directly. Methods Unlike methods based on ion chromatograms, our approach does not use information from the chromatographic dimension and is therefore unaffected by factors such as chromatogram complexity and chromatographic time. Preprocessed mass spectrometry data are fed into two models based on convolutional neural networks (CNNs), which treat identification as a binary classification problem and quantification as a regression problem. Results We conducted experiments on publicly available data. Compared with FIGS, a method with relatively high accuracy, our method improved the reproducibility of identification results and increased the number of peptides quantified at different abundance levels while maintaining quantification accuracy. Conclusions The deep learning model proposed in this paper does not use information from the chromatographic dimension and can effectively identify and quantify peptides.
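The abstract does not specify the network architecture, the input encoding, or the training setup, so the following is only a toy sketch of the general idea stated in Methods: one CNN-style model outputs a presence probability (binary classification) and a second, separate model outputs a real-valued quantity (regression). The m/z binning, the kernel values, the weights, and every function name here are illustrative assumptions, not the authors' implementation.

```python
import math

def bin_spectrum(peaks, mz_min=100.0, mz_max=200.0, bin_width=10.0):
    """Assumed preprocessing: sum (m/z, intensity) peaks into fixed-width
    bins and scale so the largest bin equals 1."""
    n_bins = int((mz_max - mz_min) / bin_width)
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        if mz_min <= mz < mz_max:
            vec[int((mz - mz_min) / bin_width)] += intensity
    top = max(vec)
    return [v / top for v in vec] if top > 0 else vec

def conv1d_relu(x, kernel, bias=0.0):
    """Valid 1D convolution (cross-correlation form) followed by ReLU."""
    k = len(kernel)
    return [max(0.0, sum(x[i + j] * kernel[j] for j in range(k)) + bias)
            for i in range(len(x) - k + 1)]

def cnn_score(x, model):
    """Conv -> ReLU -> global max pooling -> linear layer; returns a scalar."""
    feats = [max(conv1d_relu(x, k)) for k in model["kernels"]]
    return sum(f * w for f, w in zip(feats, model["weights"])) + model["bias"]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_loss(p, y):
    """Binary cross-entropy, the usual loss for a binary classifier."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def mse_loss(pred, target):
    """Squared error, the usual loss for a regression model."""
    return (pred - target) ** 2

# Two separate models with hand-picked (untrained, hypothetical) parameters.
ID_MODEL = {"kernels": [[1.0, -1.0], [0.5, 0.5, 0.5]],
            "weights": [1.0, -0.5], "bias": 0.0}
QUANT_MODEL = {"kernels": [[1.0, 1.0]], "weights": [2.0], "bias": 0.1}

# Made-up peaks; the last one falls outside the binned m/z range.
peaks = [(105.0, 50.0), (125.0, 100.0), (131.0, 25.0), (250.0, 999.0)]
x = bin_spectrum(peaks)
p_present = sigmoid(cnn_score(x, ID_MODEL))   # identification output in (0, 1)
quantity = cnn_score(x, QUANT_MODEL)          # quantification output, unbounded
```

Neither model takes a retention-time or chromatographic feature as input, which mirrors the abstract's claim that the chromatographic dimension is not used. In practice the parameters would be learned by minimizing `bce_loss` and `mse_loss` on labeled spectra rather than set by hand.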


Address: Editorial Office of Beijing Biomedical Engineering, Anzhen Hospital, Andingmenwai, Beijing
Tel: 010-64456508  Fax: 010-64456661
E-mail: llbl910219@126.com