A preprocessing model for DIA data based on a convolutional neural network

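Authors: Chen Chong, Zheng Haoran
Affiliation: School of Computer Science and Technology, University of Science and Technology of China (Hefei 230027)
Keywords: proteomics; convolutional neural network; mass spectrometry; preprocessing; correlation
CLC number: R318.04
Year, volume, issue (pages): 2020, 39(1): 56-61
Abstract: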

Objective: Data-independent acquisition (DIA) is currently a commonly used data acquisition method for high-throughput proteomics analysis. In untargeted analysis of DIA data, it is impossible to predict where a peptide will appear, so every peak in the spectra must be analyzed. However, the spectra contain a large number of noise peaks, which severely degrade the efficiency and effectiveness of subsequent protein identification and quantification. Preprocessing to remove noise peaks is therefore a critical step in untargeted DIA analysis. To make full use of the peak information of peptides extracted from DIA data in both first-stage (MS1) and second-stage (MS2) mass spectra, we propose the mass spectrometry convolutional neural network (MSCNN) model.

Methods: Unlike traditional methods, we first propose a sample extraction workflow suited to the MSCNN network structure, then train MSCNN on these samples so that the model exploits peptide features in MS1 and MS2 as fully as possible. Finally, the model's effectiveness is assessed from its results on the test set.

Results: Compared with the traditional algorithm, MSCNN filters out about 11.2% more noise peaks while handling true peaks with roughly the same effectiveness.

Conclusions: The proposed MSCNN model removes noise peaks from DIA data more effectively.
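
The abstract does not spell out how MSCNN samples are extracted. As a rough illustration only, the sketch below assumes that a sample for one candidate precursor peak is a fixed-size matrix whose first row is the precursor's MS1 extracted-ion chromatogram (XIC) and whose remaining rows are MS2 fragment XICs from the same DIA isolation window and retention-time range, ranked by their Pearson correlation with the MS1 profile (a common co-elution cue in untargeted DIA analysis). The function name `build_sample`, the `n_fragments` parameter, and the per-row normalization are hypothetical choices, not the paper's.

```python
import numpy as np


def build_sample(ms1_xic, ms2_xics, n_fragments=6):
    """Hypothetical sample builder: stack one precursor's MS1 elution profile
    with its best-correlated MS2 fragment profiles into a fixed-size matrix.

    ms1_xic:  1-D array, precursor intensity at each retention-time point.
    ms2_xics: 2-D array (k, L), candidate fragment intensities at the same
              retention-time points (same DIA isolation window).
    Returns (sample of shape (1 + n_fragments, L), correlations of kept rows).
    """
    ms1 = np.asarray(ms1_xic, dtype=float)
    ms2 = np.asarray(ms2_xics, dtype=float).reshape(-1, ms1.size)

    # Pearson correlation of each fragment profile with the precursor profile;
    # flat (zero-variance) profiles get 0 instead of NaN.
    corrs = np.array(
        [np.corrcoef(ms1, frag)[0, 1] if ms1.std() > 0 and frag.std() > 0 else 0.0
         for frag in ms2],
        dtype=float,
    )

    # Keep the n_fragments best-correlated fragments; pad missing rows with zeros.
    order = np.argsort(-corrs)[:n_fragments]
    sample = np.zeros((1 + n_fragments, ms1.size))
    sample[0] = ms1
    sample[1:1 + order.size] = ms2[order]

    # Scale each row to a maximum of 1 so the network sees profile shape,
    # not absolute intensity; all-zero rows stay zero.
    row_max = sample.max(axis=1, keepdims=True)
    sample = np.divide(sample, row_max, out=np.zeros_like(sample), where=row_max > 0)
    return sample, corrs[order]


# Toy profiles on 11 retention-time points (illustrative, not real data).
t = np.arange(11)
ms1 = np.exp(-((t - 5) ** 2) / 4.0)
frags = np.vstack([0.5 * ms1, np.exp(-((t - 1) ** 2) / 4.0), np.ones(11), 0.2 * ms1])
sample, corrs = build_sample(ms1, frags, n_fragments=3)
print(sample.shape)  # (4, 11)
```

Ranking by correlation keeps the matrix size fixed regardless of how many candidate fragments a window contains, which is what a convolutional network with a fixed input shape needs.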
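
The abstract likewise gives no architectural details for MSCNN. The hypothetical sketch below only shows how a small 1-D convolutional network can score a sample matrix of MS1 and MS2 rows: each filter slides along the retention-time axis across all rows at once, so it can respond to co-eluting precursor and fragment shapes. The class `TinyPeakCNN`, its layer sizes, and its random (untrained) weights are assumptions for illustration; a real model would learn its weights from labeled true and noise peaks.

```python
import numpy as np


class TinyPeakCNN:
    """Hypothetical 1-D CNN: conv -> ReLU -> global max pool -> dense -> sigmoid.

    Input: (channels, length) matrix, e.g. one MS1 row plus several MS2 rows
    sampled on a shared retention-time axis. Output: probability that the
    candidate peak is a true (non-noise) peak. Weights are random, i.e. untrained.
    """

    def __init__(self, in_channels, n_filters=8, kernel=3, seed=0):
        rng = np.random.default_rng(seed)
        self.in_channels = in_channels
        self.kernel = kernel
        self.w_conv = rng.normal(0.0, 0.1, size=(n_filters, in_channels, kernel))
        self.b_conv = np.zeros(n_filters)
        self.w_out = rng.normal(0.0, 0.1, size=n_filters)
        self.b_out = 0.0

    def conv1d(self, x):
        """'Valid' cross-correlation along the time axis, summed over all channels."""
        channels, length = x.shape
        if channels != self.in_channels or length < self.kernel:
            raise ValueError("unexpected input shape")
        n_out = length - self.kernel + 1
        out = np.empty((self.w_conv.shape[0], n_out))
        for i in range(n_out):
            window = x[:, i:i + self.kernel]  # (channels, kernel)
            out[:, i] = np.tensordot(self.w_conv, window, axes=([1, 2], [0, 1]))
        return out + self.b_conv[:, None]

    def predict_proba(self, x):
        x = np.asarray(x, dtype=float)
        h = np.maximum(self.conv1d(x), 0.0)  # ReLU
        pooled = h.max(axis=1)               # global max pool over time
        logit = float(pooled @ self.w_out + self.b_out)
        return 1.0 / (1.0 + np.exp(-logit))  # sigmoid


model = TinyPeakCNN(in_channels=7)
x = np.random.default_rng(1).random((7, 11))
p = model.predict_proba(x)
keep = p >= 0.5  # illustrative threshold: keep as a true peak, else drop as noise
```

Global max pooling makes the score depend on whether a matching shape occurs anywhere in the retention-time window rather than at a fixed position; the keep/drop threshold then trades noise removal against loss of true peaks.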
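
The Results compare methods by how many noise peaks they remove while handling true peaks about equally well. A minimal sketch of that bookkeeping, assuming each test peak carries a true/noise label and a kept/removed decision, follows. The abstract does not state whether "11.2%" is a relative increase or a difference in percentage points; `relative_increase` assumes the former, and both function names are hypothetical.

```python
def filtering_summary(labels, kept):
    """Summarise one noise-filtering run on a labelled test set.

    labels: sequence of bools, True if the peak is a true (peptide) peak.
    kept:   sequence of bools, True if the filter kept the peak.
    Returns (number of noise peaks removed, fraction of true peaks kept).
    """
    if len(labels) != len(kept):
        raise ValueError("labels and kept must have the same length")
    noise_removed = sum(1 for t, k in zip(labels, kept) if not t and not k)
    true_total = sum(1 for t in labels if t)
    true_kept = sum(1 for t, k in zip(labels, kept) if t and k)
    # Guard against a test set with no true peaks.
    kept_fraction = true_kept / true_total if true_total else float("nan")
    return noise_removed, kept_fraction


def relative_increase(new, baseline):
    """Relative change of `new` over `baseline` (0.112 means 11.2% more)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline


# Hypothetical example: 2 true peaks, 3 noise peaks.
labels = [True, True, False, False, False]
kept = [True, False, False, True, False]
print(filtering_summary(labels, kept))  # (2, 0.5)
```

For instance, with hypothetical counts of 1,112 versus 1,000 removed noise peaks, `relative_increase(1112, 1000)` returns about 0.112, i.e. about 11.2% more, provided the true-peak fractions of the two methods are close.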


Address: Editorial Office of Beijing Biomedical Engineering, Beijing Anzhen Hospital, Andingmenwai, Beijing
Tel: 010-64456508  Fax: 010-64456661
Email: llbl910219@126.com