Microorganism detection algorithm based on high-throughput sequencing data

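Authors: Li Jiangyu (李江域), Wang Xiaolei (王小磊), Liu Yang (刘阳), Mao Yiqing (毛逸清), Zhao Dongsheng (赵东升), Wang Yumin (王玉民)
Affiliation: Institute of Health Service and Medical Information, Academy of Military Medical Sciences, Beijing 100850, China
Keywords: high-throughput sequencing; microorganism detection; sequence alignment; sequence assembly; algorithm
Year, volume, issue (pages): 2013, 32(5): 463-466
Abstract: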

Objective: To design a microorganism detection algorithm for high-throughput sequencing data that is powerful, fast, runs locally, and does not depend on a particular runtime environment.

Methods: The microbial genomes are divided into groups (viruses, bacteria, and fungi). The groups are processed one at a time, starting with the virus genomes. For each group, the reads that map to its genomes are extracted, reads originating from the human genome are filtered out, and the remaining reads are assembled into contigs, which are then aligned against the genomes of that group. If the alignment identifies the microorganism, the workflow ends; otherwise the next group is analyzed. If no microorganism has been identified after all groups have been used, the human reads are filtered out of the remaining sequencing data and the rest is assembled. Contigs that cannot be aligned to any microbial genome are classified as genome fragments of an unknown pathogenic microorganism.

Results: Simulated data and real sequencing data were analyzed with the new algorithm, using RINS for comparison. For the simulated sample containing a known pathogen, the average processing time was 75 min for the new algorithm and 767 min for RINS; both produced the same detection result, and the new algorithm produced longer assembled sequences. For the simulated sample containing an unknown pathogen, the average processing time was 64 min for the new algorithm and 584 min for RINS, and the new algorithm recovered a fairly complete original sequence. For the real data set (SRR073726), the average processing time was 23 min for the new algorithm and 68 min for RINS, with identical detection results.

Conclusions: The new algorithm detects microorganisms quickly and accurately. It can also discover unknown microorganisms and output fragments of their genomes.
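
The group-by-group workflow described in the Methods can be sketched in Python as follows. This is only an illustration of the control flow, not the authors' implementation: the function `detect_microorganism`, its parameters, and all `toy_*` helpers are hypothetical names, and the mapping, human filtering, assembly, and alignment steps are replaced by trivial string operations. The sketch also assumes that the "remaining" reads in the fallback step are those that mapped to none of the groups, which the abstract does not state explicitly.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Reads = List[str]
Contigs = List[str]
Reference = Dict[str, str]  # hypothetical: genome name -> sequence


def detect_microorganism(
    reads: Reads,
    genome_groups: Sequence[Tuple[str, Reference]],
    map_reads: Callable[[Reads, Reference], Reads],
    filter_human: Callable[[Reads], Reads],
    assemble: Callable[[Reads], Contigs],
    classify: Callable[[Contigs, Reference], Optional[str]],
) -> Dict[str, object]:
    """Try each reference group in order and stop at the first detection."""
    mapped_anywhere = set()
    for group_name, reference in genome_groups:
        mapped = map_reads(reads, reference)       # reads hitting this group
        mapped_anywhere.update(mapped)
        contigs = assemble(filter_human(mapped))   # drop host reads, then assemble
        species = classify(contigs, reference)     # None means "not detected"
        if species is not None:
            return {"status": "known", "group": group_name,
                    "species": species, "contigs": contigs}

    # Fallback (assumption): work on reads that mapped to no group at all.
    remaining = [r for r in reads if r not in mapped_anywhere]
    contigs = assemble(filter_human(remaining))
    unmatched = [c for c in contigs
                 if all(classify([c], ref) is None for _, ref in genome_groups)]
    return {"status": "unknown", "group": None,
            "species": None, "contigs": unmatched}


# Toy stand-ins so the sketch runs; real tools would replace each of these.
TOY_HUMAN = "AAAACCCCGGGGTTTT"


def toy_map(reads: Reads, reference: Reference) -> Reads:
    return [r for r in reads if any(r in g for g in reference.values())]


def toy_filter_human(reads: Reads) -> Reads:
    return [r for r in reads if r not in TOY_HUMAN]


def toy_assemble(reads: Reads) -> Contigs:
    return sorted(set(reads))  # placeholder: no real assembly happens here


def toy_classify(contigs: Contigs, reference: Reference) -> Optional[str]:
    for name, genome in reference.items():
        if any(c in genome for c in contigs):
            return name
    return None


toy_groups = [
    ("virus", {"toy_virus": "ACGTACGTTAGC"}),
    ("bacteria", {"toy_bacterium": "GGATCCGGATTA"}),
    ("fungi", {"toy_fungus": "TTGACATTGACA"}),
]

result = detect_microorganism(
    ["CCGGAT", "AAAACC", "CATCAT"], toy_groups,
    toy_map, toy_filter_human, toy_assemble, toy_classify,
)
print(result["status"], result["group"], result["species"])
```

Passing the four processing steps in as callables keeps the sketch independent of any particular aligner or assembler. The early `return` inside the loop reflects the point made in the Methods: later groups are analyzed only when earlier ones yield no detection.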


Editorial office: Beijing Biomedical Engineering, Anzhen Hospital, Andingmenwai, Beijing