Comparison of the clustering algorithms based on operational taxonomic units

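Authors: Zhou Chen, Zhang Shaowu, Chen Wei
Affiliation: School of Automation, Northwestern Polytechnical University, Xi'an 710072
Keywords: clustering; operational taxonomic unit; 16S rRNA gene; microorganism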
Year, volume (issue), pages: 2014, 33(6): 591-597
Abstract:

Objective  Advances in high-throughput next-generation sequencing have produced large volumes of microbial 16S rRNA gene sequence data. Accurately clustering these sequences into operational taxonomic units (OTUs) helps reveal the population composition and distribution of microbes in an environment. Methods  We compared seven widely used OTU clustering algorithms on both real and simulated data sets, and analyzed the strengths, limitations, and scope of applicability of each. Results  Both sequence length and sequencing depth affect the clustering results. Conclusions  At the same sequence similarity threshold, the clustering results differ considerably across algorithms; among them, the CROP algorithm showed comparatively good robustness and noise tolerance.
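To make the role of the similarity threshold concrete, here is a minimal sketch of a generic greedy, centroid-based OTU clustering together with a pairwise Rand index for comparing two OTU partitions of the same reads. It is not a reproduction of CROP or of any other algorithm evaluated in the study. It assumes the reads are already aligned to equal length and measures identity as the fraction of matching positions, whereas real tools compute identity from pairwise alignments. The helper names `identity`, `greedy_otus`, and `rand_index` are hypothetical. Because this greedy scheme assigns each read to the first sufficiently similar centroid, simply reordering the input can change the OTUs obtained at the same threshold.

```python
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length reads.

    Assumption: reads are already aligned to a common length; real OTU
    tools derive identity from pairwise alignments instead.
    """
    if len(a) != len(b) or not a:
        raise ValueError("reads must be non-empty and aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads: list[str], threshold: float = 0.97) -> list[int]:
    """Hypothetical greedy centroid clustering: each read joins the first
    existing centroid with identity >= threshold, otherwise it becomes a
    new centroid. Returns one OTU label per read, in input order."""
    centroids: list[str] = []
    labels: list[int] = []
    for read in reads:
        for otu, centroid in enumerate(centroids):
            if identity(read, centroid) >= threshold:
                labels.append(otu)
                break
        else:  # no centroid was similar enough: open a new OTU
            centroids.append(read)
            labels.append(len(centroids) - 1)
    return labels


def rand_index(p: list[int], q: list[int]) -> float:
    """Fraction of read pairs on which two partitions agree
    (both place the pair in the same OTU, or both split it)."""
    pairs = list(combinations(range(len(p)), 2))
    if not pairs:
        return 1.0
    agree = sum((p[i] == p[j]) == (q[i] == q[j]) for i, j in pairs)
    return agree / len(pairs)


# Toy reads: A and B differ at 1 site, B and C at 1 site, A and C at 2 sites.
reads = ["AAAAAAAAAA", "AAAAAAAAAC", "AAAAAAAACC"]
labels_fwd = greedy_otus(reads, threshold=0.9)

# Same reads, processed in the order B, A, C; labels mapped back to read order.
order = [1, 0, 2]
labels_perm = greedy_otus([reads[i] for i in order], threshold=0.9)
labels_back = [0] * len(reads)
for pos, i in enumerate(order):
    labels_back[i] = labels_perm[pos]

print(len(set(labels_fwd)), len(set(labels_back)), rand_index(labels_fwd, labels_back))
# prints: 2 1 0.3333333333333333
```

With the order A, B, C the three reads form two OTUs; with the order B, A, C they collapse into one, and the two partitions agree on only one of the three read pairs. The 10-nt reads and 90% threshold are chosen only to make the effect visible; 16S studies commonly apply a 97% threshold to much longer reads.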


Address: Editorial Office of Beijing Biomedical Engineering, Anzhen Hospital, Andingmenwai, Beijing
Tel: 010-64456508  Fax: 010-64456661
Email: llbl910219@126.com