NetRD: a tool for supplementing literature-mined evidence sets with Bing search results

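Authors: Zou Xiongfeng (邹熊峰), Zheng Haoran (郑浩然)
Affiliation: School of Computer Science and Technology, University of Science and Technology of China (Hefei 230027)
Keywords: biological literature mining; Bing web search API; evidence set; biological entity association pairs; evidence supplementation
Classification number: R318
Publication year·volume·issue (pages): 2019·38·4 (377-383)
Abstract: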

Objective Current work on biological literature mining concentrates on improving the performance of individual mining modules in order to raise confidence in the mined results. However, a large proportion of mined results are supported by very few evidence sentences from the literature. To alleviate this problem, we propose a tool system that uses the Bing search engine to draw supplementary evidence from massive web data for the biological entity association pairs obtained by literature mining. Methods First, existing text mining techniques are applied to PubMed literature to mine a batch of associations between biological entities. Then a Bing web search module is introduced: biological entity names are used as keywords, and the Bing web search API returns a batch of search results from the web. These results are organized into a new data source. Finally, associations between biological entities are mined from this new data source to obtain supplementary evidence sentences from the web. Results NetRD (http://bioinfo.ustc.edu.cn/NetRD) provides effective supplementary evidence for biological entity associations that have little literature evidence, such as those involving disease-related genes, and enriches the final evidence set produced by literature mining. Conclusions Using web data as a supplementary data source, NetRD can effectively provide additional evidence for associations between biological entities that have few evidence sentences mined from the literature, giving researchers an important reference when confirming whether two biological entities are associated.
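The Methods steps above (query by entity name, collect search results as a data source, then mine evidence sentences) can be sketched roughly as follows. This is a minimal illustration, not NetRD's implementation: the function names are hypothetical, the response is assumed to be a Bing-style JSON object with a `webPages.value` list of items carrying `url` and `snippet` fields, and plain co-occurrence of both entity names in a sentence stands in for whatever relation mining the authors actually use. No network call is made; the search response is passed in as a dictionary.

```python
import re


def build_query(entity_a, entity_b):
    """Quote each entity name so the search engine treats it as a phrase."""
    return f'"{entity_a}" "{entity_b}"'


def extract_snippets(response):
    """Collect (url, snippet) pairs from a Bing-style JSON response dict.

    Assumes the layout response["webPages"]["value"] -> list of result items.
    """
    pages = response.get("webPages", {}).get("value", [])
    return [(p.get("url", ""), p.get("snippet", "")) for p in pages]


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def evidence_sentences(snippets, entity_a, entity_b):
    """Keep sentences that mention both entities (simple co-occurrence filter)."""
    a, b = entity_a.lower(), entity_b.lower()
    evidence = []
    for url, snippet in snippets:
        for sentence in split_sentences(snippet):
            low = sentence.lower()
            if a in low and b in low:
                evidence.append({"url": url, "sentence": sentence})
    return evidence
```

In use, the dictionary returned by a web search request for `build_query("APOE", "Alzheimer disease")` would be passed to `extract_snippets`, and the resulting snippets to `evidence_sentences`. A real system would likely need entity normalization and synonym handling, which this sketch omits.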
