Extraction of epidemiologic risk factors based on text mining

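Authors: Lu Yanxin (卢延鑫), Yao Xufeng (姚旭峰)
Affiliation: National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention; Key Laboratory of Parasite and Vector Biology, Ministry of Health; WHO Collaborating Centre for Malaria, Schistosomiasis and Filariasis (Shanghai 200025)
Keywords: text mining; risk factors; information extraction; epidemiology
Classification number:
Year·Volume·Issue (pages): 2013·32·2 (160-163)
Abstract: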

Objective To design a system, based on text mining techniques, that automatically extracts epidemiologic risk factors. Methods The system consists of a text mining engine subsystem and a rule-based information extraction subsystem. First, the text mining engine marks all noun phrases and collects semantic and other information about each of them. Then a rule-based text classifier marks the noun phrases that are epidemiologic risk factors. Results To evaluate the system, text manually annotated by an epidemiologist was used as input. The best result was an F-measure of 64.6%, with a precision of 61.0% and a recall of 68.8%. This result is better than those of other related studies, and some of the remaining errors are avoidable. Conclusions The text mining approach is of considerable help in automatically extracting risk factor information from the epidemiologic research literature.
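The evaluation in the abstract is stated in terms of precision, recall, and F-measure. As a minimal sketch of how these figures relate, the Python below scores a set of extracted phrases against a gold annotation. The function names and the toy phrase sets are hypothetical illustrations, not the authors' code or data, and the sketch assumes the balanced F-measure (the harmonic mean of precision and recall), which the abstract does not state explicitly.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(predicted, gold):
    """Return (precision, recall, F-measure) for extracted vs. annotated phrases.

    Both arguments are collections of phrases; duplicates are ignored.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)


# Hypothetical toy data: phrases an annotator marked as risk factors (gold)
# and phrases a system extracted (predicted).
gold = {"smoking", "obesity", "alcohol intake", "hypertension"}
predicted = {"smoking", "obesity", "hypertension", "red meat", "study population"}

precision, recall, f1 = evaluate(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} F={f1:.3f}")

# Applying the same formula to the rounded figures quoted in the abstract.
print(f"F from reported P and R: {f_measure(0.610, 0.688):.3f}")
```

With the rounded precision and recall from the abstract, this formula gives about 0.647, which agrees with the reported 64.6% up to rounding of the inputs.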


Address: Editorial Office of Beijing Biomedical Engineering, Beijing Anzhen Hospital, Andingmenwai, Beijing
Tel: 010-64456508  Fax: 010-64456661
E-mail: llbl910219@126.com