Study on named entity recognition in Chinese radiology reports

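Authors: Zhang Zhiqiang, Xu Yan, Huang Yanqun, Wang Ni, Yang Zhenghan, Chen Hui, Liu Honglei
Affiliations: School of Biomedical Engineering, Capital Medical University (Beijing 100069); Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application, Capital Medical University (Beijing 100069); Department of Radiology, Beijing Friendship Hospital, Capital Medical University (Beijing 100050)
Keywords: radiology report; natural language processing; conditional random field; named entity recognition; information extraction
Classification number: R318; TP31
Year·Volume·Issue (pages): 2020·39·6 (609-614)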
Abstract:

Objective To explore methods for named entity recognition in Chinese radiology reports, in particular the recognition performance of the conditional random field (CRF) algorithm. Methods We randomly collected 98 abdominal CT radiology reports. Together with experienced radiologists, we defined five entity types in the findings section of the reports, [location], [shape], [size], [density], and [enhancement], and annotated all reports manually. The 98 reports were randomly divided into a training set and a test set at a ratio of 7:3. Named entity recognition was performed with three CRF feature templates, and the recognition results were compared. Results The findings sections of the 98 CT radiology reports contained a total of 32332 Chinese characters and other characters, with 19151 characters in the training set and 7418 characters in the test set. Across the three CRF feature templates, the average F1-score for overall entity recognition was 0.9487, and the entity [size] had the highest F1-score, reaching 0.9818. Conclusions The CRF algorithm achieved high accuracy in named entity recognition in Chinese radiology reports. The recognized entities can be used in subsequent natural language processing tasks such as information extraction.
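The evaluation setup described in the abstract (a random 7:3 split and F1-scores over recognized entities) can be illustrated with a minimal Python sketch. The abstract does not state the tagging scheme or how F1 was computed, so the sketch assumes character-level BIO tags and exact-match, entity-level scoring. The function names, the label names (`LOC`, `DENSITY`), and the rounding used for the split are hypothetical and are not taken from the paper.

```python
import random


def split_reports(report_ids, train_ratio=0.7, seed=0):
    """Randomly split report IDs into training and test sets (7:3 by default).

    Assumption: the split size is rounded to the nearest integer; the
    abstract does not give the number of reports in each set.
    """
    ids = list(report_ids)
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]


def extract_entities(tags):
    """Collect (start, end, label) spans from a BIO tag sequence (end exclusive)."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and tag[2:] == label
        if label is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def f1_score(gold_tags, pred_tags):
    """Entity-level F1: a predicted entity counts only if span and label both match."""
    gold, pred = extract_entities(gold_tags), extract_entities(pred_tags)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the location entity is predicted correctly,
# but the density entity is cut one character short.
gold = ["B-LOC", "I-LOC", "I-LOC", "O", "B-DENSITY", "I-DENSITY", "I-DENSITY", "O"]
pred = ["B-LOC", "I-LOC", "I-LOC", "O", "B-DENSITY", "I-DENSITY", "O", "O"]

train_ids, test_ids = split_reports(range(98))
print(len(train_ids), len(test_ids))
print(f1_score(gold, pred))
```

Under exact-match scoring, a boundary error costs both precision and recall, which is why the truncated density entity in the example counts as a miss rather than a partial hit. A more lenient, character-level score would give a higher value for the same prediction.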


Address: Editorial Office of Beijing Biomedical Engineering, Beijing Anzhen Hospital, Andingmenwai, Beijing