Beijing Biomedical Engineering

An entity disambiguation algorithm for domain literature based on context features

Authors: Wang Jing, Tan Shaofeng, He Dongdong, Chen Jianhui, Yan Jianzhuo

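Affiliations: Faculty of Information Technology, Beijing University of Technology (Beijing 100124); Pinggu Hospital, Beijing Friendship Hospital, Capital Medical University (Beijing 101200)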

Keywords: entity disambiguation; context features; probabilistic model

Classification code: R318

Year, volume, issue (pages): 2018, 37, 4 (398-402)

Abstract:

Objective: To meet the needs of literature-based knowledge learning and application in the biomedical domain, and to resolve the word ambiguity that arises during entity recognition, this paper proposes an entity disambiguation algorithm based on context features.

Methods: Entity disambiguation is usually divided into two stages, candidate generation and entity disambiguation. In the candidate generation stage, candidates for each entity mention are generated from a knowledge base and then filtered by each candidate entity's prior probability in the knowledge base, which preserves recall of the target entity while reducing computational complexity and noise in the disambiguation stage. In the disambiguation stage, a probabilistic model computes the similarity between the context of each candidate entity and the context of the entity mention, and the entity with the highest similarity is selected as the target entity. Disambiguation experiments were run on named mentions recognized in the literature, domain experts judged the correctness of the results, and accuracy was compared across algorithms.

Results: On the selected dataset, the proposed method achieved an entity disambiguation accuracy of 83%, higher than that of the other algorithms.

Conclusions: The context-feature-based algorithm performed best for entity disambiguation in this domain.
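The abstract does not give the form of the probabilistic model, so the two-stage pipeline can only be illustrated under assumptions. The sketch below assumes a toy knowledge base (`KB`), a prior-probability cutoff (`min_prior`), and an add-alpha smoothed unigram model as the context similarity. All names, data, and parameter values are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base: mention -> {entity: (prior, entity context)}.
KB = {
    "cold": {
        "common_cold": (0.60, "viral infection of the nose and throat with cough and fever"),
        "cold_temperature": (0.35, "low temperature exposure causing hypothermia and frostbite"),
        "COLD_lung_disease": (0.05, "chronic obstructive lung disease with airflow limitation"),
    }
}


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems do more)."""
    return text.lower().split()


def generate_candidates(mention, kb, min_prior=0.1):
    """Stage 1: look up the mention and drop candidates with a low prior."""
    entries = kb.get(mention.lower(), {})
    return {entity: ctx for entity, (prior, ctx) in entries.items() if prior >= min_prior}


def context_log_likelihood(mention_tokens, entity_tokens, vocab_size, alpha=1.0):
    """Log-probability of the mention context under an add-alpha smoothed
    unigram model estimated from the entity context."""
    counts = Counter(entity_tokens)
    denom = len(entity_tokens) + alpha * vocab_size
    return sum(math.log((counts[t] + alpha) / denom) for t in mention_tokens)


def disambiguate(mention, mention_context, kb, min_prior=0.1):
    """Stage 2: return the surviving candidate whose context best explains
    the mention context, or None if no candidate survives filtering."""
    candidates = generate_candidates(mention, kb, min_prior)
    if not candidates:
        return None
    mention_tokens = tokenize(mention_context)
    vocab = set(mention_tokens)
    for ctx in candidates.values():
        vocab.update(tokenize(ctx))
    scores = {
        entity: context_log_likelihood(mention_tokens, tokenize(ctx), len(vocab))
        for entity, ctx in candidates.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(disambiguate("cold", "patient presented with cough and fever after viral infection", KB))
```

In this sketch the prior is used only to prune candidates, and the final choice depends on context similarity alone, which mirrors the division of labor described in the abstract. Whether the paper also folds the prior into the final score is not stated.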
