Analysis of patient similarity measurement based on semi-supervised learning

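Authors: Wang Ni (王妮), Huang Yanqun (黄艳群), Liu Honglei (刘红蕾), Fei Xiaolu (费晓璐), Wei Lan (魏岚), Zhao Xiangkun (赵相坤), Chen Hui (陈卉)
Affiliations: School of Biomedical Engineering, Capital Medical University (Beijing 100069); Beijing Key Laboratory of Fundamental Research on Biomechanics in Clinical Application (Beijing 100069); Information Center, Xuanwu Hospital, Capital Medical University (Beijing 100053)
Keywords: semi-supervised learning; cluster analysis; patient similarity; electronic medical records; Mahalanobis distance
Classification number: R318; TP31
Year·Volume·Issue (pages): 2020·39·2 (152-157)
Abstract: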

Objective To evaluate the feasibility and effectiveness of patient similarity measurement based on semi-supervised learning on electronic medical record data of mixed types, and to provide cohorts of similar patients ("patients like me") for subsequent personalized prediction. Methods Using real-world electronic medical record data, feature similarities (age, sex, disease, laboratory tests) were first calculated with feature-specific measures. The feature similarities of some patient pairs, together with the single similarity score that experts assigned to each pair, formed the labeled set, on which an optimal distance metric was learned in a supervised manner. Mahalanobis distances were then computed between labeled and unlabeled samples, where each sample consists of the paired similarities for age, sex, disease, and laboratory tests of two patients. For each unlabeled sample, the nearest labeled sample was identified and its similarity score was taken as the predicted patient similarity. Finally, the learned patient similarity was used as the measure of closeness between patients in clustering, and the results were compared with clustering based on the traditional Euclidean distance and cosine distance. Results Compared with clustering based on Euclidean distance and cosine distance, clustering based on the learned patient similarity produced clusters whose patients were more similar to one another, and the clustering quality was better. Conclusions Patient similarity measurement based on semi-supervised learning is effective for electronic medical record data.
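The nearest-neighbor step described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' implementation: the matrix `M` stands in for the learned distance metric (the abstract does not give the learning procedure, so a hand-picked diagonal matrix is assumed here), and the function names `mahalanobis` and `predict_similarity` as well as all numeric values are hypothetical.

```python
from math import sqrt

def mahalanobis(x, y, M):
    """Distance sqrt((x - y)^T M (x - y)) for a positive semi-definite matrix M."""
    d = [a - b for a, b in zip(x, y)]
    Md = [sum(m_ij * d_j for m_ij, d_j in zip(row, d)) for row in M]
    return sqrt(sum(d_i * md_i for d_i, md_i in zip(d, Md)))

def predict_similarity(sample, labeled, M):
    """Return the expert score of the labeled sample nearest to `sample`."""
    nearest = min(labeled, key=lambda item: mahalanobis(sample, item[0], M))
    return nearest[1]

# Each sample holds four pairwise feature similarities for one patient pair:
# (age, sex, disease, laboratory tests), paired with an expert score.
# All values are made up for illustration.
labeled = [
    ([0.9, 1.0, 0.8, 0.7], 0.85),
    ([0.2, 0.0, 0.1, 0.3], 0.10),
    ([0.6, 1.0, 0.4, 0.5], 0.50),
]

# Assumed stand-in for the learned metric: a diagonal M that weights
# disease and laboratory-test similarity more heavily than age and sex.
M = [
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 1.5],
]

unlabeled = [0.8, 1.0, 0.7, 0.6]
score = predict_similarity(unlabeled, labeled, M)
print(score)
```

With `M` set to the identity matrix the distance reduces to the Euclidean distance, which is one way to see how a learned metric differs from the Euclidean baseline that the study compares against.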

Address: Editorial Office of Beijing Biomedical Engineering, Anzhen Hospital, Andingmenwai, Beijing
Tel: 010-64456508  Fax: 010-64456661
E-mail: llbl910219@126.com