A Carotid Atherosclerosis Prediction Method for Young and Middle-Aged Adults Based on the CatBoost Algorithm

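Authors: Ding Yao, Zhang Xiaoyu, Xu Yang, Gao Lisheng, Sun Yining, Wang Shijun, Ma Zuchang
Affiliations: Hefei Institute of Intelligent Machines, Chinese Academy of Sciences (Hefei 230031); University of Science and Technology of China (Hefei 230026); Dalian Medical University (Dalian, Liaoning 116044)
Keywords: carotid atherosclerosis; feature selection; CatBoost; logistic regression; artificial neural network
CLC number: R318.04
Year · Volume · Issue (pages): 2020 · 39 · 5 (470-476)
Abstract: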

Objective To explore the value of the CatBoost algorithm for predicting carotid atherosclerosis in young and middle-aged adults, and to provide a feasible technical means for the early screening of carotid atherosclerosis in this population.

Methods A total of 2258 young and middle-aged adults who underwent health checkups at the physical examination center of a hospital in Beijing between 2016 and 2018 were enrolled. Carotid atherosclerosis was diagnosed from the results of carotid color Doppler ultrasound. The samples were balanced by under-sampling. Features were selected by analyzing variable importance, and a CatBoost model was built. For comparison, models were also built with two other machine learning algorithms, logistic regression and an artificial neural network. Sensitivity, specificity, accuracy, and the area under the receiver operating characteristic (ROC) curve (AUC) were used as evaluation metrics.

Results On the test set, the CatBoost model achieved the highest sensitivity, specificity, accuracy, and AUC of all models: 82.8%, 96.7%, 90.3%, and 0.92, respectively. The sensitivity, specificity, and accuracy of the logistic regression and neural network models all ranged from 62.4% to 73.3%, and their AUCs ranged from 0.72 to 0.78. Importance analysis showed that the three most important predictors of carotid atherosclerosis in young and middle-aged adults were, in descending order, age, waist-to-height ratio, and high-density lipoprotein cholesterol.

Conclusions Applying the CatBoost algorithm to predict carotid atherosclerosis in young and middle-aged adults is feasible, and it offers higher diagnostic value than the conventional algorithms it was compared with.
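The abstract states only that the classes were balanced by under-sampling; it does not name the scheme. A minimal sketch, assuming plain random under-sampling of the majority class down to the size of the minority class (the function name `random_undersample` and the fixed seed are illustrative assumptions, not the study's procedure):

```python
import random
from collections import defaultdict


def random_undersample(rows, labels, seed=0):
    """Randomly drop samples from larger classes until every class has as
    many samples as the smallest class.

    Assumption: plain random under-sampling; the exact scheme used in the
    study is not specified in the abstract.
    """
    # Group the samples by class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    # Target size per class = size of the smallest class.
    n_min = min(len(group) for group in by_class.values())

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample without replacement; for the minority class this keeps all rows.
        for row in rng.sample(group, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels
```

Because only the larger class is thinned, every case in the minority class is retained; the price is that some majority-class records are discarded.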

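The four metrics used in the abstract can be computed from true labels and predicted probabilities. The following standard-library sketch assumes label 1 means carotid atherosclerosis is present and uses an assumed 0.5 decision threshold (the study's threshold is not stated); the function name `evaluate` is hypothetical. AUC is computed with the rank-based Mann-Whitney formulation, which equals the area under the empirical ROC curve.

```python
def evaluate(y_true, y_score, threshold=0.5):
    """Return (sensitivity, specificity, accuracy, AUC) for binary labels.

    y_true: 1 = carotid atherosclerosis, 0 = none.
    y_score: predicted probability of class 1.
    threshold: assumed decision cut-off, not taken from the study.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / len(y_true)

    # AUC = probability that a randomly chosen positive scores higher than a
    # randomly chosen negative, counting ties as 1/2 (Mann-Whitney U statistic
    # divided by n_pos * n_neg).
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    auc = wins / (len(pos) * len(neg))

    return sensitivity, specificity, accuracy, auc
```

The pairwise loop costs O(n_pos × n_neg) comparisons, which is acceptable for a test set of a few hundred records; a sort-based rank computation gives the same AUC more efficiently on large data.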
Address: Editorial Office of Beijing Biomedical Engineering, Beijing Anzhen Hospital, Andingmenwai, Beijing
Tel: 010-64456508  Fax: 010-64456661
Email: llbl910219@126.com