Application of the SMOTE Algorithm to Unbalanced Data

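Authors: Sun Tao, Wu Haifeng, Liang Zhigang, He Wen, Zhang Lei, Lü Pingxin, Guo Xiuhua
Affiliation: School of Public Health and Family Medicine, Capital Medical University (Beijing 100069)
Keywords: SMOTE; unbalanced data; clinical data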
Year·Volume·Issue (pages): 2012·31·5 (528-530)
Abstract:

Objective Clinical data are often unbalanced, that is, positive and negative cases are unequal in number, and analyzing such data without preprocessing can bias the results. Many methods exist for handling unbalanced data, but most suffer from drawbacks such as overfitting or loss of data. Methods This paper introduces the principle of the SMOTE algorithm and its implementation in R, and applies SMOTE to real clinical data as a worked example. Results The benign-to-malignant ratio was 1/3 in the original data and 1 after processing with SMOTE. Conclusions The SMOTE algorithm can effectively correct the imbalance in unbalanced data.
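To illustrate the principle summarized above, the following is a minimal sketch of SMOTE-style over-sampling. It is not the authors' implementation: the paper used R, whereas this sketch is written in plain Python, and the function name `smote`, its parameters, and the toy data are all assumptions made for illustration. Each synthetic sample is placed at a random position on the line segment between a minority-class sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_per_sample, k=5, rng=None):
    """Return synthetic points interpolated between minority samples
    and their nearest minority-class neighbours (illustrative sketch)."""
    rng = rng or random.Random(0)
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for i, x in enumerate(minority):
        # Rank the other minority samples by Euclidean distance to x.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(x, p))
        neighbours = others[:k]
        for _ in range(n_per_sample):
            nb = rng.choice(neighbours)
            gap = rng.random()  # uniform in [0, 1)
            # New point lies on the segment between x and the chosen neighbour.
            synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Hypothetical toy data, NOT the clinical data from the paper:
# 10 "benign" (minority) and 30 "malignant" (majority) points in 2-D.
rng = random.Random(42)
benign = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(10)]
malignant = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(30)]

# Two synthetic samples per benign case triples the minority class.
balanced_benign = benign + smote(benign, n_per_sample=2, k=5, rng=rng)

print(len(benign) / len(malignant))           # ratio before over-sampling
print(len(balanced_benign) / len(malignant))  # ratio after over-sampling
```

With a starting ratio of 1/3, generating two synthetic samples per minority case brings the ratio to 1, mirroring the before and after ratios reported in the abstract. Because new points are interpolated rather than copied, this avoids the exact duplication of simple random over-sampling, and no majority-class cases are discarded.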

 

Address: Editorial Office of Beijing Biomedical Engineering, Beijing Anzhen Hospital, Andingmenwai, Beijing