Chinese Journal on Internet of Things, 2018, Vol. 2, Issue (2): 65-72. doi: 10.11959/j.issn.2096-3750.2018.00055
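Xue ZHANG, Zhiguo SHI, Xuan LIU
Revised: 2018-05-15
Online: 2018-06-01
Published: 2018-07-03
About the authors:
Xue ZHANG (1995-), female, is a master's student at University of Science and Technology Beijing. Her main research interests are medical data analysis and algorithm design and analysis.
Zhiguo SHI (1978-), male, Ph.D., is a professor at University of Science and Technology Beijing. His main research interests are intelligent systems and Internet of Things technology.
Xuan LIU (1993-), female, is a master's student at University of Science and Technology Beijing. Her main research interests are medical data analysis and algorithm design and analysis.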
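Abstract:
In conventional imbalanced-data classification, the imbalance between classes often degrades classifier performance. Using AUC (the area under the ROC curve) as the evaluation metric, a multilayer neural network model is built by combining single-class F-score feature selection with a genetic algorithm. This selects the feature subset most favorable for imbalanced-data classification and thereby yields a deep model better suited to that task. The multilayer neural network is implemented with TensorFlow, tested on four different UCI datasets, and compared against conventional machine learning algorithms such as naive Bayes, k-nearest neighbors, and neural networks. The experiments show that the proposed model performs better on imbalanced-data classification.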
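The abstract names AUC as the evaluation metric and F-score-based feature selection as one component of the model. As a minimal sketch, assuming binary labels in which 1 marks the minority class, the Python below computes AUC from pairwise ranking and a per-feature F-score. The function names `auc_score` and `f_score_per_feature` are illustrative and do not come from the paper. The F-score shown is the commonly used two-class form, which may differ from the single-class variant the paper proposes. The genetic-algorithm search and the TensorFlow network are not reproduced here.

```python
import numpy as np


def auc_score(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    correct = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(correct / (pos.size * neg.size))


def f_score_per_feature(X, y):
    """Two-class F-score per feature (assumed form, not the paper's variant).

    Numerator: squared distance of each class mean from the overall mean.
    Denominator: sum of the within-class sample variances.
    Assumes each class has at least two samples and nonzero variance.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos, neg = X[y == 1], X[y == 0]
    mean_all = X.mean(axis=0)
    between = (pos.mean(axis=0) - mean_all) ** 2 + (neg.mean(axis=0) - mean_all) ** 2
    within = pos.var(axis=0, ddof=1) + neg.var(axis=0, ddof=1)
    return between / within


# Toy imbalanced data: six majority samples (0) and two minority samples (1).
y = np.array([0, 0, 0, 0, 0, 0, 1, 1])
scores = np.array([0.1, 0.2, 0.3, 0.4, 0.35, 0.8, 0.7, 0.9])
X = np.array([
    [0.0, 1.0], [0.1, 0.0], [0.2, 1.0], [0.1, 0.0],
    [0.0, 1.0], [0.2, 0.0], [1.0, 0.0], [1.2, 1.0],
])

print("AUC:", auc_score(y, scores))
print("F-scores:", f_score_per_feature(X, y))
```

In this toy data the first feature separates the two classes and the second does not, so the first receives the larger F-score. AUC depends only on how positives rank against negatives, so it does not change with the class ratio the way accuracy does, which is one reason it is a common metric for imbalanced problems.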
Xue ZHANG,Zhiguo SHI,Xuan LIU. Multilayer neural network model for unbalanced data[J]. Chinese Journal on Internet of Things, 2018, 2(2): 65-72.