Semi-supervised Gaussian process classification algorithm addressing the class imbalance

doi:10.3969/j.issn.1000-436x.2013.05.005

Abstract

Traditional supervised learning struggles with real-world datasets in which labeled information is scarce and the training set is class-imbalanced. To address this, a semi-supervised Gaussian process classification algorithm for class-imbalanced data is proposed. The algorithm adopts the self-training idea from semi-supervised learning: it uses Gaussian process classification to compute posterior probabilities and injects class labels into the unlabeled data, yielding more accurate and credible labeled data. This makes the class distribution of the training samples relatively balanced, so that the classifier is optimized adaptively and achieves better classification. Experimental results show that, when the training samples are class-imbalanced and labeled information is very scarce, the algorithm obtains effective labels through the self-trained classifier and improves classification accuracy compared with related methods, offering a new approach to classifying class-imbalanced data.

Key words: class imbalance, semi-supervised, Gaussian process classification, self-training
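The self-training idea summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the abstract does not give the selection rule, so the code assumes that each round adds the single most confident unlabeled point and prefers points predicted as the currently under-represented class. A simple RBF kernel-weighted vote stands in for the Gaussian process posterior, and the names `rbf`, `posterior_positive`, `self_train`, `threshold`, and `max_rounds` are hypothetical.

```python
import math


def rbf(a, b, width=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def posterior_positive(x, labeled, width=1.0):
    """Estimate P(y = +1 | x) with a kernel-weighted vote.

    ASSUMPTION: this is only a stand-in for the posterior that a
    Gaussian process classifier would compute.
    """
    pos = sum(rbf(x, xi, width) for xi, yi in labeled if yi == 1)
    neg = sum(rbf(x, xi, width) for xi, yi in labeled if yi == -1)
    total = pos + neg
    return 0.5 if total == 0.0 else pos / total


def self_train(labeled, unlabeled, threshold=0.8, max_rounds=10):
    """Self-training loop that favours the under-represented class.

    ASSUMPTION: one pseudo-label is added per round; the selection
    rule is illustrative and not taken from the paper.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        if not pool:
            break
        n_pos = sum(1 for _, y in labeled if y == 1)
        n_neg = len(labeled) - n_pos
        minority = 1 if n_pos <= n_neg else -1
        candidates = []
        for i, x in enumerate(pool):
            p = posterior_positive(x, labeled)
            label = 1 if p >= 0.5 else -1
            confidence = p if label == 1 else 1.0 - p
            if confidence >= threshold:
                candidates.append((label == minority, confidence, i, label))
        if not candidates:
            break  # nothing is confident enough to pseudo-label
        # Prefer minority-class predictions, then higher confidence.
        _, _, idx, label = max(candidates)
        labeled.append((pool.pop(idx), label))
    return labeled


# Toy data: three negative points and one positive point are labeled.
labeled = [((0.0, 0.0), -1), ((0.2, 0.1), -1), ((0.1, 0.3), -1), ((3.0, 3.0), 1)]
unlabeled = [(2.9, 3.1), (3.2, 2.8), (0.1, 0.1), (3.1, 3.3)]
result = self_train(labeled, unlabeled)
print(sum(1 for _, y in result if y == 1), "positive of", len(result))
```

In this toy run the pseudo-labels for the positive cluster are added first, so the initially imbalanced labeled set moves toward a balanced one before any further negative points are added. Replacing `posterior_positive` with a real Gaussian process classifier's predicted probability would bring the sketch closer to the setting the abstract describes.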

XIA Zhan-guo, XIA Shi-xiong, CAI Shi-yu, WAN Ling. Semi-supervised Gaussian process classification algorithm addressing the class imbalance[J]. Journal on Communications, 2013, 34(5): 42-51.

Label ratio	Overall accuracy/%			Positive-class accuracy/%
	GP	TSVM	SSGP	GP	TSVM	SSGP
1:1	93.79	96.21	95.08	92.11	96.06	94.74
1:2	93.53	94.35	94.75	90.43	92.71	93.54
1:4	91.37	93.98	94.69	86.12	89.56	91.63
1:8	90.03	93.57	94.82	82.30	88.74	89.95
1:16	86.55	93.24	93.66	75.12	87.32	88.52
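
The table above reports two measures per algorithm: overall accuracy and positive-class accuracy. The source does not define them, so the sketch below assumes that overall accuracy is the share of all samples classified correctly and that positive-class accuracy is the share of positive samples classified correctly. The function names and the toy labels are hypothetical and are not data from the paper.

```python
def overall_accuracy(y_true, y_pred):
    """Percentage of all samples whose predicted label is correct."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * correct / len(y_true)


def positive_class_accuracy(y_true, y_pred, positive=1):
    """Percentage of positive samples whose predicted label is correct.

    ASSUMPTION: this reading of "positive-class accuracy" is not
    confirmed by the source.
    """
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t == positive]
    if not pairs:
        raise ValueError("no positive samples in y_true")
    return 100.0 * sum(1 for t, p in pairs if t == p) / len(pairs)


# Toy example: 2 positive and 8 negative samples; one positive is missed.
y_true = [1, 1] + [-1] * 8
y_pred = [1, -1] + [-1] * 8
overall = overall_accuracy(y_true, y_pred)
positive = positive_class_accuracy(y_true, y_pred)
print(overall, positive)  # prints "90.0 50.0"
```

Under these definitions, a single missed positive sample barely lowers overall accuracy but halves positive-class accuracy in the toy example, which is why the two measures can diverge on class-imbalanced data.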

Labeled data rate	SSGP misclassification rate/%			GP misclassification rate/%			SVM misclassification rate/%
	Mean	Best	Worst	Mean	Best	Worst	Mean	Best	Worst
1/10	7.48	6.33	8.72	4.51	13.74	5.66	15.18	13.96	16.26
1/20	10.8	7.86	13.64	19.30	16.48	22.46	20.33	16.79	24.36
1/30	12.57	11.63	15.74	29.40	27.32	31.29	29.57	27.57	31.63