A multi-label classification method for disposing incomplete labeled data and label relevance

doi:10.11959/j.issn.1000-0801.2016197

Abstract:

Multi-label classification has been applied in many real-world fields, where the labels are often strongly correlated and some of them may be incomplete or missing. However, existing multi-label classification algorithms have difficulty handling both issues at the same time. To address this, a new probabilistic model is proposed that performs multi-label classification in the presence of both label relevance and missing labels. The model automatically learns and exploits the relevance among labels. In addition, by integrating out the missing label information, it provides a disciplined, adaptive strategy for handling missing labels. Experiments on real-world data sets with both complete and incomplete labels show that the proposed method achieves better classification and prediction evaluation scores than existing multi-label classification algorithms.

Key words: incomplete label, label relevance, multi-label classification, probabilistic model
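
The phrase "integrating out the missing information" in the abstract can be read, in generic probabilistic terms, as marginalization over the unobserved labels. The following is only a sketch under stated assumptions: the symbols $\mathbf{x}$ (features), $\mathbf{y}_o$ (observed labels of a sample) and $\mathbf{y}_m$ (its missing labels) are illustrative and are not taken from the paper, whose actual model is not given on this page. Assuming binary labels, the likelihood of the observed part would be obtained by summing the joint label distribution over every possible completion of the missing part:

```latex
p(\mathbf{y}_o \mid \mathbf{x})
  = \sum_{\mathbf{y}_m \in \{0,1\}^{|m|}}
    p(\mathbf{y}_o, \mathbf{y}_m \mid \mathbf{x})
```

If a model works with continuous latent label scores instead, the sum would become an integral. In either case, training on $p(\mathbf{y}_o \mid \mathbf{x})$ uses only the labels that were actually observed, while the joint distribution still carries the dependence between labels.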

Lina ZHANG, Lingpeng DAI, Tai KUANG. A multi-label classification method for disposing incomplete labeled data and label relevance[J]. Telecommunications Science, 2016, 32(8): 82-89.


Table 3  10-fold cross-validation results

Metric	Dataset	ILDLR	MLRGL	FastTag	BR
macro_F1 (k=1)	MIRFLICKR	0.4675±0.0014	0.4581±0.0089	0.4611±0.0102	0.4556±0.0021
macro_F1 (k=1)	COREL5K	0.2407±0.0541	0.1108±0.0151	0.1997±0.0101	0.1422±0.0279
macro_F1 (k=1)	ESPGAME	0.2331±0.0191	0.1471±0.0111	0.1791±0.0078	0.1939±0.0141
macro_F1 (k=1)	IAPRTCL12	0.2371±0.0453	0.0589±0.0389	0.1924±0.0049	0.2096±0.0334
micro_F1 (k=1)	MIRFLICKR	0.4678±0.0021	0.4578±0.0067	0.4623±0.0103	0.4534±0.0011
micro_F1 (k=1)	COREL5K	0.2400±0.0539	0.1111±0.0134	0.1987±0.0111	0.1419±0.0278
micro_F1 (k=1)	ESPGAME	0.2331±0.0192	0.1448±0.0101	0.1879±0.0078	0.1932±0.0143
micro_F1 (k=1)	IAPRTCL12	0.2361±0.0453	0.0567±0.0401	0.1931±0.0042	0.2096±0.0344
macro_F1 (k=2)	MIRFLICKR	0.4532±0.0065	0.4567±0.0078	0.4344±0.0510	0.4434±0.0026
macro_F1 (k=2)	COREL5K	0.2405±0.0509	0.1105±0.0151	0.1812±0.0119	0.0167±0.0111
macro_F1 (k=2)	ESPGAME	0.2311±0.0202	0.1455±0.0114	0.2216±0.0049	0.1834±0.0126
macro_F1 (k=2)	IAPRTCL12	0.2344±0.0437	0.0588±0.0387	0.1811±0.0065	0.1971±0.0331
micro_F1 (k=2)	MIRFLICKR	0.4354±0.0056	0.4398±0.0065	0.4154±0.0445	0.4221±0.0046
micro_F1 (k=2)	COREL5K	0.2397±0.0535	0.1111±0.0143	0.1811±0.0057	0.0156±0.0091
micro_F1 (k=2)	ESPGAME	0.2231±0.0245	0.1445±0.0121	0.2167±0.0089	0.1744±0.0139
micro_F1 (k=2)	IAPRTCL12	0.2311±0.0354	0.0587±0.0377	0.1857±0.0084	0.1931±0.0321
macro_F1 (k=3)	MIRFLICKR	0.4601±0.0023	0.4567±0.0087	0.4543±0.0178	0.4121±0.0011
macro_F1 (k=3)	COREL5K	0.2356±0.0531	0.1121±0.0143	0.1801±0.0045	0.0256±0.0154
macro_F1 (k=3)	ESPGAME	0.2234±0.0232	0.1457±0.0111	0.2145±0.0067	0.1601±0.0101
macro_F1 (k=3)	IAPRTCL12	0.2311±0.0334	0.0588±0.0378	0.1832±0.0043	0.1645±0.0234
micro_F1 (k=3)	MIRFLICKR	0.4444±0.0031	0.4356±0.0043	0.4151±0.0161	0.3867±0.0024
micro_F1 (k=3)	COREL5K	0.2367±0.0521	0.1076±0.0154	0.1732±0.0056	0.0245±0.0161
micro_F1 (k=3)	ESPGAME	0.2123±0.0255	0.1421±0.0112	0.2034±0.0056	0.1567±0.0123
micro_F1 (k=3)	IAPRTCL12	0.2245±0.0378	0.0567±0.0381	0.1745±0.0048	0.1643±0.0221


Table 1

Dataset	Number of labels	Number of samples	Average number of positive labels per sample	Maximum number of negative labels per sample
MIRFLICKR	38	25,000	4.7	17
COREL5K	260	4,999	3.4	5
ESPGAME	268	23,641	4.7	15
IAPRTCL12	291	19,627	5.7	23

Table 2

Metric	Dataset	ILDLR	BR	CPLST	MLRGL	FastTag
macro_F1	MIRFLICKR	0.4996±0.0051	0.4941±0.0042	0.4932±0.0042	0.4956±0.0073	0.4679±0.0053
macro_F1	COREL5K	0.2081±0.0563	0.2010±0.0466	0.2004±0.0493	0.1315±0.0444	0.2036±0.0380
macro_F1	ESPGAME	0.2378±0.0131	0.2314±0.0162	0.2327±0.0167	0.1251±0.0072	0.2284±0.013
macro_F1	IAPRTCL12	0.2467±0.0405	0.2359±0.0422	0.2375±0.0424	0.1267±0.0411	0.2285±0.0411
micro_F1	MIRFLICKR	0.4671±0.0034	0.4611±0.0055	0.4611±0.0047	0.4661±0.0092	0.4362±0.0061
micro_F1	COREL5K	0.2083±0.0555	0.1967±0.0481	0.1982±0.0481	0.1305±0.0436	0.2022±0.0362
micro_F1	ESPGAME	0.2276±0.0140	0.221±0.0173	0.2217±0.0180	0.1189±0.0010	0.2189±0.0174
micro_F1	IAPRTCL12	0.2408±0.0451	0.2315±0.0431	0.2310±0.0427	0.1239±0.0412	0.2211±0.0436
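
The tables report macro_F1 and micro_F1 as "mean±deviation" values. As a minimal sketch of how such entries are commonly produced (not the authors' evaluation code), the Python below computes both metrics from 0/1 label matrices and summarizes a list of per-fold scores. The function names, the toy data and the use of the sample standard deviation are all assumptions made for illustration; the page does not state which deviation the paper used.

```python
from statistics import mean, stdev


def f1(tp, fp, fn):
    """F1 score from raw counts; taken as 0.0 when all counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def label_counts(y_true, y_pred):
    """Per-label [tp, fp, fn] for 0/1 label matrices given as lists of rows."""
    counts = [[0, 0, 0] for _ in range(len(y_true[0]))]
    for t_row, p_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(t_row, p_row)):
            if t and p:
                counts[j][0] += 1   # true positive
            elif p:
                counts[j][1] += 1   # false positive
            elif t:
                counts[j][2] += 1   # false negative
    return counts


def macro_f1(y_true, y_pred):
    """Average of the per-label F1 scores (every label weighs the same)."""
    return mean(f1(*c) for c in label_counts(y_true, y_pred))


def micro_f1(y_true, y_pred):
    """F1 over counts pooled across all labels (frequent labels weigh more)."""
    counts = label_counts(y_true, y_pred)
    tp, fp, fn = (sum(c[i] for c in counts) for i in range(3))
    return f1(tp, fp, fn)


def summarize(fold_scores):
    """Format per-fold scores as mean±sample standard deviation (assumed)."""
    return f"{mean(fold_scores):.4f}±{stdev(fold_scores):.4f}"


if __name__ == "__main__":
    # Toy data: 4 samples, 3 labels (hypothetical, not from the paper).
    y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
    y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]]
    print(macro_f1(y_true, y_pred), micro_f1(y_true, y_pred))
    print(summarize([0.70, 0.72, 0.74]))
```

Macro averaging gives rare labels the same weight as frequent ones, whereas micro averaging is dominated by frequent labels, which is why the two rows of a table can differ for the same method and dataset.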