Journal on Communications ›› 2020, Vol. 41 ›› Issue (12): 47-59. doi: 10.11959/j.issn.1000-436X.2020244
Yonghao LI1,2, Liang HU1,2, Ping ZHANG1,2, Wanfu GAO1,2,3
Revised: 2020-10-18
Online: 2020-12-25
Published: 2020-12-01
About the author:
Yonghao LI (1992- ), male, born in Anyang, Henan, is a Ph.D. candidate at Jilin University. His main research interests include multi-label learning and feature selection.
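Abstract:
Graph-based multi-label feature selection methods ignore the dynamic change of the graph Laplacian matrix, and they use logical labels to guide feature selection, which loses label information. To address these problems, a multi-label feature selection method based on a dynamic graph Laplacian matrix and real-valued labels was proposed. The method constructs the dynamic graph Laplacian matrix from a robust low-dimensional space of the feature matrix and uses that same space as the real-valued label space; manifold and non-negativity constraints are then applied to transform the logical labels into real-valued labels. The proposed method was compared with three multi-label feature selection methods on nine multi-label benchmark data sets. The experimental results show that the proposed method obtains high-quality feature subsets and achieves good classification performance.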
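The abstract gives no formulas, so the following is only a minimal sketch of how a graph Laplacian could be built from a low-dimensional representation. It assumes a symmetric k-nearest-neighbor graph with heat-kernel weights and the unnormalized Laplacian L = D - S; the function name `graph_laplacian`, the parameters `n_neighbors` and `sigma`, and the matrix name `V` are illustrative assumptions, not the authors' construction.

```python
import numpy as np

def graph_laplacian(V, n_neighbors=5, sigma=1.0):
    """Unnormalized Laplacian L = D - S of a symmetric kNN heat-kernel graph.

    V is an (n_samples, k) low-dimensional representation (hypothetical name).
    """
    n = V.shape[0]
    # Pairwise squared Euclidean distances between the rows of V.
    sq = np.sum(V ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * V @ V.T, 0.0)
    # Heat-kernel affinities, with self-affinity set to zero.
    S = np.exp(-dist2 / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)
    # Keep each row's n_neighbors largest affinities, then symmetrize the mask.
    k = min(n_neighbors, n - 1)
    idx = np.argsort(-S, axis=1)[:, :k]
    keep = np.zeros((n, n), dtype=bool)
    keep[np.arange(n)[:, None], idx] = True
    S = np.where(keep | keep.T, S, 0.0)
    # Degree matrix minus affinity matrix.
    D = np.diag(S.sum(axis=1))
    return D - S
```

In an iterative solver, recomputing `graph_laplacian(V)` after each update of `V`, rather than fixing it once from the raw features, is one way the Laplacian could be made dynamic in the sense the abstract describes.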
Yonghao LI, Liang HU, Ping ZHANG, Wanfu GAO. Multi-label feature selection based on dynamic graph Laplacian[J]. Journal on Communications, 2020, 41(12): 47-59.
Table 1
Parameters of the data sets
Data set | Instances | Features | Labels | Training instances | Test instances | Domain |
Arts | 5,000 | 462 | 26 | 2,000 | 3,000 | Text (Web) |
Birds | 645 | 260 | 19 | 322 | 323 | Audio |
Yeast | 2,417 | 103 | 14 | 1,500 | 917 | Biology |
Education | 5,000 | 550 | 33 | 2,000 | 3,000 | Text (Web) |
Enron | 1,702 | 1,001 | 53 | 1,123 | 579 | Text |
Social | 5,000 | 1,047 | 39 | 2,000 | 3,000 | Text (Web) |
Science | 5,000 | 743 | 40 | 2,000 | 3,000 | Text (Web) |
Entertain | 5,000 | 636 | 27 | 2,000 | 3,000 | Text (Web) |
Society | 5,000 | 640 | 21 | 2,000 | 3,000 | Text (Web) |
Table 2
Micro-F1 results of the feature selection methods with the SVM classifier
Data set | Proposed method | MIFS | RALM-FS | SCLS |
Arts | 0.139±0.078 | 0.106±0.046 | 0.102±0.061 | |
Birds | 0.116±0.059 | 0.060±0.040 | 0.096±0.046 | |
Yeast | 0.547±0.035 | 0.532±0.008 | 0.552±0.027 | |
Education | 0.073±0.059 | 0.073±0.059 | 0.193±0.056 | |
Enron | 0.372±0.027 | 0.488±0.031 | 0.389±0.059 | |
Social | 0.276±0.136 | 0.363±0.120 | 0.149±0.112 | |
Science | 0.129±0.057 | 0.037±0.035 | 0.097±0.054 | |
Entertain | 0.228±0.112 | 0.113±0.072 | 0.214±0.100 | |
Society | 0.300±0.042 | 0.216±0.028 | 0.223±0.059 | |
Average | 0.242 | 0.221 | 0.224 |
Table 3
Macro-F1 results of the feature selection methods with the SVM classifier
Data set | Proposed method | MIFS | RALM-FS | SCLS |
Arts | 0.055±0.034 | 0.038±0.016 | 0.039±0.026 | |
Birds | 0.075±0.036 | 0.039±0.026 | 0.058±0.024 | |
Yeast | 0.219±0.048 | 0.207±0.014 | 0.229±0.036 | |
Education | 0.019±0.017 | 0.019±0.015 | 0.052±0.014 | |
Enron | 0.074±0.017 | 0.119±0.031 | 0.074±0.027 | |
Social | 0.031±0.016 | 0.036±0.014 | 0.014±0.012 | |
Science | 0.034±0.016 | 0.007±0.007 | 0.036±0.020 | |
Entertain | 0.097±0.047 | 0.044±0.030 | 0.081±0.038 | |
Society | 0.055±0.020 | 0.021±0.003 | 0.032±0.012 | |
Average | 0.073 | 0.059 | 0.068 |
Table 4
Micro-F1 results of the feature selection methods with the 3NN classifier
Data set | Proposed method | MIFS | RALM-FS | SCLS |
Arts | 0.202±0.052 | 0.164±0.024 | 0.182±0.043 | |
Birds | 0.135±0.061 | 0.167±0.061 | 0.144±0.043 | |
Yeast | 0.525±0.053 | 0.518±0.035 | 0.529±0.019 | |
Education | 0.183±0.055 | 0.176±0.037 | 0.260±0.050 | |
Enron | 0.410±0.024 | 0.437±0.026 | 0.365±0.073 | |
Social | 0.338±0.081 | 0.376±0.057 | 0.315±0.054 | |
Science | 0.171±0.037 | 0.115±0.016 | 0.160±0.048 | |
Entertain | 0.276±0.065 | 0.234±0.033 | 0.273±0.065 | |
Society | 0.305±0.043 | 0.245±0.021 | 0.255±0.050 | |
Average | 0.286 | 0.268 | 0.279 |
Table 5
Macro-F1 results of the feature selection methods with the 3NN classifier
Data set | Proposed method | MIFS | RALM-FS | SCLS |
Arts | 0.095±0.033 | 0.074±0.014 | 0.071±0.025 | |
Birds | 0.085±0.038 | 0.078±0.028 | 0.093±0.036 | |
Yeast | 0.282±0.057 | 0.301±0.026 | 0.300±0.027 | |
Education | 0.043±0.018 | 0.059±0.021 | 0.084±0.023 | |
Enron | 0.087±0.014 | 0.111±0.013 | 0.081±0.026 | |
Social | 0.051±0.017 | 0.049±0.013 | 0.038±0.012 | |
Science | 0.062±0.015 | 0.037±0.010 | 0.057±0.021 | |
Entertain | 0.138±0.042 | 0.112±0.020 | 0.131±0.040 | |
Society | 0.088±0.020 | 0.055±0.011 | 0.053±0.016 | |
Average | 0.106 | 0.097 | 0.101 |
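Tables 2 to 5 report Micro-F1 and Macro-F1. As a hedged illustration of how the two differ, the sketch below computes both from binary label matrices using the common definitions: Micro-F1 pools true positives, false positives, and false negatives over all labels, while Macro-F1 averages the per-label F1 scores. The function name `micro_macro_f1` and the convention of scoring a label with no true or predicted positives as 0 are assumptions; the paper's exact evaluation protocol is not given on this page.

```python
import numpy as np

def micro_macro_f1(Y_true, Y_pred):
    """Return (micro_f1, macro_f1) for (n_samples, n_labels) binary matrices."""
    Y_true = np.asarray(Y_true, dtype=bool)
    Y_pred = np.asarray(Y_pred, dtype=bool)
    # Per-label counts of true positives, false positives, false negatives.
    tp = (Y_true & Y_pred).sum(axis=0).astype(float)
    fp = (~Y_true & Y_pred).sum(axis=0).astype(float)
    fn = (Y_true & ~Y_pred).sum(axis=0).astype(float)
    # Micro-F1: pool the counts over all labels before computing F1.
    denom_micro = 2.0 * tp.sum() + fp.sum() + fn.sum()
    micro = 2.0 * tp.sum() / denom_micro if denom_micro > 0 else 0.0
    # Macro-F1: compute F1 per label, then take the unweighted mean.
    denom = 2.0 * tp + fp + fn
    per_label = np.divide(2.0 * tp, denom, out=np.zeros_like(tp), where=denom > 0)
    macro = float(per_label.mean())
    return float(micro), macro
```

Because Macro-F1 weights every label equally, a few rare labels that are never predicted correctly pull it down sharply, which is one plausible reason the Macro-F1 values in Tables 3 and 5 are lower than the corresponding Micro-F1 values in Tables 2 and 4.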