Multi-label feature selection based on dynamic graph Laplacian

Journal on Communications, 2020, Vol. 41, Issue (12): 47-59. doi: 10.11959/j.issn.1000-436X.2020244

Yonghao LI¹,², Liang HU¹,², Ping ZHANG¹,², Wanfu GAO¹,²,³

¹ College of Computer Science and Technology, Jilin University, Changchun 130012, China
² Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun 130012, China
³ College of Chemistry, Jilin University, Changchun 130012, China

Revised: 2020-10-18 Online: 2020-12-25 Published: 2020-12-01
About the authors:
Yonghao LI (1992- ), male, a native of Anyang, Henan, is a Ph.D. candidate at Jilin University. His research interests include multi-label learning and feature selection.
Liang HU (1968- ), male, a native of Changchun, Jilin, Ph.D., is a professor and doctoral supervisor at Jilin University. His research interests include artificial intelligence and distributed computing.
Ping ZHANG (1991- ), female, a native of Shijiazhuang, Hebei, is a Ph.D. candidate at Jilin University. Her research interests include multi-label learning and feature selection.
Wanfu GAO (1990- ), male, a native of Liaoyuan, Jilin, Ph.D., is a lecturer at Jilin University. His research interests include machine learning, feature selection, and multi-label learning.
Supported by:
Postdoctoral Innovative Talents Support Program (BX20190137); China Postdoctoral Science Foundation Funded Project (2020M670839); National Key Research and Development Program of China (2017YFA0604500); Key Scientific and Technological Research and Development Plan of Jilin Province (20180201103GX)

Abstract:

Existing graph-based multi-label feature selection methods ignore the dynamic change of the graph Laplacian matrix, and they lose label information because they use logical labels to guide the feature selection process. To address these problems, a multi-label feature selection method based on a dynamic graph Laplacian matrix and real-valued labels was proposed. The robust low-dimensional space of the feature matrix was used to construct a dynamic graph Laplacian matrix, and the same robust low-dimensional space served as the real-valued label space. Furthermore, manifold and non-negative constraints were adopted to transform logical labels into real-valued labels. The proposed method was compared with three multi-label feature selection methods on nine multi-label benchmark data sets. The experimental results demonstrate that the proposed method obtains high-quality feature subsets and achieves good classification performance.

Key words: multi-label feature selection, dynamic graph Laplacian matrix, real-valued label, classification
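To make the idea of a graph Laplacian that changes with the low-dimensional space concrete, here is a minimal sketch. It is not the authors' formulation: the kNN heat-kernel graph, the parameters `k` and `sigma`, the helper names `graph_laplacian` and `manifold_penalty`, and the random matrix `V` are all assumptions chosen for illustration. The sketch only shows that rebuilding the Laplacian from an updated representation yields a different matrix, and that the usual manifold term tr(VᵀLV) penalizes neighbouring rows that drift apart.

```python
import numpy as np

def graph_laplacian(V, k=5, sigma=1.0):
    """Unnormalized Laplacian L = D - S of a symmetric kNN heat-kernel graph.

    V holds one sample per row; k and sigma are illustrative choices.
    """
    n = V.shape[0]
    # Pairwise squared Euclidean distances between rows of V.
    sq = np.sum(V**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * V @ V.T, 0.0)
    np.fill_diagonal(d2, np.inf)  # keep a sample from being its own neighbour
    idx = np.argsort(d2, axis=1)[:, :k]  # k nearest neighbours of each row
    rows = np.repeat(np.arange(n), k)
    cols = idx.ravel()
    S = np.zeros((n, n))
    S[rows, cols] = np.exp(-d2[rows, cols] / (2.0 * sigma**2))
    S = np.maximum(S, S.T)  # symmetrize the affinity matrix
    D = np.diag(S.sum(axis=1))
    return D - S

def manifold_penalty(V, L):
    """tr(V^T L V), which equals 0.5 * sum_ij S_ij * ||v_i - v_j||^2."""
    return np.trace(V.T @ L @ V)

rng = np.random.default_rng(0)
V = rng.random((8, 3))                # hypothetical low-dimensional representation
L_old = graph_laplacian(V, k=3)
V_new = V + 0.1 * rng.random((8, 3))  # stands in for one update of V
L_new = graph_laplacian(V_new, k=3)   # the Laplacian is rebuilt, not held fixed
```

In an iterative solver, the rebuild step would presumably sit inside the optimization loop so that the graph tracks the current representation; how often and how exactly the paper updates it is not stated on this page.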

CLC number: TP18

Yonghao LI, Liang HU, Ping ZHANG, Wanfu GAO. Multi-label feature selection based on dynamic graph Laplacian[J]. Journal on Communications, 2020, 41(12): 47-59.

Figures/Tables 11

Table 1

Table 2

Micro-F1 results of the feature selection methods with the SVM classifier

Data set	Proposed method	MIFS	RALM-FS	SCLS
Arts	0.158±0.076	0.139±0.078	0.106±0.046	0.102±0.061
Birds	0.123±0.058	0.116±0.059	0.060±0.040	0.096±0.046
Yeast	0.564±0.036	0.547±0.035	0.532±0.008	0.552±0.027
Education	0.261±0.061	0.073±0.059	0.073±0.059	0.193±0.056
Enron	0.524±0.051	0.372±0.027	0.488±0.031	0.389±0.059
Social	0.468±0.118	0.276±0.136	0.363±0.120	0.149±0.112
Science	0.141±0.053	0.129±0.057	0.037±0.035	0.097±0.054
Entertain	0.283±0.101	0.228±0.112	0.113±0.072	0.214±0.100
Society	0.317±0.042	0.300±0.042	0.216±0.028	0.223±0.059
Average	0.315	0.242	0.221	0.224
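For readers unfamiliar with the metric in the table above, the following is a minimal sketch of how Micro-F1 is commonly computed for multi-label predictions: true positives, false positives, and false negatives are pooled over all labels before a single F1 score is taken. The function name `micro_f1` and the toy label matrices are illustrative assumptions, not data from the paper.

```python
import numpy as np

def micro_f1(Y_true, Y_pred):
    """Micro-F1: pool TP/FP/FN over every label, then compute F1 once."""
    Y_true = np.asarray(Y_true, dtype=bool)
    Y_pred = np.asarray(Y_pred, dtype=bool)
    tp = np.sum(Y_true & Y_pred)
    fp = np.sum(~Y_true & Y_pred)
    fn = np.sum(Y_true & ~Y_pred)
    denom = 2 * tp + fp + fn
    # Assumed convention: return 0.0 when there is nothing to score.
    return 2 * tp / denom if denom > 0 else 0.0

# Toy example: 4 samples (rows), 3 labels (columns).
Y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
Y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 0]])
score = micro_f1(Y_true, Y_pred)  # TP=3, FP=0, FN=3, so F1 = 6/9
```

Because the counts are pooled, frequent labels dominate this score.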

Table 3

Macro-F1 results of the feature selection methods with the SVM classifier

Data set	Proposed method	MIFS	RALM-FS	SCLS
Arts	0.067±0.038	0.055±0.034	0.038±0.016	0.039±0.026
Birds	0.077±0.037	0.075±0.036	0.039±0.026	0.058±0.024
Yeast	0.243±0.048	0.219±0.048	0.207±0.014	0.229±0.036
Education	0.068±0.015	0.019±0.017	0.019±0.015	0.052±0.014
Enron	0.135±0.036	0.074±0.017	0.119±0.031	0.074±0.027
Social	0.054±0.019	0.031±0.016	0.036±0.014	0.014±0.012
Science	0.056±0.025	0.034±0.016	0.007±0.007	0.036±0.020
Entertain	0.119±0.039	0.097±0.047	0.044±0.030	0.081±0.038
Society	0.060±0.016	0.055±0.020	0.021±0.003	0.032±0.012
Average	0.097	0.073	0.059	0.068
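Macro-F1, the metric reported above, is usually computed by scoring each label separately and averaging the per-label F1 values, so every label counts equally regardless of how often it occurs. The sketch below illustrates that definition; the function name `macro_f1`, the zero-score convention for labels with no positives and no predictions, and the toy matrices are assumptions made for illustration.

```python
import numpy as np

def macro_f1(Y_true, Y_pred):
    """Macro-F1: compute F1 per label (column), then take the plain mean."""
    Y_true = np.asarray(Y_true, dtype=bool)
    Y_pred = np.asarray(Y_pred, dtype=bool)
    tp = np.sum(Y_true & Y_pred, axis=0).astype(float)
    fp = np.sum(~Y_true & Y_pred, axis=0)
    fn = np.sum(Y_true & ~Y_pred, axis=0)
    denom = 2 * tp + fp + fn
    # Assumed convention: a label with no positives and no predictions scores 0.
    per_label = np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)
    return per_label.mean()

# Toy example: 4 samples (rows), 3 labels (columns).
Y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
Y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 0]])
score = macro_f1(Y_true, Y_pred)  # per-label F1 = [1, 2/3, 0], mean = 5/9
```

A label that is never predicted contributes a zero to the mean, which may be one reason Macro-F1 values tend to be lower than pooled scores on data sets with many rare labels.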

Table 4

Micro-F1 results of the feature selection methods with the 3NN classifier

Data set	Proposed method	MIFS	RALM-FS	SCLS
Arts	0.212±0.052	0.202±0.052	0.164±0.024	0.182±0.043
Birds	0.135±0.061	0.167±0.061	0.144±0.043	0.171±0.066
Yeast	0.532±0.06	0.525±0.053	0.518±0.035	0.529±0.019
Education	0.289±0.043	0.183±0.055	0.176±0.037	0.260±0.050
Enron	0.488±0.033	0.410±0.024	0.437±0.026	0.365±0.073
Social	0.456±0.061	0.338±0.081	0.376±0.057	0.315±0.054
Science	0.188±0.028	0.171±0.037	0.115±0.016	0.160±0.048
Entertain	0.314±0.062	0.276±0.065	0.234±0.033	0.273±0.065
Society	0.313±0.046	0.305±0.043	0.245±0.021	0.255±0.050
Average	0.325	0.286	0.268	0.279

Table 5

Macro-F1 results of the feature selection methods with the 3NN classifier

Data set	Proposed method	MIFS	RALM-FS	SCLS
Arts	0.096±0.043	0.095±0.033	0.074±0.014	0.071±0.025
Birds	0.085±0.038	0.106±0.042	0.078±0.028	0.093±0.036
Yeast	0.322±0.055	0.282±0.057	0.301±0.026	0.300±0.027
Education	0.094±0.018	0.043±0.018	0.059±0.021	0.084±0.023
Enron	0.129±0.018	0.087±0.014	0.111±0.013	0.081±0.026
Social	0.067±0.020	0.051±0.017	0.049±0.013	0.038±0.012
Science	0.079±0.021	0.062±0.015	0.037±0.010	0.057±0.021
Entertain	0.155±0.039	0.138±0.042	0.112±0.020	0.131±0.040
Society	0.088±0.020	0.089±0.021	0.055±0.011	0.053±0.016
Average	0.124	0.106	0.097	0.101
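As a quick arithmetic check on the table above, the sketch below recomputes the average row from the per-data-set means and counts the data sets on which the proposed method has the highest mean. The numbers are copied from the 3NN Macro-F1 table; the variable names are illustrative, and standard deviations are ignored, so this is not a significance test.

```python
# Mean Macro-F1 per data set, in the order Arts, Birds, Yeast, Education,
# Enron, Social, Science, Entertain, Society (values copied from the table).
table5 = {
    "Proposed method": [0.096, 0.085, 0.322, 0.094, 0.129, 0.067, 0.079, 0.155, 0.088],
    "MIFS":            [0.095, 0.106, 0.282, 0.043, 0.087, 0.051, 0.062, 0.138, 0.089],
    "RALM-FS":         [0.074, 0.078, 0.301, 0.059, 0.111, 0.049, 0.037, 0.112, 0.055],
    "SCLS":            [0.071, 0.093, 0.300, 0.084, 0.081, 0.038, 0.057, 0.131, 0.053],
}
reported = {"Proposed method": 0.124, "MIFS": 0.106, "RALM-FS": 0.097, "SCLS": 0.101}

# Column means rounded to three decimals, as in the table's last row.
averages = {name: round(sum(vals) / len(vals), 3) for name, vals in table5.items()}

# Number of data sets where the proposed method has the highest mean.
n_sets = len(table5["Proposed method"])
wins = sum(
    1 for i in range(n_sets)
    if max(table5, key=lambda m: table5[m][i]) == "Proposed method"
)
```

The recomputed averages match the reported row, and the proposed method has the highest mean on 7 of the 9 data sets, with MIFS ahead on Birds and Society.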

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6


Data set	Instances	Features	Labels	Training instances	Test instances	Domain
Arts	5000	462	26	2000	3000	Text (Web)
Birds	645	260	19	322	323	Audio
Yeast	2417	103	14	1500	917	Biology
Education	5000	550	33	2000	3000	Text (Web)
Enron	1702	1001	53	1123	579	Text
Social	5000	1047	39	2000	3000	Text (Web)
Science	5000	743	40	2000	3000	Text (Web)
Entertain	5000	636	27	2000	3000	Text (Web)
Society	5000	640	21	2000	3000	Text (Web)
