Journal on Communications ›› 2018, Vol. 39 ›› Issue (5): 111-122. doi: 10.11959/j.issn.1000-436x.2018082
Li ZHANG1,2, Cong WANG1,2
Revised: 2018-04-18
Online: 2018-05-01
Published: 2018-06-01
About the authors:
Li ZHANG (1977-), male, born in Hanzhong, Shaanxi, is a Ph.D. candidate at Beijing University of Posts and Telecommunications. His research interests include machine learning, feature engineering, and the analysis and mining of healthcare data. | Cong WANG (1958-), female, born in Beijing, Ph.D., is a professor and doctoral supervisor at Beijing University of Posts and Telecommunications. Her research interests include intelligent information processing, network information security, trusted computing and services, and the analysis and mining of healthcare data.
Supported by:
Abstract:
Over the past few decades, feature selection has played an important role in machine learning and artificial intelligence. Many feature selection algorithms nevertheless select redundant and irrelevant features, because they overstate the importance of certain features. At the same time, too many features slow down learning and cause classifiers to overfit. A new forward-search nonlinear feature selection algorithm is therefore proposed. Drawing on the theory of mutual information and interaction information, it searches for the optimal subset relevant to the multi-class labels while reducing computational complexity. Comparative experiments on nine UCI datasets with four different classifiers show that the proposed algorithm outperforms both the original full feature set and the feature subsets selected by other feature selection algorithms.
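As an illustration of the kind of forward search the abstract describes, the sketch below greedily adds features scored by mutual information and interaction information. It uses a JMI-style criterion (relevance minus pairwise redundancy plus class-conditional redundancy) as an assumed stand-in, not the paper's exact JMMC scoring function, and the helper names (`mutual_info`, `cond_mutual_info`, `jmi_forward_select`) are hypothetical.

```python
import numpy as np
from collections import Counter

def mutual_info(x, y):
    """I(X;Y) in nats for discrete-valued sequences, from empirical counts."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log( p(a,b) / (p(a) p(b)) ), with counts substituted in
        mi += (c / n) * np.log(c * n / (px[a] * py[b]))
    return max(mi, 0.0)  # guard against tiny negative rounding error

def cond_mutual_info(x, y, z):
    """I(X;Y|Z) = sum_z p(z) * I(X;Y | Z=z) for discrete-valued sequences."""
    x, y, z = map(np.asarray, (x, y, z))
    total = 0.0
    for v in np.unique(z):
        mask = z == v
        total += mask.mean() * mutual_info(x[mask], y[mask])
    return total

def jmi_forward_select(X, y, k):
    """Greedy forward search: start from the single most relevant feature,
    then repeatedly add the candidate f maximizing
        sum over selected s of [ I(f;y) - I(f;s) + I(f;s|y) ],
    i.e. relevance corrected for redundancy and interaction with y."""
    X, y = np.asarray(X), np.asarray(y)
    relevance = [mutual_info(X[:, j], y) for j in range(X.shape[1])]
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            score = sum(relevance[j]
                        - mutual_info(X[:, j], X[:, s])
                        + cond_mutual_info(X[:, j], X[:, s], y)
                        for s in selected)
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```

Each round evaluates one mutual-information and one conditional-mutual-information term per (candidate, selected) pair, so the search costs O(k·d) information estimates for d features — which is why forward searches of this kind keep the target subset size k small.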
CLC number:
Li ZHANG, Cong WANG. Multi-label feature selection algorithm based on joint mutual information of max-relevance and min-redundancy[J]. Journal on Communications, 2018, 39(5): 111-122.
Table 2  Average accuracy of the selected feature subsets with the 3KNN classifier

| No. | FullSet acc. | JMMC #features | JMMC acc. | IG #features | IG acc. | FCBF #features | FCBF acc. | ReliefF #features | ReliefF acc. |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 67.037% | 8 | 71.111% | 8 | 66.296% | 8 | 66.296% | 6 | 66.296% |
| 2 | 96.013% | 31 | 96.846% | 28 | 91.027% | 34 | 90.455% | 24 | 97.171% |
| 3 | 80% | 87 | 80.666% | 85 | 79.222% | 85 | 79.222% | 61 | 79.555% |
| 4 | 92.633% | 16 | 94.024% | 26 | 92.633% | 26 | 92.633% | 7 | 92.633% |
| Avg. | 83.921% | 35.5 | 85.662% | 36.75 | 82.294% | 38.25 | 82.151% | 24.5 | 83.914% |
Table 3  Average accuracy of the selected feature subsets with the C4.5 classifier

| No. | FullSet acc. | JMMC #features | JMMC acc. | IG #features | IG acc. | FCBF #features | FCBF acc. | ReliefF #features | ReliefF acc. |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 76.296% | 8 | 78.148% | 7 | 76.296% | 12 | 74.814% | 11 | 75.925% |
| 2 | 94.735% | 26 | 96.124% | 25 | 94.146% | 32 | 93.297% | 29 | 95.029% |
| 3 | 68.444% | 74 | 67.777% | 53 | 66.666% | 67 | 67.333% | 27 | 68.444% |
| 4 | 95.099% | 6 | 95.437% | 16 | 94.384% | 16 | 94.739% | 23 | 94.212% |
| Avg. | 83.643% | 28.5 | 84.371% | 25.25 | 82.873% | 31.75 | 82.545% | 22.5 | 83.402% |
Table 4  Average accuracy of the selected feature subsets with the SVM classifier

| No. | FullSet acc. | JMMC #features | JMMC acc. | IG #features | IG acc. | FCBF #features | FCBF acc. | ReliefF #features | ReliefF acc. |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 70.37% | 8 | 71.851% | 5 | 59.259% | 1 | 57.407% | 2 | 55.925% |
| 2 | 97.434% | 25 | 97.743% | 24 | 93.169% | 34 | 92.645% | 31 | 97.728% |
| 3 | 52.666% | 61 | 53.555% | 89 | 51.111% | 89 | 51.111% | 84 | 53.555% |
| 4 | 88.973% | 10 | 94.569% | 5 | 89.111% | 5 | 89.111% | 1 | 69.197% |
| Avg. | 77.361% | 26 | 79.429% | 30.75 | 73.162% | 32.25 | 72.568% | 29.5 | 69.101% |
Table 5  Average accuracy of the different algorithms with mixed classifiers

| No. | FullSet acc. | JMMC #features | JMMC acc. | IG #features | IG acc. | FCBF #features | FCBF acc. | ReliefF #features | ReliefF acc. |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 68.765% | 4 | 79.51% | 6 | 66.173% | 5 | 71.111% | 6 | 66.42% |
| 2 | 95.722% | 21 | 96.952% | 34 | 92.416% | 22 | 96.692% | 23 | 93.522% |
| 3 | 65.888% | 81 | 66.777% | 85 | 65.148% | 87 | 64.962% | 82 | 66.037% |
| 4 | 87.301% | 7 | 93.926% | 5 | 91.107% | 8 | 90.288% | 5 | 91.284% |
| Avg. | 79.419% | 28.25 | 84.291% | 32.5 | 78.711% | 30.5 | 80.763% | 29 | 79.316% |
Table 7  Average accuracy of the selected feature subsets with the 3KNN classifier

| No. | FullSet acc. | JMMC #features | JMMC acc. | IG #features | IG acc. | FCBF #features | FCBF acc. | ReliefF #features | ReliefF acc. |
|---|---|---|---|---|---|---|---|---|---|
| 5 | 66.345% | 133 | 67.278% | 155 | 67.278% | 183 | 65.601% | 40 | 65.06% |
| 6 | 77.967% | 128 | 79.516% | 153 | 79.516% | 118 | 78.203% | 83 | 78.407% |
| 7 | 97.7% | 63 | 97.7% | 64 | 97.7% | 64 | 97.7% | 64 | 97.7% |
| 8 | 100% | 6 | 100% | 11 | 100% | 17 | 100% | 19 | 98.425% |
| 9 | 92.676% | 23 | 96.214% | 36 | 96.214% | 16 | 92.614% | 36 | 95.776% |
| Avg. | 86.937% | 70.6 | 88.155% | 83.8 | 88.141% | 79.6 | 86.823% | 48.4 | 87.073% |
Table 8  Average accuracy of the selected feature subsets with the C4.5 classifier

| No. | FullSet acc. | JMMC #features | JMMC acc. | IG #features | IG acc. | FCBF #features | FCBF acc. | ReliefF #features | ReliefF acc. |
|---|---|---|---|---|---|---|---|---|---|
| 5 | 65.846% | 168 | 71.302% | 228 | 67.177% | 172 | 67.442% | 242 | 67.229% |
| 6 | 76.926% | 152 | 77.121% | 158 | 75.694% | 162 | 75.06% | 163 | 76.31% |
| 7 | 84.849% | 17 | 84.049% | 14 | 84.299% | 11 | 85.049% | 10 | 84.8% |
| 8 | 100% | 6 | 100% | 11 | 100% | 17 | 98.425% | 19 | 99.852% |
| 9 | 100% | 23 | 100% | 36 | 100% | 16 | 100% | 36 | 100% |
| Avg. | 85.524% | 73.2 | 86.494% | 89.4 | 85.434% | 75.6 | 85.195% | 94 | 85.638% |
Table 9  Average accuracy of the selected feature subsets with the SVM classifier

| No. | FullSet acc. | JMMC #features | JMMC acc. | IG #features | IG acc. | FCBF #features | FCBF acc. | ReliefF #features | ReliefF acc. |
|---|---|---|---|---|---|---|---|---|---|
| 5 | 56.946% | 68 | 59.012% | 3 | 56.946% | 3 | 56.946% | 5 | 56.946% |
| 6 | 60.728% | 6 | 63.517% | 2 | 58.644% | 2 | 58.644% | 2 | 59.247% |
| 7 | 94.65% | 64 | 94.65% | 64 | 94.65% | 64 | 94.65% | 64 | 94.65% |
| 8 | 97.047% | 1 | 98.523% | 1 | 98.523% | 1 | 98.523% | 18 | 97.023% |
| 9 | 100% | 23 | 100% | 36 | 100% | 16 | 100% | 36 | 100% |
| Avg. | 81.874% | 32.4 | 83.14% | 21.2 | 81.752% | 17.2 | 81.752% | 25 | 81.573% |
Table 10  Average accuracy of the different algorithms with different classifiers

| No. | FullSet acc. | JMMC #features | JMMC acc. | IG #features | IG acc. | FCBF #features | FCBF acc. | ReliefF #features | ReliefF acc. |
|---|---|---|---|---|---|---|---|---|---|
| 5 | 62.853% | 171 | 63.935% | 189 | 62.983% | 237 | 62.904% | 255 | 63.364% |
| 6 | 70.198% | 158 | 70.484% | 166 | 69.913% | 162 | 69.991% | 165 | 70.129% |
| 7 | 91.416% | 63 | 91.549% | 61 | 91.333% | 64 | 91.466% | 63 | 91.383% |
| 8 | 98.15% | 6 | 98.757% | 11 | 98.014% | 20 | 96.806% | 19 | 97.839% |
| 9 | 97.558% | 23 | 98.738% | 36 | 97.538% | 36 | 97.485% | 36 | 97.444% |
| Avg. | 84.035% | 84.2 | 84.693% | 92.6 | 83.957% | 103.8 | 83.73% | 107.6 | 84.032% |