Journal on Communications ›› 2018, Vol. 39 ›› Issue (5): 111-122. doi: 10.11959/j.issn.1000-436x.2018082

• Academic Papers •

Multi-label feature selection algorithm based on joint mutual information of max-relevance and min-redundancy

Li ZHANG1,2, Cong WANG1,2

  1. School of Software, Beijing University of Posts and Telecommunications, Beijing 100876, China
  2. Key Laboratory of Trustworthy Distributed Computing and Service of Ministry of Education, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Revised: 2018-04-18  Online: 2018-05-01  Published: 2018-06-01
  • About the authors: ZHANG Li (1977-), born in Hanzhong, Shaanxi, is a Ph.D. candidate at Beijing University of Posts and Telecommunications; his research interests include machine learning, feature engineering, and the analysis and mining of healthcare data. | WANG Cong (1958-), born in Beijing, Ph.D., is a professor and doctoral supervisor at Beijing University of Posts and Telecommunications; her research interests include intelligent information processing, network information security, trusted computing and services, and the analysis and mining of healthcare data.
  • Supported by:
    The National Science and Technology Basic Work Project (2015FY111700-6)


Abstract:

Feature selection has played an important role in machine learning and artificial intelligence over the past decades. Many feature selection algorithms nevertheless end up choosing redundant or irrelevant features, because they overestimate the importance of certain features. Moreover, too many features slow down learning and lead to over-fitting of the classifier. Therefore, a new nonlinear feature selection algorithm based on forward search was proposed. Using the theory of mutual information and interaction information, the algorithm searches for the optimal feature subset associated with the multiple class labels while reducing the computational complexity. Comparative experiments on nine UCI datasets with four different classifiers show that the feature subsets selected by the proposed algorithm outperform both the original feature sets and the subsets selected by other feature selection algorithms.

Key words: feature selection, conditional mutual information, feature interaction, feature relevance, feature redundancy

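To make the selection strategy concrete, the following is a minimal Python sketch of a greedy forward search that trades off max-relevance against min-redundancy using mutual information scores. It follows a generic mRMR-style criterion rather than the paper's exact joint mutual information and interaction information objective; the function name select_features and the use of scikit-learn's mutual_info_score (which assumes discrete features) are illustrative choices, not part of the original work.

    # Illustrative sketch (not the paper's exact criterion): greedy forward search
    # that balances max-relevance to the label against min-redundancy among the
    # features already selected, both measured with mutual information.
    import numpy as np
    from sklearn.metrics import mutual_info_score  # I(X; Y) for discrete variables

    def select_features(X, y, k):
        """Return indices of k features chosen by a max-relevance,
        min-redundancy forward search over a discrete feature matrix X."""
        n_features = X.shape[1]
        # Relevance of each feature: I(feature; label)
        relevance = np.array([mutual_info_score(X[:, j], y) for j in range(n_features)])
        selected, remaining = [], list(range(n_features))
        while remaining and len(selected) < k:
            best_j, best_score = None, -np.inf
            for j in remaining:
                # Redundancy: average mutual information with already-selected features
                redundancy = (np.mean([mutual_info_score(X[:, j], X[:, s]) for s in selected])
                              if selected else 0.0)
                score = relevance[j] - redundancy
                if score > best_score:
                    best_j, best_score = j, score
            selected.append(best_j)
            remaining.remove(best_j)
        return selected

On a discretized dataset, a call such as select_features(X, y, k=20) would return the indices of 20 features; the paper's algorithm additionally uses conditional and joint mutual information to account for feature interaction with respect to the multiple class labels.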

