Journal on Communications ›› 2019, Vol. 40 ›› Issue (10): 101-108. doi: 10.11959/j.issn.1000-436x.2019154

• Academic Paper •

• Biography: Zhanshan LI (1966- ), male, born in Changchun, Jilin, Ph.D., is a professor and doctoral supervisor at Jilin University. His research interests include constraint optimization and constraint solving, machine learning, model-based diagnosis, and intelligent planning and scheduling. | Zhaogeng LIU (1993- ), male, born in Jilin, Jilin, is a master's student at Jilin University. His research interest is machine learning.

Feature selection algorithm based on XGBoost

Zhanshan LI1,2,3, Zhaogeng LIU2,3()   

  1 College of Computer Science and Technology, Jilin University, Changchun 130012, China
    2 College of Software, Jilin University, Changchun 130012, China
    3 Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University, Changchun 130012, China
  • Revised: 2019-04-04  Online: 2019-10-25  Published: 2019-11-07
  • Supported by: The National Natural Science Foundation of China (6167226); The Natural Science Foundation of Jilin Province (2018010143JC); Industrial Technology Research and Development Special Project of Jilin Province Development and Reform Commission (2019C053-9)


Abstract:

Feature selection in classification has always been an important but difficult problem. Such problems require a feature selection algorithm that not only helps the classifier improve its classification accuracy, but also removes as many redundant features as possible. Therefore, to better perform feature selection in classification problems, a new wrapper feature selection algorithm, XGBSFS, was proposed. It draws on the tree-building process of extreme gradient boosting (XGBoost) and measures feature importance from three importance metrics, avoiding the limitation of any single metric. An improved sequential floating forward selection (ISFFS) strategy is then applied to search for a feature subset, so that the final subset is of high quality. Comparative experiments on eight UCI datasets show that the proposed algorithm performs well.
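The search stage described above can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation: the feature ranking (assumed here to come from XGBoost's weight/gain/cover importance metrics) is taken as given, and `score` is a placeholder for any subset-quality measure, e.g. cross-validated classifier accuracy.

```python
# Hypothetical sketch of a sequential floating forward selection (SFFS) search
# over a feature ranking. The floating (backward) step is what distinguishes
# SFFS from plain forward selection: after each addition, features whose
# removal improves the score are dropped again.

def sffs(ranked_features, score, max_size=None):
    """Search a feature subset by sequential floating forward selection.

    ranked_features: features ordered by descending importance.
    score: callable mapping a tuple of features to a quality value.
    """
    selected = []
    best_score = score(tuple(selected))
    max_size = max_size or len(ranked_features)
    while len(selected) < max_size:
        # Forward step: add the candidate that raises the score the most.
        candidates = [f for f in ranked_features if f not in selected]
        if not candidates:
            break
        best_f = max(candidates, key=lambda f: score(tuple(selected + [f])))
        new_score = score(tuple(selected + [best_f]))
        if new_score <= best_score:
            break  # no candidate improves the subset
        selected.append(best_f)
        best_score = new_score
        # Floating step: conditionally remove features while that helps.
        improved = True
        while improved and len(selected) > 1:
            improved = False
            for f in list(selected):
                trial = [g for g in selected if g != f]
                s = score(tuple(trial))
                if s > best_score:
                    selected, best_score, improved = trial, s, True
                    break
    return selected, best_score


# Toy demonstration: features 0 and 1 are relevant, others are penalized.
def toy_score(subset):
    relevant = set(subset) & {0, 1}
    redundant = set(subset) - {0, 1}
    return len(relevant) - 0.1 * len(redundant)

subset, quality = sffs(list(range(5)), toy_score)
# selects features 0 and 1 on this toy scorer
```

In the full XGBSFS setting, `score` would retrain and evaluate a classifier on each candidate subset, which is what makes the method a wrapper approach.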

Key words: feature selection, XGBoost, sequential floating forward selection

CLC number:
