Telecommunications Science ›› 2018, Vol. 34 ›› Issue (10): 85-95. doi: 10.11959/j.issn.1000-0801.2018250

• Research and Development •

Research on text sentiment classification based on improved feature selection method

Mingxin LIU1,2, Jing CHEN1,3, Qiyuan WANG1

    1 College of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
    2 Hebei Key Laboratory of Information Transmission and Signal Processing, Qinhuangdao 066004, China
    3 Computer Virtual Technology Laboratory in Hebei Province, Qinhuangdao 066004, China
  • Revised: 2018-09-09 Online: 2018-10-01 Published: 2018-11-08
  • About the authors: Mingxin LIU (1976-), male, Ph.D., is a professor at the College of Information Science and Engineering, Yanshan University, and a master's supervisor at the Hebei Key Laboratory of Information Transmission and Signal Processing. His main research interests are the Internet of Things and wireless sensor networks. | Jing CHEN (1976-), female, Ph.D., is an associate professor at the College of Information Science and Engineering, Yanshan University, and a master's supervisor at the Computer Virtual Technology Laboratory in Hebei Province. Her main research interests are Web services and sentiment analysis in social networks. | Qiyuan WANG (1991-), female, is a master's student at the College of Information Science and Engineering, Yanshan University. Her main research interests are wireless sensor networks and sentiment analysis.
  • Supported by:
    The National Natural Science Foundation of China (61602401); The National Natural Science Foundation of China (61472340)

Abstract:

An improved information gain feature selection method that incorporates a sentiment dictionary is proposed. First, an improved method is proposed to address a shortcoming of existing information gain feature selection, which emphasizes the document frequency of feature words while ignoring corpus balance. Second, considering the influence of sentiment words on text classification, a sentiment-dictionary-based feature selection algorithm, IGSC (information gain combining sentiment classification), is proposed for text classification. By matching the sentiment words in a text and assigning weights to them, the algorithm reduces the feature dimension and addresses the problem of text data sparseness degrading classification performance. Finally, the proposed feature selection method is verified and analyzed experimentally on a travel review data set. The experimental results show that the improved feature selection method for text sentiment classification improves classification accuracy, recall and F value, and has good classification stability.

Key words: information gain, sentiment dictionary, feature selection, sentiment classification
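
As a rough illustration of the classical information gain baseline that the abstract starts from, the sketch below scores each term by how much knowing its presence in a document reduces the entropy of the class label. It is not the improved method or the IGSC algorithm, whose formulas are not given on this page. The function names, the toy reviews and the labels are all assumptions made up for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Classical IG of a term, using only whether each document contains it."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    p_t = len(with_term) / len(labels)
    conditional = p_t * entropy(with_term) + (1 - p_t) * entropy(without_term)
    return entropy(labels) - conditional


# Toy, made-up tokenized reviews (hypothetical; not the paper's data set).
docs = [
    {"room", "clean", "great"},
    {"staff", "great", "view"},
    {"room", "dirty", "noisy"},
    {"staff", "rude", "noisy"},
]
labels = ["pos", "pos", "neg", "neg"]

# Score every term, then keep the two highest-scoring ones as features.
vocabulary = sorted(set().union(*docs))
scores = {t: information_gain(docs, labels, t) for t in vocabulary}
top_terms = sorted(vocabulary, key=lambda t: scores[t], reverse=True)[:2]
print(top_terms)
```

In this toy corpus, "great" and "noisy" each occur in only one class and receive the maximum score, while "room" occurs equally in both classes and scores zero. The score depends only on document-level presence, which is the emphasis on document frequency that the abstract identifies as a limitation.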

CLC number:
