Telecommunications Science (电信科学) ›› 2011, Vol. 27 ›› Issue (6): 43-48. doi: 10.3969/j.issn.1000-0801.2011.06.010

• Special topic: Mobile e-commerce

商家评论的情感分类研究和应用

袁立宇1,鞠久朋2,杨豪杰1,宋平波1   

  1 中国电信股份有限公司广东研究院 广州 510630
  2 海量信息技术有限公司 北京 100190
  • 出版日期:2011-06-15 发布日期:2011-06-15

Feature Weighting for Sentiment Classification of Online Chinese Reviews

Liyu Yuan1,Jiupeng Ju2,Haojie Yang1,Pingbo Song1   

  1 Guangdong Research Institute of China Telecom Co., Ltd., Guangzhou 510630, China
  2 Hylanda Information Technology Co., Ltd., Beijing 100190, China
  • Online:2011-06-15 Published:2011-06-15

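Abstract (translated from Chinese):

Most sentiment classifiers built on supervised machine learning use an n-gram bag-of-words model with binary feature weights. This paper systematically analyzes the feature weighting schemes commonly used in information retrieval, adapts and improves them with respect to term frequency, inverse document frequency, and the normalization factor, and studies their application to merchant reviews. The main improvements are to account for how differently a feature is distributed across classes and to smooth the inverse document frequency. Experiments on a restaurant review corpus show that several variants of the classic tf·idf scheme, in particular those introducing delta idf (the between-class difference in inverse document frequency) and a smoothing factor, effectively improve classification accuracy. The method also performs well on public online review data sets from the hotel, computer, and book domains, which demonstrates its general applicability. It is currently used in China Telecom's "号码百事通" service to recommend restaurants and coupons, with good results.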

关键词: 商家评论, 消费偏好, 情感分析, 褒贬分类, 特征权重

Abstract:

Most sentiment classifiers based on supervised machine learning use binary weights for n-gram features. In this paper, we systematically examine whether more sophisticated feature weighting schemes adapted from information retrieval (IR) can improve the accuracy of sentiment classification for business reviews. We consider term frequency (tf), delta inverse document frequency (delta idf), and smoothing factors. Experiments on restaurant reviews from China Telecom's Number Wizard service show that variants of the classic tf·idf scheme, especially those incorporating delta idf and smoothing factors, significantly increase accuracy. Tests on multi-domain public data sets indicate that the approach generalizes across domains. The proposed method has been deployed in a restaurant recommendation system on the China Telecom Number Wizard micro-blog.
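The abstract names delta idf and a smoothing factor but does not state the formulas. The Python sketch below is therefore only an illustration under stated assumptions, not the authors' weighting scheme: it assumes delta idf is the difference between a term's idf in the negative class and its idf in the positive class, and that smoothing adds a constant (the hypothetical parameter `smooth`, here 0.5) to the numerator and denominator of the ratio. The function names `delta_idf_weights` and `weight_document` are made up for this example.

```python
import math
from collections import Counter


def delta_idf_weights(pos_docs, neg_docs, smooth=0.5):
    """Map each term to an assumed smoothed delta idf.

    pos_docs / neg_docs are lists of tokenized reviews (lists of strings).
    The weight is log2((N_neg * df_pos + s) / (N_pos * df_neg + s)), which is
    idf_neg - idf_pos with an additive smoothing constant s (an assumption).
    """
    n_pos, n_neg = len(pos_docs), len(neg_docs)
    # Document frequency per class: count each term once per review.
    df_pos = Counter(t for doc in pos_docs for t in set(doc))
    df_neg = Counter(t for doc in neg_docs for t in set(doc))
    vocab = set(df_pos) | set(df_neg)
    return {
        t: math.log2((n_neg * df_pos[t] + smooth) / (n_pos * df_neg[t] + smooth))
        for t in vocab
    }


def weight_document(tokens, delta_idf):
    """Weight one review as raw term frequency times delta idf."""
    tf = Counter(tokens)
    return {t: count * delta_idf[t] for t, count in tf.items() if t in delta_idf}


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    pos = [["good", "food"], ["good", "service"]]
    neg = [["bad", "food"], ["bad", "service"]]
    weights = delta_idf_weights(pos, neg)
    print(weight_document(["good", "good", "food"], weights))
```

With this sign convention, terms concentrated in positive reviews get positive weights, terms concentrated in negative reviews get negative weights, and terms spread evenly across both classes get weights near zero. The smoothing constant keeps the ratio finite when a term never appears in one class.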

Key words: business review, consumer preference, sentiment analysis, polarity classification, feature weighting
