Telecommunications Science, 2020, Vol. 36, Issue (11): 47-60. doi: 10.11959/j.issn.1000-0801.2020302
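Qiuyang GU1,2, Bao WU1,2, Chunhua JU3
Revised: 2020-11-10
Online: 2020-11-20
Published: 2020-12-09
About the authors:
Qiuyang GU (1995- ), male, is a doctoral student at Zhejiang University of Technology. His main research interests include intelligent information processing, data mining, and e-commerce and logistics optimization.
Bao WU (1979- ), male, PhD, is a professor and doctoral supervisor at Zhejiang University of Technology. His main research interests include social networks, corporate social responsibility, and high-quality development.
Chunhua JU (1962- ), male, PhD, is a professor and doctoral supervisor at Zhejiang Gongshang University. His main research interests include intelligent information processing, data mining, and e-commerce and logistics optimization.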
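Abstract:
The number of social network users has grown steadily in recent years, and text-based user sentiment analysis has attracted wide attention and found broad application. However, problems such as data sparsity and low precision often degrade both the accuracy and the speed of sentiment recognition methods. This paper proposes a user sentiment Biterm topic model (US-BTM) that discovers user preferences and sentiment tendencies from the texts associated with a specific venue. The model uses biterms for topic modeling and applies an aggregation strategy to form pseudo-documents, creating word pairs over the whole text collection to address data sparsity and the short-text problem. Topics are studied through a word co-occurrence algorithm, which infers topics from collection-level information. By analyzing the word-pair sets in the review collection of a specific scenario, together with the sentiment of their corresponding topics, the model aims to accurately predict users' interests, preferences, and sentiments toward that scenario. The results show that the proposed method accurately captures users' sentiment tendencies and correctly reveals user preferences. It can be widely applied in areas such as content description and recommendation in social networks, description of social network users' interests, and semantic analysis.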
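The aggregation and word-pairing steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `build_pseudo_documents`, `extract_biterms`, and `venue_biterm_counts` are hypothetical, the input is assumed to be already tokenized, and the sketch assumes that word pairs are drawn within each individual review and then pooled per venue (pairing across the whole pseudo-document would be a different design choice). Topic and sentiment inference are not shown.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_pseudo_documents(reviews):
    """Group tokenized short texts by venue (assumed aggregation strategy).

    `reviews` is an iterable of (venue_id, tokens) pairs.
    Returns {venue_id: [tokens_of_review_1, tokens_of_review_2, ...]}.
    """
    pseudo_docs = defaultdict(list)
    for venue_id, tokens in reviews:
        pseudo_docs[venue_id].append(list(tokens))
    return dict(pseudo_docs)


def extract_biterms(tokens):
    """Return the unordered word pairs (biterms) co-occurring in one short text.

    Pairs of identical words are skipped; each pair is sorted so that
    (a, b) and (b, a) count as the same biterm.
    """
    return [
        tuple(sorted(pair))
        for pair in combinations(tokens, 2)
        if pair[0] != pair[1]
    ]


def venue_biterm_counts(pseudo_docs):
    """Pool the biterms of every review in a venue's pseudo-document."""
    counts = {}
    for venue_id, texts in pseudo_docs.items():
        venue_counter = Counter()
        for tokens in texts:
            venue_counter.update(extract_biterms(tokens))
        counts[venue_id] = venue_counter
    return counts


# Toy data, invented purely for illustration.
reviews = [
    ("cafe", ["good", "coffee", "good"]),
    ("cafe", ["coffee", "slow"]),
    ("gym", ["clean", "gym"]),
]
pseudo_docs = build_pseudo_documents(reviews)
biterm_counts = venue_biterm_counts(pseudo_docs)
print(biterm_counts["cafe"].most_common(2))
```

Pooling biterms per venue is what lets very short reviews contribute co-occurrence evidence to a larger unit, which is the motivation the abstract gives for forming pseudo-documents.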
Cite this article:
Qiuyang GU, Bao WU, Chunhua JU. Biterm topic model of social network users' sentiment by integrating word co-occurrence[J]. Telecommunications Science, 2020, 36(11): 47-60.
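Table 1 Parameter settings of the social network user sentiment Biterm topic model
Parameter | Symbol | Description |
Location set | V | Set of all locations v |
User set | U | Set of all users |
Number of topics | Y | Total number of topics contained in the texts |
Sentiment label | S | User sentiment label |
Users who commented at a given location | A_v | Set of users who have posted reviews at location v |
Text set of a given location | d_v | Set of all texts posted at location v |
Topic variable | z | Variable z associated with a topic |
User variable | u | User variable u |
Switch variable | c | Switch variable c |
Word count of a location's text set | N_d | Number of words in the text set d of location v |
Topic distribution of the location set | θ_v | Topic distribution θ over the location set V |
Word distribution of a topic | φ_z | Word distribution φ of topic z |
Sentiment distribution of a user | X_a | Sentiment distribution X of user u |
Sentiment distribution of a topic | θ_a | Sentiment distribution θ_a of a topic |
Sentiment distribution at a location | X_v | Sentiment distribution X at location v |
Dirichlet priors | α, β, σ, η | Dirichlet prior parameters |
Bernoulli prior | γ | Bernoulli prior parameter |
Word-pair set | w_i, w_j | Set of word pairs (biterms) |
Number of word pairs | M | Number of word pairs |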