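Knowledge-enhanced policy-guided interactive reinforcement recommendation system

Big Data Research, 2022, Vol. 8, Issue (5): 88-105. doi: 10.11959/j.issn.2096-0271.2022033

Yuqi ZHANG¹,², Xiaowen HUANG¹,², Jitao SANG¹,²

¹ School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
² Beijing Key Lab of Traffic Data Analysis and Mining, Beijing 100044, China

Online: 2022-09-15 Published: 2022-09-01
About the authors:
Yuqi ZHANG (1997- ), male, is a master's student at the School of Computer and Information Technology, Beijing Jiaotong University. His main research interests include reinforcement learning and recommendation systems.
Xiaowen HUANG (1993- ), female, Ph.D., is a lecturer at the School of Computer and Information Technology, Beijing Jiaotong University. Her main research interests include multimedia computing, data mining, user modeling, and recommendation systems. She has published more than 10 papers in domestic and international conferences and journals.
Jitao SANG (1985- ), male, Ph.D., is a professor at the School of Computer and Information Technology, Beijing Jiaotong University. In 2017 he was selected for the university's "Outstanding Hundred Talents" program. He has received the First Prize in Natural Science of the Chinese Institute of Electronics Science and Technology Award, the Beijing Science and Technology Award, the Chinese Academy of Sciences President's Special Award, and the ACM China Rising Star Award. His main research interests include social multimedia computing, multi-source data mining, and trustworthy machine learning. As principal investigator he has led several projects, including a Key Program of the National Natural Science Foundation of China, a National Key R&D Program project, and a Beijing Outstanding Young Scientist Fund project.
Supported by:
The Fundamental Research Funds for the Central Universities (2021RC217)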

Abstract:

The recommendation system is an important means of addressing information overload in social media. Because traditional recommendation systems cannot optimize the long-term user experience, researchers have proposed interactive recommendation systems and have tried to optimize the recommendation policy with deep reinforcement learning. However, reinforcement recommendation algorithms face three problems: sparse feedback, learning from scratch (which harms the user experience), and a large item space. To address these problems, an improved knowledge-enhanced policy-guided interactive reinforcement recommendation model, KGP-DQN, is proposed. The model builds a behavioral knowledge graph representation module, which combines users' historical behavior with a knowledge graph to address sparse feedback. It builds a policy initialization module, which gives the reinforcement recommendation system an initial policy derived from users' historical behavior, so that learning does not start from scratch. It builds a candidate selection module, which dynamically clusters items according to their representations on the behavioral knowledge graph, reducing the item space and thus the action space. Experiments on three real-world datasets show that KGP-DQN trains the reinforcement recommendation system quickly and effectively, and that its recommendation accuracy exceeds 80% on all three datasets.

Key words: interactive recommendation system, deep reinforcement learning, knowledge graph, policy initialization, candidate selection
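
The candidate selection idea in the abstract (cluster items by their representations, then let the Q-network rank only a reduced set) can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm is used or how a cluster is chosen, so everything below is an assumption made for illustration: plain k-means with a farthest-point initialization, choosing the cluster whose centroid is closest to a user-state vector, and the hypothetical function names `kmeans`, `select_candidates`, and `recommend`.

```python
import numpy as np


def farthest_point_init(X, k):
    """Deterministic seeding (an assumption): start at X[0], then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    return np.array(centroids, dtype=float)


def assign(X, centroids):
    """Index of the nearest centroid for every row of X."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(X, k, n_iter=20):
    """Plain k-means; returns (centroids, labels)."""
    centroids = farthest_point_init(X, k)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        for c in range(k):
            members = X[labels == c]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[c] = members.mean(axis=0)
    return centroids, assign(X, centroids)


def select_candidates(item_emb, user_state, k=3):
    """Item indices of the non-empty cluster nearest to the user state."""
    centroids, labels = kmeans(item_emb, k)
    dist = ((centroids - user_state) ** 2).sum(axis=1)
    dist[np.bincount(labels, minlength=k) == 0] = np.inf  # skip empty clusters
    return np.flatnonzero(labels == dist.argmin())


def recommend(q_values, candidates):
    """Greedy action restricted to the candidate set."""
    return int(candidates[np.argmax(q_values[candidates])])
```

With this arrangement the Q-network only has to score the items in one cluster per step instead of the whole catalogue, which is the sense in which clustering shrinks the action space.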

CLC number: TP391

Yuqi ZHANG, Xiaowen HUANG, Jitao SANG. Knowledge-enhanced policy-guided interactive reinforcement recommendation system[J]. Big Data Research, 2022, 8(5): 88-105.

Figures/Tables: 13 (Figures 1-8, Tables 1-5)

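Symbol	Description
	Item representation in the interactive recommendation system
	Item representation in the interactive recommendation system
	G is the knowledge graph, E the node information, R the edge information
	G′ is the behavioral knowledge graph, E′ the node information, R′ the edge information
	O_i is the user's historical interaction behavior, where i is an item the user has visited
	Representation of the user at time t
r_t	Reward given by the user to the recommendation system at time t
I_t	Candidate set of the recommendation system at time t
θ_s	Parameters of the state representation network
θ_q	Parameters of the recommendation-evaluation Q-network
	Parameters of the target Q-network
D	Memory (replay buffer)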
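
For orientation, the symbols in the table above fit the standard deep Q-learning objective with a target network. The following is a generic sketch and not necessarily the exact loss used in the paper: the state $s_t$, the action $a_t$, the discount factor $\gamma$, the target-parameter symbol $\theta_q^{-}$, and the restriction of the maximum to the next candidate set $I_{t+1}$ are assumed notation that is not taken from the table.

```latex
y_t = r_t + \gamma \max_{a' \in I_{t+1}} Q\left(s_{t+1}, a'; \theta_q^{-}\right),
\qquad
L(\theta_q) = \mathbb{E}_{(s_t, a_t, r_t, s_{t+1}) \sim D}
  \left[ \left( y_t - Q(s_t, a_t; \theta_q) \right)^2 \right]
```

Under this reading, transitions are sampled from the memory $D$, the Q-network parameters $\theta_q$ are trained by gradient descent on $L$, and the target parameters are refreshed from $\theta_q$ periodically.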

Category	Statistic	MovieLens-1m	Last.fm	Yelp
Interaction data	Users	5,417	1,801	27,675
	Items	3,650	7,432	70,311
	Attributes	15	33	590
	Records	986,495	76,693	1,368,606
Knowledge graph	Entities	99,060	9,266	98,576
	Triples	207,939	138,217	2,533,827

Method	MovieLens-1m			Last.fm			Yelp
Method	Reward	Accuracy	Recall	Reward	Accuracy	Recall	Reward	Accuracy	Recall
Collaborative filtering	-0.07	51%	1.31%	0.05	62%	0.9%	-0.15	49%	0.2%
GRU4REC	0.15	68%	1.74%	0.20	72%	1.1%	0.22	61%	0.3%
DQNR	0.26	75%	1.92%	0.36	78%	1.2%	0.26	63%	0.3%
KGQN	0.37	77%	1.98%	0.45	84%	1.2%	0.40	82%	0.4%
KGP-DQN	0.47	82%	2.11%	0.53	87%	1.3%	0.47	91%	0.4%

Variant	MovieLens-1m		Last.fm		Yelp
Variant	Reward	Accuracy	Reward	Accuracy	Reward	Accuracy
KGP-DQN-kg	0.26 (-45%)	75% (-9%)	0.36 (-32%)	78% (-10%)	0.26 (-44%)	63% (-31%)
KGP-DQN-pg	0.37 (-21%)	78% (-5%)	0.45 (-15%)	84% (-3%)	0.39 (-17%)	85% (-7%)
KGP-DQN-cs	0.46 (-2%)	81% (-1%)	0.50 (-6%)	85% (-2%)	0.47 (-0%)	89% (-2%)
KGP-DQN	0.47	82%	0.53	87%	0.47	91%
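
The parenthesized figures in the table above appear to be relative changes with respect to the full KGP-DQN row. The short sketch below recomputes them for the KGP-DQN-kg row from the rounded values printed in the table; the helper name `relative_change` is hypothetical, and the formula (variant minus full, divided by full) is an assumption inferred from the numbers rather than stated in the source.

```python
def relative_change(variant, full):
    """Relative change of a variant's score with respect to the full model, in percent."""
    return (variant - full) / full * 100.0


# (reward, accuracy in %) per dataset, copied from the table above
full_model = {"MovieLens-1m": (0.47, 82), "Last.fm": (0.53, 87), "Yelp": (0.47, 91)}
variant_kg = {"MovieLens-1m": (0.26, 75), "Last.fm": (0.36, 78), "Yelp": (0.26, 63)}

for name in full_model:
    reward = relative_change(variant_kg[name][0], full_model[name][0])
    accuracy = relative_change(variant_kg[name][1], full_model[name][1])
    print(f"{name}: reward {reward:+.0f}%, accuracy {accuracy:+.0f}%")
```

Recomputed this way, the Yelp reward change rounds to -45%, one point away from the -44% printed in the table. A plausible explanation is that the published percentages were computed from unrounded rewards, but the source does not say.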

Number of recommendations	MovieLens-1m	Last.fm	Yelp
0	2.71	3.74	4.97
100	2.43	3.62	4.91
200	2.26	3.56	4.60
300	2.15	3.54	4.05
