Knowledge-enhanced policy-guided interactive reinforcement recommendation system

doi:10.11959/j.issn.2096-0271.2022033

Abstract

Recommendation systems are an important means of addressing information overload in social media. Because traditional recommendation systems cannot optimize the long-term user experience, researchers have proposed interactive recommendation systems and have used deep reinforcement learning to optimize the recommendation policy. However, reinforcement recommendation algorithms face three problems: sparse feedback, learning from scratch (which damages the user experience), and a large item space. To solve these problems, an improved interactive reinforcement recommendation model, KGP-DQN, is proposed. The model builds a behavioral knowledge graph representation module, which combines users' historical behavior with the knowledge graph to address sparse feedback. It builds a policy initialization module, which gives the reinforcement recommendation system an initial policy based on users' historical behavior so that it does not learn from scratch. It builds a candidate selection module, which creates candidate sets by dynamically clustering item representations on the behavioral knowledge graph to address the large action space. Experiments were conducted on three real-world datasets. The results show that KGP-DQN trains the reinforcement recommendation system quickly and effectively, and that its recommendation accuracy exceeds 80% on all three datasets.
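
The candidate selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the clustering algorithm, so plain k-means with a deterministic farthest-point seeding is assumed here, and the names `kmeans`, `select_candidates`, `item_emb`, and `user_state` are hypothetical. The sketch clusters item representations and returns the items in the cluster closest to the current user state as the candidate set.

```python
import math

def farthest_point_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at
    points[0], then repeatedly add the point farthest from the chosen seeds."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)
    return [list(c) for c in centroids]

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    for _ in range(iters):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign(points, centroids)

def select_candidates(item_emb, user_state, k):
    """Return the indices of the items in the cluster nearest to user_state."""
    centroids, labels = kmeans(item_emb, k)
    nearest = min(range(k), key=lambda c: math.dist(user_state, centroids[c]))
    return [i for i, lab in enumerate(labels) if lab == nearest]

# Toy 2-D item representations forming two well-separated groups.
item_emb = [[0, 0], [0, 1], [1, 0], [1, 1],
            [10, 10], [10, 11], [11, 10], [11, 11]]
print(select_candidates(item_emb, [9.0, 9.0], k=2))
```

In this sketch the clustering is computed once; re-running `select_candidates` as the item representations or the user state change would be one way to make the candidate set dynamic, which is again an assumption rather than a detail taken from the abstract.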

Key words: interactive recommendation system, deep reinforcement learning, knowledge graph, policy initialization, candidate selection

CLC Number: TP391

Yuqi ZHANG, Xiaowen HUANG, Jitao SANG. Knowledge-enhanced policy-guided interactive reinforcement recommendation system[J]. Big Data Research, 2022, 8(5): 88-105.

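Symbol	Description
	Item representation in the interactive recommendation system
	Item representation in the interactive recommendation system
	G is the knowledge graph, E is the node information, and R is the edge information
	G′ is the behavioral knowledge graph, E′ is the node information, and R′ is the edge information
	O_i is the user's historical interaction behavior; i is an item the user has visited
	Representation of the user at time t
r_t	Reward the user gives the recommendation system at time t
I_t	Candidate set of the recommendation system at time t
θ_s	Parameters of the state representation network
θ_q	Parameters of the recommendation evaluation Q-network
	Parameters of the target Q-network
D	Replay memory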

Category	Statistic	MovieLens-1m	Last.fm	Yelp
Interaction data	Users	5 417	1 801	27 675
	Items	3 650	7 432	70 311
	Attributes	15	33	590
	Interaction records	986 495	76 693	1 368 606
Knowledge graph	Entities	99 060	9 266	98 576
	Triples	207 939	138 217	2 533 827

Method	MovieLens-1m			Last.fm			Yelp
	Reward	Accuracy	Recall	Reward	Accuracy	Recall	Reward	Accuracy	Recall
Collaborative filtering	-0.07	51%	1.31%	0.05	62%	0.9%	-0.15	49%	0.2%
GRU4REC	0.15	68%	1.74%	0.20	72%	1.1%	0.22	61%	0.3%
DQNR	0.26	75%	1.92%	0.36	78%	1.2%	0.26	63%	0.3%
KGQN	0.37	77%	1.98%	0.45	84%	1.2%	0.40	82%	0.4%
KGP-DQN	0.47	82%	2.11%	0.53	87%	1.3%	0.47	91%	0.4%

Method	MovieLens-1m		Last.fm		Yelp
	Reward	Accuracy	Reward	Accuracy	Reward	Accuracy
KGP-DQN-kg	0.26 (-45%)	75% (-9%)	0.36 (-32%)	78% (-10%)	0.26 (-44%)	63% (-31%)
KGP-DQN-pg	0.37 (-21%)	78% (-5%)	0.45 (-15%)	84% (-3%)	0.39 (-17%)	85% (-7%)
KGP-DQN-cs	0.46 (-2%)	81% (-1%)	0.50 (-6%)	85% (-2%)	0.47 (-0%)	89% (-2%)
KGP-DQN	0.47	82%	0.53	87%	0.47	91%
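
The parenthesized percentages in the table above appear to be relative changes of each variant with respect to the full KGP-DQN row. A minimal sketch of that arithmetic follows; the helper name `relative_change` is hypothetical, and values recomputed from the rounded table entries can differ by a point from the printed ones (for example the Yelp reward of KGP-DQN-kg), presumably because the printed figures were derived from unrounded results.

```python
def relative_change(variant, full):
    """Percentage change of a variant relative to the full model,
    rounded to the nearest whole percent."""
    return round(100 * (variant - full) / full)

# MovieLens-1m reward, KGP-DQN-kg (0.26) versus KGP-DQN (0.47)
print(relative_change(0.26, 0.47))
# MovieLens-1m accuracy, KGP-DQN-kg (75%) versus KGP-DQN (82%)
print(relative_change(75, 82))
```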

Number of recommendations	MovieLens-1m	Last.fm	Yelp
0	2.71	3.74	4.97
100	2.43	3.62	4.91
200	2.26	3.56	4.60
300	2.15	3.54	4.05
