Deep reinforcement learning news recommendation based on dynamic action coverage

doi:10.11959/j.issn.2096-0271.2023069

Abstract

News recommendation systems play an important role in news dissemination on new media. This paper proposes a recommendation system based on deep reinforcement learning that combines the representation ability of neural networks with the policy selection ability of reinforcement learning to improve news recommendation. Dynamic action masks strengthen the model's judgment of users' short-term interests, an optimized cache mechanism improves the utilization of the experience cache, and a reward design with a region-masking property accelerates model training. Experimental results show that the recommendation accuracy of the proposed model on a news dataset is comparable to that of mainstream neural network recommendation methods, and that its ranking performance is better than that of current advanced recommendation algorithms.

Key words: news recommendation, reinforcement learning, dynamic mask, advantage cache, intrinsic reward
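
The abstract does not spell out the masking rule, so the following is only a minimal sketch of how an action mask can be applied to greedy action selection in a DQN-style recommender. It assumes, hypothetically, that the mask is the set of candidate news items already shown in the current session; the function name `masked_greedy_action` and the toy Q-values are illustrative and are not taken from the paper.

```python
import math
from typing import List, Sequence, Set


def masked_greedy_action(q_values: Sequence[float], masked: Set[int]) -> int:
    """Return the index of the highest Q-value among unmasked actions."""
    best_action, best_q = -1, -math.inf
    for action, q in enumerate(q_values):
        if action in masked:
            continue  # masked actions can never be selected
        if best_action < 0 or q > best_q:
            best_action, best_q = action, q
    if best_action < 0:
        raise ValueError("all actions are masked")
    return best_action


# Hypothetical session: the mask grows as items are recommended, so the
# agent is pushed towards items it has not shown yet (an assumed rule).
q = [0.2, 0.9, 0.5, 0.7]   # toy Q-values for four candidate news items
shown: Set[int] = set()    # dynamic mask, updated after every step
picks: List[int] = []
for _ in range(3):
    a = masked_greedy_action(q, shown)
    picks.append(a)
    shown.add(a)
```

Because the mask changes at every step, the same Q-values can yield a different recommendation each time, which is one plausible reading of "dynamic" in the abstract.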

CLC number: TP311.5

Xianghong DONG, Junxiu AN. Deep reinforcement learning news recommendation based on dynamic action coverage[J]. Big Data Research, 2024, 10(3): 109-118.


Method	HR@10	MRR@10	NDCG@10
SVD++	0.38105	0.19401	0.11597
NCF	0.59171	0.38729	0.39812
GRU4Rec	0.73882	0.48101	0.41101
SLi-Rec	0.68491	0.42635	0.47539
DQN	0.31023	0.43916	0.44379
DAMDRL	0.60177	0.49348	0.50181

Method	HR@10	MRR@10	NDCG@10
DAMDRL-1	0.59021	0.47834	0.41032
DAMDRL-2	0.58911	0.48921	0.37079
DAMDRL	0.60177	0.49348	0.50181
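
The evaluation protocol behind the HR@10, MRR@10 and NDCG@10 columns is not described in this section. As a hedged illustration only, the sketch below computes the three metrics under the common assumption of exactly one ground-truth item per test case; the helper names `rank_of` and `evaluate` and the toy data are hypothetical.

```python
import math
from typing import Dict, Sequence


def rank_of(target: int, ranked: Sequence[int], k: int) -> int:
    """1-based rank of target within the top-k list, or 0 if it is absent."""
    top_k = list(ranked[:k])
    return top_k.index(target) + 1 if target in top_k else 0


def evaluate(targets: Sequence[int],
             rankings: Sequence[Sequence[int]],
             k: int = 10) -> Dict[str, float]:
    """Average HR@k, MRR@k and NDCG@k, assuming one relevant item per case."""
    hr = mrr = ndcg = 0.0
    for target, ranked in zip(targets, rankings):
        r = rank_of(target, ranked, k)
        if r:
            hr += 1.0                       # hit: target appears in the top k
            mrr += 1.0 / r                  # reciprocal rank of the hit
            ndcg += 1.0 / math.log2(r + 1)  # DCG of a single hit; ideal DCG is 1
    n = len(targets)
    return {"HR": hr / n, "MRR": mrr / n, "NDCG": ndcg / n}


# Toy data: the target is ranked 1st, ranked 3rd, and missing, respectively.
scores = evaluate([3, 7, 9], [[3, 1, 2], [5, 6, 7], [1, 2, 3]], k=10)
```

HR@k only records whether the clicked item appears in the top k, while MRR@k and NDCG@k also reward placing it near the top, which is why the abstract treats accuracy and ranking performance separately.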