A cooperative multi-agent reinforcement learning algorithm based on dynamic self-selection parameters sharing

doi:10.11959/j.issn.2096-6652.202214

Chinese Journal of Intelligent Science and Technology, 2022, Vol. 4, Issue (1): 75-83.

Han WANG, Yang YU, Yuan JIANG

State Key Laboratory for Novel Software Technology at Nanjing University, Nanjing 210023, China

Revised: 2022-01-14 Online: 2022-03-15 Published: 2022-03-01
About the authors: Han WANG (1992- ), female, is a Ph.D. candidate at the State Key Laboratory for Novel Software Technology at Nanjing University. Her main research interests are machine learning and reinforcement learning.
Yang YU (1982- ), male, is a professor at the State Key Laboratory for Novel Software Technology at Nanjing University. His main research interests are artificial intelligence, machine learning, reinforcement learning, and evolutionary learning.
Yuan JIANG (1976- ), female, Ph.D., is a professor at the State Key Laboratory for Novel Software Technology at Nanjing University. Her main research interests are artificial intelligence and machine learning.
Supported by:
The National Natural Science Foundation of China (61876077)

Abstract:

In multi-agent reinforcement learning, parameter sharing, as a way of centralizing information during learning, can effectively alleviate the learning inefficiency caused by non-stationarity. However, in practical applications, having all agents use the same policy often has detrimental effects. To solve this over-sharing problem, a new approach is proposed that gives agents the ability to automatically identify peers that may benefit from parameter sharing and to dynamically select which agents to share parameters with during learning. Specifically, each agent encodes its historical trajectory into latent information that represents its potential intention, and selects the peers it shares parameters with by comparing this latent information with that of the other agents. Experiments show that, in multi-agent systems, the proposed method not only improves the efficiency of parameter sharing but also ensures the quality of policy learning.

Key words: multi-agent system, reinforcement learning, parameter sharing
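
The peer-selection idea summarized in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the mean-pooling encoder, the cosine-similarity comparison, the `threshold` value, and all function and agent names below are assumptions chosen only to keep the example self-contained. The paper presumably uses a learned trajectory encoder.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def encode_trajectory(trajectory: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical encoder: mean-pool per-step features into one latent vector.

    Assumption: stands in for a learned encoder of the agent's history.
    """
    if not trajectory:
        raise ValueError("trajectory must contain at least one step")
    dim = len(trajectory[0])
    return [sum(step[d] for step in trajectory) / len(trajectory) for d in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_sharing_peers(
    latents: Dict[str, Vector], threshold: float = 0.9
) -> Dict[str, List[str]]:
    """For each agent, list the other agents whose latent vector is similar enough.

    Assumption: a fixed similarity threshold decides who shares parameters.
    """
    return {
        agent: sorted(
            other
            for other, z_other in latents.items()
            if other != agent and cosine_similarity(z, z_other) >= threshold
        )
        for agent, z in latents.items()
    }


# Toy trajectories: agent_0 and agent_1 behave alike, agent_2 does not.
trajectories = {
    "agent_0": [[1.0, 0.0], [0.9, 0.1]],
    "agent_1": [[0.8, 0.2], [1.0, 0.0]],
    "agent_2": [[0.0, 1.0], [0.1, 0.9]],
}
latents = {name: encode_trajectory(traj) for name, traj in trajectories.items()}
peers = select_sharing_peers(latents, threshold=0.9)
print(peers)
```

Re-running `select_sharing_peers` on fresh trajectories at regular intervals during training would make the selection dynamic, in the sense the abstract describes. How parameters are actually shared within a selected group is left open here.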

CLC number:

TP181

Han WANG, Yang YU, Yuan JIANG. A cooperative multi-agent reinforcement learning algorithm based on dynamic self-selection parameters sharing[J]. Chinese Journal of Intelligent Science and Technology, 2022, 4(1): 75-83.

Figures/Tables: 8 (Figures 1-8)


