Chinese Journal of Intelligent Science and Technology ›› 2023, Vol. 5 ›› Issue (3): 313-329. doi: 10.11959/j.issn.2096-6652.202326

• Review and Prospect •

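Survey on multi-agent reinforcement learning methods from the perspective of population

Fengtao XIANG, Junren LUO, Xueqiang GU, Jiongming SU, Wanpeng ZHANG

  1. College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China
  • Received: 2023-07-21 Revised: 2023-08-22 Online: 2023-09-15 Published: 2023-09-26
  • Contact: Fengtao XIANG E-mail: xiangfengtao@nudt.edu.cn
  • About the authors: Fengtao XIANG (1986- ), male, Ph.D., associate professor at the College of Intelligence Science and Technology, National University of Defense Technology. His main research interests include intelligent decision support, reasoning under uncertainty, and intelligent control.
    Junren LUO (1989- ), male, Ph.D. candidate at the College of Intelligence Science and Technology, National University of Defense Technology. His main research interests include imperfect-information games and multi-agent learning.
    Xueqiang GU (1983- ), male, Ph.D., associate research fellow at the College of Intelligence Science and Technology, National University of Defense Technology. His main research interests include intelligent planning and decision-making, and intelligent control.
    Jiongming SU (1984- ), male, Ph.D., associate research fellow at the College of Intelligence Science and Technology, National University of Defense Technology. His main research interests include explainable artificial intelligence and intelligent games.
    Wanpeng ZHANG (1981- ), male, Ph.D., research fellow and doctoral supervisor at the College of Intelligence Science and Technology, National University of Defense Technology. His main research interests include big data intelligence and intelligent evolution.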

Abstract:

Multi-agent systems are a frontier research topic in distributed artificial intelligence. Traditional multi-agent reinforcement learning methods mainly focus on topics such as the emergence of group behavior, multi-agent cooperation and coordination, inter-agent communication, and opponent modeling and prediction. However, they still face challenges such as partially observable environments, non-stationary opponent strategies, high-dimensional decision spaces, and hard-to-interpret credit assignment. How to design multi-agent reinforcement learning methods that scale to large numbers of agents and adapt to many different application scenarios is a frontier problem in this field. This article first outlines the research progress of multi-agent reinforcement learning. It then gives a comprehensive overview of multi-agent learning methods of multiple types and paradigms from two perspectives, scalability and population adaptation. It systematically reviews four categories of scalable learning methods (set permutation invariance, attention mechanisms, graph and network theory, and mean field theory) and four categories of population-adaptive reinforcement learning methods (transfer learning, curriculum learning, meta learning, and meta game), and presents typical application scenarios. Finally, frontier research directions are discussed from five aspects: benchmark platform development, bi-level optimization architectures, adversarial policy learning, human-machine collaborative value alignment, and adaptive game decision-making loops. The survey can serve as a reference for research on key frontier problems of multi-agent reinforcement learning in multimodal environments.

Key words: distributed intelligence, mean field theory, graph neural network, meta learning, meta game
