Chinese Journal of Intelligent Science and Technology (智能科学与技术学报) ›› 2020, Vol. 2 ›› Issue (4): 327-340. doi: 10.11959/j.issn.2096-6652.202035

• Special Issue: Deep Reinforcement Learning •

An overview of optimal consensus for data driven multi-agent system based on reinforcement learning

Jinna LI, Weiran CHENG

  1. School of Information and Control Engineering, Liaoning Shihua University, Fushun 113000, China
  • Revised: 2020-12-03 Online: 2020-12-15 Published: 2020-12-01
  • About the authors: Jinna LI (1977- ), female, Ph.D., is a professor and doctoral supervisor at the School of Information and Control Engineering, Liaoning Shihua University. Her main research interests include data-driven control, operational optimization control, reinforcement learning, and networked control.
    Weiran CHENG (1996- ), female, is a master's student at the School of Information and Control Engineering, Liaoning Shihua University. Her main research interests include reinforcement learning, multi-agent control, optimal control, and data-driven control.
  • Supported by:
    The National Natural Science Foundation of China (61673280); The National Natural Science Foundation of China (62073158); The Open Project of Key Field Alliance of Liaoning Province (2019-KF-03-06); The Project of Liaoning Shihua University (2018XJJ-005)

Abstract:

Multi-agent systems have attracted extensive attention over the past two decades because of their broad potential applications in engineering, social science, natural science, and other fields. Achieving consensus in a multi-agent system usually requires solving the associated matrix equations to design the control protocol offline, which in turn requires the system model to be known accurately. In practice, however, multi-agent systems are large in scale, nonlinearly coupled, and operate in dynamically changing environments, which makes accurate modeling very difficult and poses a challenge for model-dependent consensus protocol design. Reinforcement learning is widely used to solve optimal control and decision-making problems for complex systems because it can learn the optimal solution of a control problem in real time from data measured along the system trajectories. This paper surveys existing theories and methods that use reinforcement learning to solve the optimal consensus problem of multi-agent systems online, in real time, and in a data-driven manner. The application of data-driven reinforcement learning to optimal consensus control of multi-agent systems is introduced from several perspectives, including continuous-time and discrete-time systems, homogeneous and heterogeneous systems, and robustness to disturbances. Finally, future research directions for the data-driven optimal consensus problem of multi-agent systems are discussed.

Key words: reinforcement learning, multi-agent system, optimal consensus, data-driven
