Robust reinforcement learning algorithm based on pigeon-inspired optimization

doi:10.11959/j.issn.2096-109x.2022064

Abstract

Reinforcement learning (RL) is a class of artificial intelligence algorithms with clear computational logic and models that are easy to extend. By interacting with the environment and maximizing a value function with little or no prior information, RL can optimize policy performance and effectively reduce the complexity introduced by physical models. Policy-gradient RL algorithms have been applied successfully in fields such as intelligent image recognition, robot control, and path planning for autonomous driving. However, RL depends heavily on sampling: training needs a large number of samples to converge, and decision accuracy is easily degraded by slight disturbances that do not match the simulation environment. In control applications in particular, the stability of the algorithm is difficult to prove because its convergence cannot be guaranteed. Swarm intelligence algorithms solve complex problems through group cooperation and are self-organizing and highly stable, which makes them an effective means of improving the stability of RL models. The pigeon-inspired optimization algorithm from swarm intelligence was therefore combined with policy-gradient RL. An RL algorithm based on pigeon-inspired optimization was proposed to solve for the policy gradient so as to maximize long-term future reward. The fitness function of the pigeon-inspired optimization algorithm was combined with RL to evaluate how good or bad a policy is, to keep the solution process from falling into an infinite loop, and to improve the stability of the algorithm. A nonlinear two-wheeled inverted pendulum robot control system was selected for simulation verification. The simulation results show that the RL algorithm based on pigeon-inspired optimization can improve the robustness of the system, reduce the computational cost, and reduce the algorithm's dependence on the sample database.
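
The abstract does not spell out the update rules, so the following is only a minimal sketch of the basic pigeon-inspired optimization loop as it is commonly described in the literature (a map-and-compass phase followed by a landmark phase), not the authors' implementation. The names `pio_maximize` and `toy_return`, all default parameter values, and the shift that keeps the landmark weights positive are assumptions made for illustration. The toy fitness function stands in for the long-term reward of a policy with two parameters.

```python
import math
import random


def pio_maximize(fitness, dim, n_pigeons=30, map_iters=60, landmark_iters=20,
                 map_factor=0.2, bounds=(-5.0, 5.0), seed=0):
    """Maximize `fitness` with a basic pigeon-inspired optimization loop."""
    rng = random.Random(seed)
    low, high = bounds

    def clip(value):
        return min(max(value, low), high)

    # Random initial flock: one candidate parameter vector per pigeon.
    flock = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(n_pigeons)]
    velocity = [[0.0] * dim for _ in range(n_pigeons)]
    scores = [fitness(p) for p in flock]
    best_score = max(scores)
    best = list(flock[scores.index(best_score)])

    # Phase 1, map-and-compass operator: old velocity decays with exp(-R * t)
    # and every pigeon is pulled toward the best position found so far.
    for t in range(1, map_iters + 1):
        decay = math.exp(-map_factor * t)
        for i in range(n_pigeons):
            r = rng.random()
            velocity[i] = [decay * v + r * (b - x)
                           for v, b, x in zip(velocity[i], best, flock[i])]
            flock[i] = [clip(x + v) for x, v in zip(flock[i], velocity[i])]
        scores = [fitness(p) for p in flock]
        if max(scores) > best_score:
            best_score = max(scores)
            best = list(flock[scores.index(best_score)])

    # Phase 2, landmark operator: keep the better half of the flock and move
    # the survivors toward their fitness-weighted centre.
    for _ in range(landmark_iters):
        ranked = sorted(zip(scores, flock), key=lambda pair: pair[0], reverse=True)
        ranked = ranked[:max(1, len(ranked) // 2)]
        scores = [s for s, _ in ranked]
        flock = [p for _, p in ranked]
        # Assumption: shift the scores so the weights stay positive even when
        # the fitness values are negative.
        worst = min(scores)
        weights = [s - worst + 1e-12 for s in scores]
        total = sum(weights)
        center = [sum(w * p[d] for w, p in zip(weights, flock)) / total
                  for d in range(dim)]
        moved = []
        for p in flock:
            r = rng.random()
            moved.append([clip(x + r * (c - x)) for x, c in zip(p, center)])
        flock = moved
        scores = [fitness(p) for p in flock]
        if max(scores) > best_score:
            best_score = max(scores)
            best = list(flock[scores.index(best_score)])

    return best, best_score


def toy_return(params):
    """Hypothetical stand-in for a policy's return; the maximum is at (1, -2)."""
    return -((params[0] - 1.0) ** 2 + (params[1] + 2.0) ** 2)


best_params, best_value = pio_maximize(toy_return, dim=2)
print(best_params, best_value)
```

The search uses only fitness evaluations and needs no gradient of the return, which may be one reason a swarm method is attractive when the policy gradient is hard to determine. In an actual RL setting, `fitness` would have to be replaced by an estimate of the return obtained from rollouts of the policy.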

Key words: pigeon-inspired optimization algorithm, reinforcement learning, policy gradient, robustness

CLC Number: TP393

Mingying ZHANG, Bing HUA, Yuguang ZHANG, Haidong LI, Mohong ZHENG. Robust reinforcement learning algorithm based on pigeon-inspired optimization[J]. Chinese Journal of Network and Information Security, 2022, 8(5): 66-74.


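Case	Control algorithm	Database
1	Robust RL algorithm based on pigeon-inspired optimization	Matches the task
2	Robust RL algorithm based on pigeon-inspired optimization	Does not match the task
3	RL algorithm based on policy gradient	Matches the task
4	RL algorithm based on policy gradient	Does not match the task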

Quantity	Value	Quantity	Value
m_B	2.5 kg	m_W	0.636 kg
l	0.026 m	J	5.175×10^-4 kg·m²

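Case	Control algorithm	Database matches the task	Control result	Time to complete one training run/s	Cases where the policy gradient could not be determined
1	Robust RL algorithm based on pigeon-inspired optimization	Yes	Control task completed	3.4	No
2	Robust RL algorithm based on pigeon-inspired optimization	No	Control task completed	3.4	No
3	RL algorithm based on policy gradient	Yes	Control task completed	4.1	Yes
4	RL algorithm based on policy gradient	No	Not completed	4.1	Yes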
