Motion planning for hexapod robot using deep reinforcement learning

Chinese Journal of Intelligent Science and Technology, 2020, Vol. 2, Issue 4: 361-371. doi: 10.11959/j.issn.2096-6652.202039

Huiqiao FU¹, Kaiqiang TANG¹, Guizhou DENG², Xinpeng WANG², Chunlin CHEN¹

¹ School of Management and Engineering, Nanjing University, Nanjing 210046, China
² School of Manufacturing Science and Engineering, Southwest University of Science and Technology, Mianyang 621010, China

Revised: 2020-12-04 Online: 2020-12-15 Published: 2020-12-01

About the authors:
Huiqiao FU (1996- ), male, master's student at the School of Management and Engineering, Nanjing University. His research interests are robotics and reinforcement learning.
Kaiqiang TANG (1992- ), male, Ph.D. candidate at the School of Management and Engineering, Nanjing University. His research interests include machine learning and stochastic optimization.
Guizhou DENG (1998- ), male, master's student at the School of Manufacturing Science and Engineering, Southwest University of Science and Technology. His research interests are machine vision and robot motion control.
Xinpeng WANG (1995- ), male, master's student at the School of Manufacturing Science and Engineering, Southwest University of Science and Technology. His research interests are deep learning and robot motion control.
Chunlin CHEN (1979- ), male, Ph.D., professor at the School of Management and Engineering, Nanjing University. His research interests include machine learning, intelligent robotics, and quantum control.

Supported by:
The National Natural Science Foundation of China (71732003); The National Natural Science Foundation of China (62073160); The National Key Research and Development Program of China (2018AAA0101100); The 4th Pre-Research Project of Manned Space (030602)

Abstract

Hexapod robots have multiple redundant degrees of freedom, which makes them suitable for complex unstructured environments. Discrete environments, a harsh special case of unstructured environments, require hexapod robots to have more efficient and reliable motion strategies. Taking a planar environment of randomly arranged plum-blossom piles (isolated stepping posts) as an example, a random starting point and a target area were set, and a deep reinforcement learning algorithm was trained to obtain a motion strategy for the hexapod robot in this environment. To speed up training, a deep deterministic policy gradient (DDPG) algorithm with a prioritized experience replay mechanism was used. Finally, the policy was verified in a real environment. The results show that the planned motion strategy enables the hexapod robot to move efficiently and smoothly from the starting point to the target area in the planar plum-blossom pile environment. This work lays the foundation for precise motion planning of hexapod robots in real discrete environments.

Key words: hexapod robot, motion planning, deep reinforcement learning
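The abstract credits DDPG with prioritized experience replay for faster training. The paper's implementation is not reproduced here; as a rough illustration of the replay side only, the Python sketch below implements proportional prioritization in the style of Schaul et al.: transitions are drawn with probability P(i) = p_i^α / Σ_k p_k^α using a sum tree, and importance-sampling weights w_i = (N·P(i))^(-β) correct the resulting sampling bias. The class names, the power-of-two capacity, and the defaults α = 0.6, β = 0.4, ε = 1e-6 are assumptions, and normalizing the weights by the largest weight in the batch is a simplification.

```python
import random


class SumTree:
    """Array-backed binary tree: leaves hold priorities, each parent holds the sum of its children."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.nodes = [0.0] * (2 * capacity)  # node 1 is the root; leaves live at capacity .. 2*capacity-1

    def update(self, idx, value):
        i = idx + self.capacity
        self.nodes[i] = value
        i //= 2
        while i >= 1:  # refresh every ancestor sum up to the root
            self.nodes[i] = self.nodes[2 * i] + self.nodes[2 * i + 1]
            i //= 2

    def leaf(self, idx):
        return self.nodes[idx + self.capacity]

    def total(self):
        return self.nodes[1]

    def find(self, mass):
        """Descend from the root to the leaf whose cumulative-priority interval contains `mass`."""
        i = 1
        while i < self.capacity:
            left = 2 * i
            if mass < self.nodes[left]:
                i = left
            else:
                mass -= self.nodes[left]
                i = left + 1
        return i - self.capacity


class PrioritizedReplay:
    """Proportional prioritized replay (hypothetical helper): P(i) = p_i^alpha / sum_k p_k^alpha."""

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        assert capacity > 0 and (capacity & (capacity - 1)) == 0, "capacity must be a power of two"
        self.tree = SumTree(capacity)
        self.data = [None] * capacity
        self.alpha, self.beta, self.eps = alpha, beta, eps
        self.write_pos = 0  # ring-buffer write position
        self.size = 0
        self.max_priority = 1.0

    def add(self, transition):
        # New transitions get the largest priority seen so far, so each is replayed at least once early on.
        self.data[self.write_pos] = transition
        self.tree.update(self.write_pos, self.max_priority ** self.alpha)
        self.write_pos = (self.write_pos + 1) % self.tree.capacity
        self.size = min(self.size + 1, self.tree.capacity)

    def sample(self, batch_size):
        assert self.size > 0, "cannot sample from an empty buffer"
        total = self.tree.total()
        segment = total / batch_size  # stratified sampling: one draw per equal slice of the total
        idxs, weights = [], []
        for k in range(batch_size):
            mass = random.uniform(k * segment, (k + 1) * segment)
            idx = min(self.tree.find(mass), self.size - 1)  # guard against round-off landing in empty slots
            prob = self.tree.leaf(idx) / total
            idxs.append(idx)
            weights.append((self.size * prob) ** (-self.beta))  # IS weight offsets non-uniform sampling
        max_w = max(weights)
        weights = [w / max_w for w in weights]  # simplification: normalize by the batch maximum
        return [self.data[i] for i in idxs], idxs, weights

    def update_priorities(self, idxs, td_errors):
        for idx, delta in zip(idxs, td_errors):
            p = abs(delta) + self.eps  # eps keeps every stored transition sampleable
            self.max_priority = max(self.max_priority, p)
            self.tree.update(idx, p ** self.alpha)
```

In a DDPG-style update, each sampled transition's squared TD error in the critic loss would be scaled by its weight, and the new |δ| values passed back to `update_priorities`; β is typically annealed toward 1 over the course of training.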

CLC number: TP242.6

Huiqiao FU, Kaiqiang TANG, Guizhou DENG, et al. Motion planning for hexapod robot using deep reinforcement learning[J]. Chinese Journal of Intelligent Science and Technology, 2020, 2(4): 361-371.

Figures and tables (12): Figure 1, Table 1, Table 2, Figures 2-10


Leg D-H parameters (α_i: link twist, a_i: link length, d_i: link offset, θ_i: joint angle)
i	α_i	a_i	d_i	θ_i
1	90°	l₁	0	θ₁
2	0	l₂	0	θ₂
3	0	l₃	0	θ₃
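The D-H rows above define the kinematic chain of a single three-joint leg. A minimal Python sketch of the resulting forward kinematics follows; it assumes the standard (distal) D-H convention T_i = Rot_z(θ_i)·Trans_z(d_i)·Trans_x(a_i)·Rot_x(α_i), angles in radians, and a leg-root frame attached at the first joint. The function names are hypothetical. The three link transforms are chained and the foot position is read off the last column.

```python
import math


def dh_transform(alpha, a, d, theta):
    """Standard D-H link transform Rot_z(theta) Trans_z(d) Trans_x(a) Rot_x(alpha) as a 4x4 list."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    ct, st = math.cos(theta), math.sin(theta)
    return [
        [ct, -st * ca, st * sa, a * ct],
        [st, ct * ca, -ct * sa, a * st],
        [0.0, sa, ca, d],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul4(A, B):
    """Plain 4x4 matrix product, so the sketch needs only the standard library."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def leg_forward_kinematics(thetas, lengths):
    """Foot position (x, y, z) of one leg from the D-H table rows.

    thetas  -- (theta_1, theta_2, theta_3) in radians
    lengths -- (l_1, l_2, l_3), i.e. the a_i column
    """
    alphas = (math.pi / 2, 0.0, 0.0)  # alpha_1 = 90 deg, alpha_2 = alpha_3 = 0
    T = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # identity
    for alpha, a, theta in zip(alphas, lengths, thetas):
        T = matmul4(T, dh_transform(alpha, a, 0.0, theta))  # d_i = 0 in every row
    return T[0][3], T[1][3], T[2][3]
```

Multiplying out the three transforms gives the closed form x = cos θ₁ (l₁ + l₂ cos θ₂ + l₃ cos(θ₂+θ₃)), y = sin θ₁ (l₁ + l₂ cos θ₂ + l₃ cos(θ₂+θ₃)), z = l₂ sin θ₂ + l₃ sin(θ₂+θ₃). Assuming l₁, l₂, l₃ are the coxa, femur, and tibia lengths from the dimension table below (50, 130, 140 mm), `leg_forward_kinematics((0.0, 0.0, math.radians(-90)), (50, 130, 140))` returns approximately (180, 0, -140): the foot sits 180 mm out from the leg root and 140 mm below it.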

Body and leg dimensions and joint angle ranges
Part	Body radius (D)	Coxa	Femur	Tibia
Length (mm)	150	50	130	140
Joint angle range	N/A	[-45°, 45°]	[0°, 45°]	[-135°, -90°]
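On a plum-blossom pile field, a leg can only step onto a pile that it can reach without leaving the joint ranges listed above. The following Python sketch is a hypothetical foothold-feasibility check, not taken from the paper. It inverts the closed-form forward kinematics of the D-H chain, choosing the knee solution with θ₃ < 0 because the tibia range is negative, and then tests each angle against its range. It assumes that l₁, l₂, l₃ in the D-H table are the coxa, femur, and tibia lengths, that the target is given in millimetres in the leg-root frame, and that the three ranges apply to θ₁, θ₂, θ₃ in that order.

```python
import math

LENGTHS_MM = (50.0, 130.0, 140.0)  # assumed: l_1 = coxa, l_2 = femur, l_3 = tibia
JOINT_RANGES_DEG = ((-45.0, 45.0), (0.0, 45.0), (-135.0, -90.0))  # assumed order: theta_1, theta_2, theta_3


def leg_inverse_kinematics(x, y, z, lengths=LENGTHS_MM):
    """Joint angles in degrees that place the foot at (x, y, z), or None if the point is out of reach."""
    l1, l2, l3 = lengths
    theta1 = math.atan2(y, x)  # coxa yaw toward the target
    r = math.hypot(x, y) - l1  # horizontal distance from the femur joint
    cos_t3 = (r * r + z * z - l2 * l2 - l3 * l3) / (2.0 * l2 * l3)  # law of cosines
    if abs(cos_t3) > 1.0:
        return None  # the femur-tibia pair cannot span this distance
    theta3 = -math.acos(cos_t3)  # knee branch with theta_3 < 0, matching the tibia range
    theta2 = math.atan2(z, r) - math.atan2(l3 * math.sin(theta3), l2 + l3 * math.cos(theta3))
    return tuple(math.degrees(t) for t in (theta1, theta2, theta3))


def foothold_reachable(x, y, z, tol_deg=1e-6):
    """True if an IK solution exists and every joint lies inside its range (up to a small tolerance)."""
    angles = leg_inverse_kinematics(x, y, z)
    if angles is None:
        return False
    return all(lo - tol_deg <= a <= hi + tol_deg for a, (lo, hi) in zip(angles, JOINT_RANGES_DEG))
```

For example, a pile top at (180, 0, -140) mm maps to (0°, 0°, -90°) and is accepted, whereas the same distance at a 60° yaw is rejected because θ₁ would exceed 45°. A planner could use such a check to mask out piles a given leg cannot reach before choosing a foothold.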
