Research on the manipulator intelligent trajectory planning method based on the improved TD3 algorithm

doi:10.11959/j.issn.2096-6652.202225

Abstract:

An intelligent trajectory planning and obstacle avoidance method based on an improved twin delayed deep deterministic policy gradient (TD3) algorithm was proposed to solve the trajectory planning problem of a 4-DOF manipulator mounted on a satellite. Training was divided into two stages. In the pre-training stage, guidance toward the target position was always combined with the output of the policy network to optimize the trajectory. After pre-training, the algorithm could autonomously output a velocity trajectory for randomly specified initial and target positions in the joint space of the manipulator. This target-guided mechanism reduced unnecessary exploration and improved learning efficiency in the high-dimensional action space. In the second training stage, a collision-free safe reference trajectory was first obtained by demonstration and then learned continually during training until the final output trajectory was able to avoid obstacles.
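The abstract does not state the exact rule for combining target guidance with the policy network output, so the following Python sketch is illustrative only. It assumes a convex blend with a guidance weight `beta` that decays linearly over pre-training, a hypothetical joint-velocity limit `V_MAX`, and a simple single-integrator joint model `q <- q + a*dt`; the names `guidance_action`, `guided_action`, and `beta_schedule` are hypothetical. The `td3_target` function shows the standard TD3 critic target (clipped double-Q learning with target policy smoothing) that an improved TD3 would build on; its default hyperparameters are the commonly used TD3 defaults, not values from this paper.

```python
import numpy as np

N_JOINTS = 4   # 4-DOF manipulator, as stated in the abstract
V_MAX = 1.0    # hypothetical joint-velocity limit; actions lie in [-V_MAX, V_MAX]


def guidance_action(q, q_target, v_max=V_MAX):
    """Joint velocity pointing straight from q toward the target configuration."""
    delta = np.asarray(q_target, dtype=float) - np.asarray(q, dtype=float)
    dist = np.linalg.norm(delta)
    if dist < 1e-8:
        return np.zeros_like(delta)  # already at the target
    return v_max * delta / dist      # unit direction scaled to the speed limit


def guided_action(policy_action, q, q_target, beta, v_max=V_MAX):
    """Assumed blending rule: beta=1 is pure guidance, beta=0 is the pure actor output."""
    a = (beta * guidance_action(q, q_target, v_max)
         + (1.0 - beta) * np.asarray(policy_action, dtype=float))
    return np.clip(a, -v_max, v_max)


def beta_schedule(step, total_steps, beta0=1.0):
    """Hypothetical linear decay of the guidance weight to 0 over pre-training."""
    return beta0 * max(0.0, 1.0 - step / total_steps)


def td3_target(reward, done, next_state, actor_target, critic1_target, critic2_target,
               rng, gamma=0.99, policy_noise=0.2, noise_clip=0.5, v_max=V_MAX):
    """Standard TD3 critic target: clipped double-Q with target policy smoothing."""
    next_a = actor_target(next_state)
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = np.clip(rng.normal(0.0, policy_noise, size=next_a.shape), -noise_clip, noise_clip)
    next_a = np.clip(next_a + noise, -v_max, v_max)
    # Clipped double-Q: take the smaller of the two target critics.
    q_min = np.minimum(critic1_target(next_state, next_a),
                       critic2_target(next_state, next_a))
    return reward + gamma * (1.0 - done) * q_min


if __name__ == "__main__":
    # Toy rollout with an untrained (all-zero) actor, so only guidance moves the arm.
    q = np.zeros(N_JOINTS)
    q_goal = np.array([0.5, -0.3, 0.2, 0.1])
    dt = 0.05
    for step in range(50):
        beta = beta_schedule(step, total_steps=100)
        a = guided_action(np.zeros(N_JOINTS), q, q_goal, beta)
        q = q + a * dt
    print("remaining joint-space distance:", np.linalg.norm(q_goal - q))
```

In this sketch the guidance term steers early episodes toward the target even when the actor is untrained, and the decaying `beta` gradually hands control to the actor; the paper's actual weighting and schedule may differ.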

Key words: obstacle avoidance planning, target guidance, trajectory demonstration, double training

CLC Number: TP183

Qiang ZHANG, Wen WEN, Xiaodong ZHOU, et al. Research on the manipulator intelligent trajectory planning method based on the improved TD3 algorithm[J]. Chinese Journal of Intelligent Science and Technology, 2022, 4(2): 223-232.

Figures/Tables 13

References 25

