Chinese Journal of Intelligent Science and Technology, 2022, Vol. 4, Issue (2): 223-232. doi: 10.11959/j.issn.2096-6652.202225

• Special topic: Learning dexterous and precise manipulation for autonomous agents

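Research on the manipulator intelligent trajectory planning method based on the improved TD3 algorithm

Qiang ZHANG, Wen WEN, Xiaodong ZHOU, Weihui LIU, Xiaoyu CHU

  1. Beijing Institute of Control Engineering, Beijing 100190, China
  • Publication date: 2022-06-15  Release date: 2022-06-01
  • About the authors: Qiang ZHANG (1978− ), male, Ph.D., research fellow, is director of the Electromechanical Technology Division of Beijing Institute of Control Engineering and a member of the Inertial Technology Group of the Science and Technology Committee of China Aerospace Science and Technology Corporation. His main research interests include spacecraft drive mechanisms, actuators, and space manipulators.
    Wen WEN (1985− ), male, Ph.D., is a senior engineer at Beijing Institute of Control Engineering. His main research interests include motion planning and intelligent manipulation for space manipulators.
    Xiaodong ZHOU (1983− ), male, Ph.D., is a senior engineer at Beijing Institute of Control Engineering. His main research interests include bionic robots and robot interaction control.
    Weihui LIU (1988− ), female, Ph.D., is a senior engineer at Beijing Institute of Control Engineering. Her main research interest is intelligent interaction technology for space manipulators.
    Xiaoyu CHU (1992− ), female, Ph.D., is an engineer at Beijing Institute of Control Engineering. Her main research interests include intelligent planning for space manipulators and control of robots for extraterrestrial planetary exploration.
  • Supported by:
    National Key Research and Development Program of China: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0103004)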

Abstract:

To solve the trajectory planning problem of a 4-DOF manipulator mounted on a satellite, an intelligent trajectory planning and obstacle avoidance method based on an improved twin delayed deep deterministic policy gradient (TD3) algorithm was proposed. The method was trained in two stages. In the pre-training stage, a hybrid planning strategy combined guidance toward the target position with the output of the TD3 policy network to optimize the trajectory. After pre-training, the algorithm could autonomously plan a velocity trajectory for arbitrary start and goal positions in the manipulator's joint space. This target-guided mechanism reduced unnecessary exploration during training and partly alleviated the low learning efficiency typical of high-dimensional action spaces. In the second training stage, a collision-free, safe reference trajectory was first obtained by demonstration and then continually imitated during training, so that the trajectory finally output by the algorithm avoided obstacles.

Key words: obstacle avoidance planning, target guidance, trajectory demonstration, two-stage training
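The abstract describes the pre-training idea but gives no equations. As a minimal sketch for orientation only, the code below pairs two standard TD3 ingredients (target policy smoothing and the clipped double-Q critic target) with one plausible reading of "target position guidance": a proportional go-to-goal joint velocity blended into the policy's output. The per-joint linear blend is an assumption, not the authors' stated formulation. The names `guided_action` and `td3_target` and the parameters `beta`, `k_p`, and `v_max` are hypothetical, and TD3's delayed actor updates are omitted.

```python
import random
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Critic = Callable[[Sequence[float], Sequence[float]], float]


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp a scalar to [lo, hi]."""
    return max(lo, min(hi, x))


def guided_action(policy_action: Sequence[float], q: Sequence[float],
                  q_goal: Sequence[float], beta: float, k_p: float,
                  v_max: float) -> Vector:
    """Hypothetical target-guided joint-velocity command.

    Blends the policy output with a proportional "go to goal" velocity
    k_p * (q_goal - q). beta in [0, 1] is the (assumed) guidance weight;
    each joint velocity is clipped to [-v_max, v_max].
    """
    return [clip((1.0 - beta) * a + beta * k_p * (g - qi), -v_max, v_max)
            for a, qi, g in zip(policy_action, q, q_goal)]


def td3_target(reward: float, done: bool, next_state: Sequence[float],
               actor_target: Callable[[Sequence[float]], Vector],
               critic1_target: Critic, critic2_target: Critic,
               gamma: float = 0.99, noise_std: float = 0.2,
               noise_clip: float = 0.5, a_max: float = 1.0,
               rng: Optional[random.Random] = None) -> float:
    """Standard TD3 critic target: y = r + gamma * (1 - done) * min(Q1', Q2')."""
    if rng is None:
        rng = random.Random()
    # Target policy smoothing: add clipped Gaussian noise to the target
    # action, then clip the result to the action bounds.
    a_next = [clip(a + clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip),
                   -a_max, a_max)
              for a in actor_target(next_state)]
    # Clipped double-Q learning: use the smaller of the two target critics.
    q_min = min(critic1_target(next_state, a_next),
                critic2_target(next_state, a_next))
    return reward + gamma * (0.0 if done else 1.0) * q_min
```

Annealing `beta` from 1 toward 0 would let the guidance dominate early exploration and hand control to the learned policy later; the abstract does not say whether the authors schedule the guidance this way.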
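For the second training stage, the abstract says only that a demonstrated collision-free trajectory is imitated during training. One common way to express such imitation in a reinforcement learning reward, assumed here rather than taken from the paper, is to penalize the deviation from the nearest waypoint of the demonstrated path together with the remaining distance to the goal. `imitation_reward`, `w_goal`, and `w_ref` are hypothetical names, and the weights are illustrative.

```python
import math
from typing import Sequence


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two joint-space configurations."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def imitation_reward(q: Sequence[float], q_goal: Sequence[float],
                     reference: Sequence[Sequence[float]],
                     w_goal: float = 1.0, w_ref: float = 0.5) -> float:
    """Hypothetical second-stage reward.

    Penalizes the distance to the goal and the deviation from the nearest
    waypoint of a demonstrated collision-free reference trajectory, which
    pulls the learned policy toward the demonstrated (safe) path.
    """
    if not reference:
        raise ValueError("reference trajectory must contain at least one waypoint")
    deviation = min(distance(q, p) for p in reference)
    return -w_goal * distance(q, q_goal) - w_ref * deviation
```

Using the nearest waypoint ignores progress along the path; penalizing the distance to the waypoint indexed by the current time step is a stricter alternative that also constrains timing.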
