Journal on Communications, 2023, Vol. 44, Issue (10): 213-225. doi: 10.11959/j.issn.1000-436x.2023194

• Academic Communication •

Lightweight Transformer traffic scene semantic segmentation algorithm integrating multi-scale depth convolution

Gang XIE1,2, Quanyi WANG1,2, Xinlin XIE1,2, Jian’an WANG1,2

  1. School of Electronic and Information Engineering, Taiyuan University of Science and Technology, Taiyuan 030024, China
  2. Shanxi Key Laboratory of Advanced Control and Equipment Intelligence, Taiyuan 030024, China
  • Revised: 2023-09-20 Online: 2023-10-01 Published: 2023-10-01
  • About the authors:
    Gang XIE (born 1972), male, from Wutai, Shanxi; Ph.D.; professor and doctoral supervisor at Taiyuan University of Science and Technology. His research interests include computer vision and intelligent control.
    Quanyi WANG (born 1999), male, from Changzhi, Shanxi; master's student at Taiyuan University of Science and Technology. His research interests include semantic segmentation and deep learning.
    Xinlin XIE (born 1990), male, from Yuncheng, Shanxi; Ph.D.; associate professor and master's supervisor at Taiyuan University of Science and Technology. His research interests include semantic segmentation and deep learning.
    Jian’an WANG (born 1984), male, from Jiujiang, Jiangxi; Ph.D.; professor and master's supervisor at Taiyuan University of Science and Technology. His research interests include intelligent information systems and complex network control.
  • Supported by:
    The National Natural Science Foundation of China (62006169); Key Research and Development Plan of Shanxi Province (202202010101005); Taiyuan University of Science and Technology Scientific Research Initial Funding (20192047)

Abstract:

To address two problems in traffic scene semantic segmentation algorithms, namely the discontinuous segmentation of thin strip objects that easily blend into the surrounding background and the large number of model parameters, a lightweight Transformer traffic scene semantic segmentation algorithm integrating multi-scale depthwise convolution is proposed. First, a multi-scale strip feature extraction module (MSEM) is constructed based on depthwise convolution to strengthen the representation of thin strip object features at different scales. Second, a spatial detail auxiliary module (SDAM) is designed in the shallow network using the inductive bias of convolution; it compensates for the spatial detail lost in the deeper layers and thereby refines object edge segmentation. Finally, an asymmetric encoder-decoder network based on the Transformer-CNN framework (TC-AEDNet) is proposed. The encoder combines Transformer and CNN to reduce both the loss of detail information and the number of model parameters, while the decoder adopts a lightweight multi-level feature fusion design to further model the global context. The proposed algorithm achieves a mean intersection over union (mIoU) of 78.63% and 81.06% on the Cityscapes and CamVid public traffic scene datasets, respectively. It achieves a trade-off between segmentation accuracy and model size in traffic scene semantic segmentation and has good application prospects.
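The mIoU metric reported in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' evaluation code: the function name `mean_iou`, its parameters, the flat-list input format, and the ignore label of 255 (a common convention for unlabeled pixels) are all assumptions made for illustration. Benchmark evaluations usually accumulate counts over the whole dataset before averaging; this sketch handles a single list of pixels.

```python
def mean_iou(pred, target, num_classes, ignore_label=255):
    """Mean intersection over union for two flat lists of integer labels.

    Hypothetical helper for illustration; classes that appear in neither
    the prediction nor the ground truth are left out of the average.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_label:
            continue  # skip pixels without a valid ground-truth label
        if p == t:
            # correct pixel: counts once in both intersection and union
            intersection[t] += 1
            union[t] += 1
        else:
            # wrong pixel: false negative for class t, false positive for class p
            union[t] += 1
            if 0 <= p < num_classes:
                union[p] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with three classes; the last pixel is unlabeled and ignored.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
```

In the toy example the per-class IoU values are 1/2, 2/3, and 1, so the mean is 13/18.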

Key words: semantic segmentation, deep learning, attention mechanism, lightweight, traffic scene

CLC Number:
