Journal on Communications ›› 2023, Vol. 44 ›› Issue (10): 213-225. doi: 10.11959/j.issn.1000-436x.2023194

• Correspondences

Lightweight Transformer traffic scene semantic segmentation algorithm integrating multi-scale depth convolution

Gang XIE1,2, Quanyi WANG1,2, Xinlin XIE1,2, Jian’an WANG1,2   

  1. School of Electronic and Information Engineering, Taiyuan University of Science and Technology, Taiyuan 030024, China
  2. Shanxi Key Laboratory of Advanced Control and Equipment Intelligence, Taiyuan 030024, China
  • Revised: 2023-09-20 Online: 2023-10-01 Published: 2023-10-01
  • Supported by:
    The National Natural Science Foundation of China (62006169); Key Research and Development Plan of Shanxi Province (202202010101005); Taiyuan University of Science and Technology Scientific Research Initial Funding (20192047)

Abstract:

To address two problems in traffic scene semantic segmentation, namely the discontinuous segmentation of thin strip objects that blend easily into the surrounding background and the large number of model parameters, a lightweight Transformer traffic scene semantic segmentation algorithm integrating multi-scale depth convolution was proposed. First, a multi-scale strip feature extraction module (MSEM) was constructed based on depth convolution to enhance the representation of thin strip target features at different scales. Second, a spatial detail auxiliary module (SDAM) was designed, using the convolutional inductive bias of the shallow network to compensate for the loss of spatial detail information in deep layers and thereby optimize object edge segmentation. Finally, an asymmetric encoder-decoder network based on the Transformer-CNN framework (TC-AEDNet) was proposed. The encoder combined Transformer and CNN to alleviate the loss of detail information and reduce the number of model parameters, while the decoder adopted a lightweight multi-level feature fusion design to further model the global context. The proposed algorithm achieved a mean intersection over union (mIoU) of 78.63% on Cityscapes and 81.06% on CamVid, two public traffic scene datasets. It thus offers a trade-off between segmentation accuracy and model size for traffic scene semantic segmentation and has good application prospects.
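
For readers unfamiliar with the reported metric, mIoU is commonly computed from a per-class confusion matrix as the average of TP / (TP + FP + FN) over classes. The abstract does not describe the evaluation code, so the following is only a minimal sketch of that standard definition; the function name `mean_iou`, the flat-list input format, and the choice to skip classes that appear in neither prediction nor ground truth are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union for two flat sequences of class labels.

    Illustrative sketch only; not the evaluation code used in the paper.
    """
    # confusion[t][p] counts pixels whose true class is t and predicted class is p
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp
        fn = sum(confusion[c]) - tp
        union = tp + fp + fn
        # Assumption: classes absent from both pred and target are skipped
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

Benchmark toolkits typically accumulate the confusion matrix over the whole validation set before averaging, and may handle ignored labels differently, so numbers from this sketch are not directly comparable to the reported 78.63% and 81.06%.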

Key words: semantic segmentation, deep learning, attention mechanism, lightweight, traffic scene

