Lightweight Transformer traffic scene semantic segmentation algorithm integrating multi-scale depth convolution

doi:10.11959/j.issn.1000-436x.2023194

Abstract

To address the discontinuous segmentation of thin strip objects that blend easily into the surrounding background, as well as the large parameter counts of semantic segmentation algorithms for traffic scenes, a lightweight Transformer traffic scene semantic segmentation algorithm integrating multi-scale depthwise convolution was proposed. First, a multi-scale strip feature extraction module (MSEM) was constructed based on depthwise convolution to enhance the representation of thin strip targets at different scales. Second, a spatial detail auxiliary module (SDAM) was designed that exploits the convolutional inductive bias of the shallow network to compensate for the spatial detail lost in the deep layers and thereby refine object edge segmentation. Finally, an asymmetric encoder-decoder network based on the Transformer-CNN framework (TC-AEDNet) was proposed. The encoder combined Transformer and CNN to alleviate the loss of detail information and reduce the number of model parameters, while the decoder adopted a lightweight multi-level feature fusion design to further model the global context. The proposed algorithm achieved a mean intersection over union (mIoU) of 78.63% and 81.06% on the Cityscapes and CamVid public traffic scene datasets, respectively. It offers a trade-off between segmentation accuracy and model size in traffic scene semantic segmentation and has good application prospects.

Key words: semantic segmentation, deep learning, attention mechanism, lightweight, traffic scene

CLC Number: TP391.4

Gang XIE, Quanyi WANG, Xinlin XIE, Jian’an WANG. Lightweight Transformer traffic scene semantic segmentation algorithm integrating multi-scale depth convolution[J]. Journal on Communications, 2023, 44(10): 213-225.

Figures/Tables 14

References 27

[1]	ZHOU X, HE X X, ZHENG C W. Radio signal recognition based on image deep learning[J]. Journal on Communications, 2019, 40(7): 114-125.
[2]	LV Q X, SUN X, CHEN C R, et al. Parallel complement network for real-time semantic segmentation of road scenes[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(5): 4432-4444.
[3]	LI L H, QIAN B, LIAN J, et al. Study on traffic scene semantic segmentation method based on convolutional neural network[J]. Journal on Communications, 2018, 39(4): 123-130.
[4]	YANG J, DANG J S. Semantic segmentation of 3D point cloud based on contextual attention CNN[J]. Journal on Communications, 2020, 41(7): 195-203.
[5]	CHEN B K, GONG C, YANG J. Importance-aware semantic segmentation for autonomous vehicles[J]. IEEE Transactions on Intelligent Transportation Systems, 2019, 20(1): 137-148.
[6]	CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848.
[7]	DONG G S, YAN Y, SHEN C H, et al. Real-time high-performance semantic image segmentation of urban street scenes[J]. IEEE Transactions on Intelligent Transportation Systems, 2021, 22(6): 3258-3274.
[8]	CHEN L C, ZHU Y, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]// Proceedings of the European Conference on Computer Vision (ECCV). Berlin: Springer, 2018: 801-818.
[9]	HUYNH C, TRAN A T, LUU K, et al. Progressive semantic segmentation[C]// Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2021: 16750-16759.
[10]	WENG X, YAN Y, CHEN S, et al. Stage-aware feature alignment network for real-time semantic segmentation of street scenes[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(7): 4444-4459.
[11]	DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: transformers for image recognition at scale[J]. arXiv Preprint, arXiv:2010.11929, 2020.
[12]	LIU Z, LIN Y T, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]// Proceedings of IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE Press, 2022: 9992-10002.
[13]	WANG W H, XIE E Z, LI X, et al. Pyramid vision transformer: a versatile backbone for dense prediction without convolutions[C]// Proceedings of IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE Press, 2022: 548-558.
[14]	PENG Z L, HUANG W, GU S Z, et al. Conformer: local features coupling global representations for visual recognition[C]// Proceedings of IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE Press, 2022: 357-366.
[15]	HU X G, JING L Y, SEHAR U. Joint pyramid attention network for real-time semantic segmentation of urban scenes[J]. Applied Intelligence, 2022, 52(1): 580-594.
[16]	XIAO X, ZHAO Y, ZHANG F, et al. BASeg: boundary aware semantic segmentation for autonomous driving[J]. Neural Networks, 2023, 157: 460-470.
[17]	RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[C]// International Conference on Medical Image Computing and Computer-Assisted Intervention. Berlin: Springer, 2015: 234-241.
[18]	BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12): 2481-2495.
[19]	BAI W. An ENet semantic segmentation method combined with attention mechanism[J]. Computational Intelligence and Neuroscience, 2023, 2023: 1-9.
[20]	PAN H H, HONG Y D, SUN W C, et al. Deep dual-resolution networks for real-time and accurate semantic segmentation of traffic scenes[J]. IEEE Transactions on Intelligent Transportation Systems, 2023, 24(3): 3448-3460.
[21]	LIU Z, MAO H Z, WU C Y, et al. A ConvNet for the 2020s[C]// Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2022: 11966-11976.
[22]	WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// Proceedings of Computer Vision - ECCV 2018: 15th European Conference. New York: ACM Press, 2018: 3-19.
[23]	WU Z F, SHEN C H, VAN DEN HENGEL A. Wider or deeper: revisiting the ResNet model for visual recognition[J]. Pattern Recognition, 2019, 90(C): 119-133.
[24]	XIE E, WANG W, YU Z, et al. SegFormer: simple and efficient design for semantic segmentation with transformers[J]. Advances in Neural Information Processing Systems, 2021, 34: 12077-12090.
[25]	BROSTOW G J, FAUQUEUR J, CIPOLLA R. Semantic object classes in video: a high-definition ground truth database[J]. Pattern Recognition Letters, 2009, 30(2): 88-97.
[26]	CORDTS M, OMRAN M, RAMOS S, et al. The cityscapes dataset for semantic urban scene understanding[C]// Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2016: 3213-3223.
[27]	LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[C]// Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2015: 3431-3440.

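MDC	Scale i	Input	Stride s	Kernel k	Padding p
MDC₁	1	320×64×64	1	1×7	0×3
MDC₁	1	320×64×64	1	7×1	3×0
MDC₁	2	320×64×64	1	1×9	0×4
MDC₁	2	320×64×64	1	9×1	4×0
MDC₁	3	320×64×64	1	1×11	0×5
MDC₁	3	320×64×64	1	11×1	5×0
MDC₂	1	320×64×64	1	1×7	0×3
MDC₂	1	320×64×64	1	7×1	3×0
MDC₂	2	320×64×64	1	1×9	0×4
MDC₂	2	320×64×64	1	9×1	4×0
MDC₂	3	512×64×64	1	1×11	0×5
MDC₂	3	512×64×64	1	11×1	5×0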
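
The kernel and padding pairs in the table above can be checked with the standard convolution output-size formula. The sketch below is only an illustration under stated assumptions: it assumes the 1×k and k×1 strips are depthwise convolutions applied one after the other with dilation 1 and no bias, which the table itself does not state, and the function names are hypothetical. It confirms that each strip pair preserves the 64×64 resolution and compares the weight count of a strip pair with that of a single depthwise k×k kernel.

```python
def conv_out_size(size, kernel, stride, padding):
    """Output length of a convolution along one axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def strip_pair_shape(height, width, k, stride=1):
    """Spatial shape after a 1xk strip followed by a kx1 strip.

    Padding follows the table: (0, k // 2) for 1xk and (k // 2, 0) for kx1.
    Applying the two strips sequentially is an assumption of this sketch.
    """
    p = k // 2
    # 1xk strip: kernel height 1 with padding 0, kernel width k with padding p.
    h = conv_out_size(height, 1, stride, 0)
    w = conv_out_size(width, k, stride, p)
    # kx1 strip: kernel height k with padding p, kernel width 1 with padding 0.
    h = conv_out_size(h, k, stride, p)
    w = conv_out_size(w, 1, stride, 0)
    return h, w


def strip_pair_weights(channels, k):
    """Weights of a depthwise 1xk plus a depthwise kx1 convolution (no bias)."""
    return channels * k + channels * k


def square_weights(channels, k):
    """Weights of a single depthwise kxk convolution (no bias)."""
    return channels * k * k


if __name__ == "__main__":
    channels, height, width = 320, 64, 64  # input size taken from the table
    for k in (7, 9, 11):
        shape = strip_pair_shape(height, width, k)
        print(k, shape, strip_pair_weights(channels, k), square_weights(channels, k))
```

Under these assumptions a strip pair needs 2k weights per channel instead of k², which is one plausible reason strip kernels suit a lightweight design aimed at thin, elongated objects.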

Dataset	Image size (pixels)	Classes	Training images	Test images	Validation images
CamVid	960×720	11	367	100	233
Cityscapes	2048×1024	19	2975	500	1525

Algorithm	Encoder	Parameters	PA	mPA	mIoU
SegFormer-B2	Stage1 to Stage4	27.36×10⁶	96.06%	86.15%	77.39%
SegFormer-B2	Stage4	24.56×10⁶	95.74%	81.83%	73.46%
TC-AEDNet	Stage1 to Stage3 + ASPP	15.31×10⁶	95.83%	82.70%	74.40%
TC-AEDNet	Stage1 to Stage3 + MSEM	18.55×10⁶	96.18%	84.53%	77.49%
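
The PA, mPA, and mIoU columns used in the tables are not defined on this page. The sketch below shows how these three metrics are commonly computed from a confusion matrix. It is an illustration only, not the authors' evaluation code: the function names are hypothetical, and the handling of out-of-range labels and of classes absent from the ground truth is an assumption of this example.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Confusion matrix with ground-truth classes as rows, predictions as columns."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # Assumption: labels outside [0, num_classes) are ignore labels and are dropped.
    valid = (y_true >= 0) & (y_true < num_classes)
    index = y_true[valid] * num_classes + y_pred[valid]
    counts = np.bincount(index, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def segmentation_metrics(cm):
    """Return (PA, mPA, mIoU) computed from a confusion matrix."""
    cm = cm.astype(np.float64)
    true_positive = np.diag(cm)
    gt_pixels = cm.sum(axis=1)      # pixels per ground-truth class
    pred_pixels = cm.sum(axis=0)    # pixels per predicted class
    union = gt_pixels + pred_pixels - true_positive
    # Assumption: classes that never occur in the ground truth are skipped.
    present = gt_pixels > 0
    pa = true_positive.sum() / cm.sum()
    mpa = np.mean(true_positive[present] / gt_pixels[present])
    miou = np.mean(true_positive[present] / union[present])
    return pa, mpa, miou


if __name__ == "__main__":
    # Toy example with three classes and six pixels.
    truth = [0, 0, 1, 1, 2, 2]
    prediction = [0, 1, 1, 1, 2, 0]
    pa, mpa, miou = segmentation_metrics(confusion_matrix(truth, prediction, 3))
    print(f"PA={pa:.4f} mPA={mpa:.4f} mIoU={miou:.4f}")
```

Because mIoU penalizes both missed pixels and false alarms for every class, it is usually the strictest of the three, which is consistent with mIoU being the lowest column in each table.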

Experiment	Attention mechanism	Parameters	PA	mPA	mIoU
1	None	18.15×10⁶	96.00%	82.58%	75.10%
2	SAM	18.53×10⁶	95.17%	84.05%	75.57%
3	CAM	18.15×10⁶	96.06%	84.11%	76.37%
4	CBAM	18.55×10⁶	96.18%	84.53%	77.49%

Algorithm	Parameters	PA	mPA	mIoU
SegFormer-B2	27.36×10⁶	96.06%	86.15%	77.39%
SegFormer-B2+SDAM	27.45×10⁶	96.14% (↑0.08%)	86.64% (↑0.49%)	78.43% (↑1.04%)
TC-AEDNet	18.56×10⁶	96.05%	86.09%	77.91%
TC-AEDNet+SDAM	18.65×10⁶	96.16% (↑0.11%)	87.21% (↑1.12%)	78.63% (↑0.72%)

Feature fusion	Parameters	PA	mPA	mIoU
Stage1 to Stage4	18.55×10⁶	96.18%	84.53%	77.49%
Stage[1, 2, 3-4]	18.56×10⁶	96.05%	86.09%	77.91%

No.	Algorithm	Parameters	PA	mPA	mIoU
1	FCN	18.64×10⁶	94.15%	63.12%	55.32%
2	SegNet	29.45×10⁶	93.53%	64.49%	57.01%
3	MagNet	6.37×10⁶	94.23%	69.71%	68.20%
4	RtHp	6.20×10⁶	94.95%	78.82%	73.67%
5	JPANet	3.49×10⁶	94.53%	78.42%	72.43%
6	DeepLab-V3+	58.75×10⁶	95.67%	83.49%	75.04%
7	Swin-B	88.23×10⁶	96.30%	85.74%	78.54%
8	SegFormer-B2	27.36×10⁶	96.06%	86.15%	77.39%
9	TC-AEDNet+SDAM	18.65×10⁶	96.16%	87.21%	78.63%

No.	Algorithm	Parameters	PA	mPA	mIoU
1	FCN^[27]	18.64×10⁶	91.89%	74.05%	64.75%
2	SegNet^[18]	29.45×10⁶	89.38%	74.25%	65.60%
3	RtHp^[7]	6.20×10⁶	93.86%	78.87%	68.14%
4	JPANet^[15]	3.49×10⁶	93.44%	78.27%	67.45%
5	DeepLab-V3+^[8]	58.75×10⁶	94.22%	85.16%	78.03%
6	SegFormer-B2^[24]	27.36×10⁶	95.11%	87.34%	80.55%
7	TC-AEDNet	18.65×10⁶	96.47%	87.64%	81.06%
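
The accuracy versus model-size trade-off claimed in the abstract can be examined directly from the table above, whose 81.06% mIoU matches the CamVid figure reported there. The sketch below is a hedged illustration rather than part of the paper: it copies the parameter counts (in millions) and mIoU values from that table, and the helper names `pareto_front` and `param_reduction` are hypothetical. A model is treated as Pareto-optimal if no other model has both no more parameters and no lower mIoU while being strictly better in at least one of the two.

```python
# (algorithm, parameters in millions, mIoU in percent), copied from the table above.
ROWS = [
    ("FCN", 18.64, 64.75),
    ("SegNet", 29.45, 65.60),
    ("RtHp", 6.20, 68.14),
    ("JPANet", 3.49, 67.45),
    ("DeepLab-V3+", 58.75, 78.03),
    ("SegFormer-B2", 27.36, 80.55),
    ("TC-AEDNet", 18.65, 81.06),
]


def pareto_front(rows):
    """Names of models not dominated in (fewer parameters, higher mIoU)."""
    front = []
    for name, params, miou in rows:
        dominated = any(
            other_params <= params
            and other_miou >= miou
            and (other_params < params or other_miou > miou)
            for _, other_params, other_miou in rows
        )
        if not dominated:
            front.append(name)
    return front


def param_reduction(baseline_params, model_params):
    """Fraction of parameters saved relative to a baseline."""
    return 1.0 - model_params / baseline_params


if __name__ == "__main__":
    print("Pareto-optimal models:", pareto_front(ROWS))
    saved = param_reduction(27.36, 18.65)
    print(f"Parameters saved vs. SegFormer-B2: {saved:.1%}")
```

On these numbers TC-AEDNet uses roughly 32% fewer parameters than SegFormer-B2 while reporting a higher mIoU, and the smaller RtHp and JPANet models remain on the front only because of their size.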