Journal on Communications ›› 2022, Vol. 43 ›› Issue (12): 157-171.doi: 10.11959/j.issn.1000-436x.2022212


Unsupervised domain adaptation multi-level adversarial network for semantic segmentation based on multi-modal features

Zeyu WANG1, Shuhui BU2, Wei HUANG1, Yuanpan ZHENG1, Qinggang WU1, Huawen CHANG1, Xu ZHANG1   

  1. College of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450000, China
  2. School of Aeronautics, Northwestern Polytechnical University, Xi’an 710072, China
  • Revised: 2022-09-20 Online: 2022-12-25 Published: 2022-12-01
  • Supported by:
    The Science and Technology Project of Henan Province(222102210021);The Plan Support for Key Scientific Research Project of Higher Education in Henan Province(21A520049);The Plan Support for Key Scientific Research Project of Higher Education in Henan Province(23A520004)

Abstract:

To address the cross-domain distribution differences of visual, spatial, and semantic features in domain adaptation, an unsupervised domain adaptation multi-level adversarial network for semantic segmentation based on multi-modal features was proposed. Firstly, an attentive fusion semantic segmentation network with a three-layer structure was designed to learn the above three types of features from the source domain and the target domain, respectively. Secondly, a self-supervised learning method combining distribution confidence and semantic confidence was introduced into single-level adversarial learning, so that more target-domain pixels could be aligned while minimizing the inter-domain distance of the learned feature distributions. Finally, three adversarial branches and three adaptive sub-networks were jointly optimized by the multi-level adversarial learning method based on multi-modal features, which effectively learned domain-invariant representations for the features extracted by each sub-network. The experimental results show that, compared with existing state-of-the-art methods, the proposed network achieves the best mean intersection over union of 62.2%, 66.9%, and 59.7% on GTA5 to Cityscapes, SYNTHIA to Cityscapes, and SUN-RGBD to NYUD-v2, respectively.
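The self-supervised step described above selects target-domain pixels whose pseudo-labels are trustworthy on two counts: a semantic confidence (the peak of the segmentation softmax) and a distribution confidence (how source-like the pixel's features appear to a domain discriminator). The sketch below, a simplified illustration and not the authors' implementation, shows one plausible way to combine the two confidences into a pseudo-label mask; the function name and threshold values are assumptions for illustration.

```python
import numpy as np

def select_confident_pixels(seg_probs, disc_probs,
                            sem_thresh=0.9, dist_thresh=0.7):
    """Combine semantic and distribution confidence for self-supervision.

    seg_probs:  (C, H, W) softmax output of the segmentation network.
    disc_probs: (H, W) discriminator probability that a pixel's feature
                is source-like (distribution confidence).
    Returns an (H, W) pseudo-label map with -1 marking rejected pixels.
    """
    sem_conf = seg_probs.max(axis=0)       # semantic confidence per pixel
    pseudo = seg_probs.argmax(axis=0)      # candidate pseudo-labels
    # Keep only pixels confident in BOTH the predicted class and
    # the cross-domain feature alignment.
    mask = (sem_conf >= sem_thresh) & (disc_probs >= dist_thresh)
    return np.where(mask, pseudo, -1)
```

Pixels rejected here would simply be excluded from the self-supervised loss, so alignment pressure concentrates on reliable regions of the target image.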

Key words: unsupervised domain adaptation, semantic segmentation, multi-modal features, attentive fusion, multi-level adversarial learning, self-supervised learning
