Journal on Communications ›› 2022, Vol. 43 ›› Issue (12): 157-171.doi: 10.11959/j.issn.1000-436x.2022212


Unsupervised domain adaptation multi-level adversarial network for semantic segmentation based on multi-modal features

Zeyu WANG1, Shuhui BU2, Wei HUANG1, Yuanpan ZHENG1, Qinggang WU1, Huawen CHANG1, Xu ZHANG1   

  1. College of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450000, China
  2. School of Aeronautics, Northwestern Polytechnical University, Xi’an 710072, China
  • Revised: 2022-09-20 Online: 2022-12-25 Published: 2022-12-01
  • Supported by:
    The Science and Technology Project of Henan Province(222102210021);The Plan Support for Key Scientific Research Project of Higher Education in Henan Province(21A520049);The Plan Support for Key Scientific Research Project of Higher Education in Henan Province(23A520004)

Abstract:

To address the cross-domain distribution differences of visual, spatial, and semantic features in domain adaptation, an unsupervised domain adaptation multi-level adversarial network for semantic segmentation based on multi-modal features was proposed. Firstly, an attentive fusion semantic segmentation network with a three-layer structure was designed to learn the above three types of features from the source domain and the target domain, respectively. Secondly, a self-supervised learning method combining distribution confidence and semantic confidence was introduced into single-level adversarial learning, so that more target-domain pixels could be aligned while minimizing the inter-domain distance of the learned feature distributions. Finally, three adversarial branches and three adaptive sub-networks were jointly optimized by the multi-level adversarial learning method based on multi-modal features, which effectively learned domain-invariant representations for the features extracted by each sub-network. The experimental results show that, compared with existing state-of-the-art methods, the proposed network achieves the best mean intersection over union of 62.2%, 66.9%, and 59.7% on GTA5 to Cityscapes, SYNTHIA to Cityscapes, and SUN-RGBD to NYUD-v2, respectively.
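The self-supervised step described above selects target-domain pixels whose pseudo-labels are trustworthy on two counts: a semantic confidence (the peak of the segmentation softmax) and a distribution confidence (how source-like the pixel's features appear to a domain discriminator). The sketch below, a simplified illustration and not the authors' implementation, shows one plausible way to combine the two confidences into a pseudo-label mask; the function name and threshold values are assumptions for illustration.

```python
import numpy as np

def select_confident_pixels(seg_probs, disc_probs,
                            sem_thresh=0.9, dist_thresh=0.7):
    """Combine semantic and distribution confidence for self-supervision.

    seg_probs:  (C, H, W) softmax output of the segmentation network.
    disc_probs: (H, W) discriminator probability that a pixel's feature
                is source-like (distribution confidence).
    Returns an (H, W) pseudo-label map with -1 marking rejected pixels.
    """
    sem_conf = seg_probs.max(axis=0)       # semantic confidence per pixel
    pseudo = seg_probs.argmax(axis=0)      # candidate pseudo-labels
    # Keep only pixels confident in BOTH the predicted class and
    # the cross-domain feature alignment.
    mask = (sem_conf >= sem_thresh) & (disc_probs >= dist_thresh)
    return np.where(mask, pseudo, -1)
```

Pixels rejected here would simply be excluded from the self-supervised loss, so alignment pressure concentrates on reliable regions of the target image.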

Key words: unsupervised domain adaptation, semantic segmentation, multi-modal features, attentive fusion, multi-level adversarial learning, self-supervised learning
