Journal on Communications ›› 2022, Vol. 43 ›› Issue (12): 157-171. doi: 10.11959/j.issn.1000-436x.2022212

• Academic paper

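Unsupervised domain adaptation multi-level adversarial network for semantic segmentation based on multi-modal features

Zeyu WANG1, Shuhui BU2, Wei HUANG1, Yuanpan ZHENG1, Qinggang WU1, Huawen CHANG1, Xu ZHANG1

  1 College of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450000, China
  2 School of Aeronautics, Northwestern Polytechnical University, Xi’an 710072, China
  • Revised: 2022-09-20 Online: 2022-12-25 Published: 2022-12-01
  • About the authors:
    Zeyu WANG (1989- ), male, born in Zhengzhou, Henan, Ph.D., lecturer at Zhengzhou University of Light Industry. His main research interests include computer vision, image processing, and deep learning.
    Shuhui BU (1978- ), male, born in Luoyang, Henan, Ph.D., professor and doctoral supervisor at Northwestern Polytechnical University. His main research interests include computer vision, image processing, and machine learning.
    Wei HUANG (1982- ), male, born in Zhengzhou, Henan, Ph.D., associate professor and master's supervisor at Zhengzhou University of Light Industry. His main research interests include remote sensing image processing and deep learning.
    Yuanpan ZHENG (1983- ), male, born in Zhengzhou, Henan, Ph.D., associate professor and master's supervisor at Zhengzhou University of Light Industry. His main research interests include image processing and smart emergency response.
    Qinggang WU (1984- ), male, born in Puyang, Henan, Ph.D., associate professor and master's supervisor at Zhengzhou University of Light Industry. His main research interests include computer vision, remote sensing image processing, and deep learning.
    Huawen CHANG (1980- ), male, born in Zhengzhou, Henan, Ph.D., lecturer and master's supervisor at Zhengzhou University of Light Industry. His main research interests include image quality assessment and computer vision.
    Xu ZHANG (1979- ), female, born in Nanyang, Henan, lecturer at Zhengzhou University of Light Industry. Her main research interests include image processing and model checking.
  • Supported by:
    The Science and Technology Project of Henan Province (222102210021); The Plan Support for Key Scientific Research Project of Higher Education in Henan Province (21A520049); The Plan Support for Key Scientific Research Project of Higher Education in Henan Province (23A520004)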

Abstract:

To address the differences between domains in the distributions of visual, spatial, and semantic features in domain adaptation, an unsupervised domain adaptation multi-level adversarial network for semantic segmentation based on multi-modal features was proposed. Firstly, an attentive fusion semantic segmentation network with a three-layer structure was designed to learn these three types of features from the source domain and the target domain, respectively. Secondly, a self-supervised learning method combining distribution confidence and semantic confidence was introduced into single-level adversarial learning, so that more target-domain pixels were aligned in distribution while the distribution distance between the features learned in the two domains was minimized. Finally, three adversarial branches and three adaptive sub-networks were jointly optimized by the multi-level adversarial learning method based on multi-modal features, which effectively learned domain-invariant representations of the features extracted by each sub-network. The experimental results show that, compared with existing state-of-the-art methods, the proposed network achieves the best mean intersection over union of 62.2%, 66.9%, and 59.7% on GTA5 to Cityscapes, SYNTHIA to Cityscapes, and SUN-RGBD to NYUD-v2, respectively.

Key words: unsupervised domain adaptation, semantic segmentation, multi-modal features, attentive fusion, multi-level adversarial learning, self-supervised learning
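
The results in the abstract are reported as mean intersection over union (mIoU). As a reading aid, the following is a minimal sketch of how this metric is commonly computed from a pixel-level confusion matrix. It is not the authors' evaluation code: the function names `confusion_matrix` and `mean_iou` and the toy label lists are hypothetical, and the abstract does not state the exact evaluation protocol (for example, which classes are ignored or how images are aggregated).

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(y_true: Sequence[int], y_pred: Sequence[int],
             num_classes: int) -> float:
    """Average per-class IoU = TP / (TP + FP + FN).

    Classes that appear in neither the labels nor the predictions are
    skipped (an assumption of this sketch, not taken from the paper).
    """
    cm = confusion_matrix(y_true, y_pred, num_classes)
    ious = []
    for c in range(num_classes):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                                  # class c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(num_classes)) - tp   # other pixels predicted as class c
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy flattened label maps (hypothetical data: 3 classes, 6 pixels).
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print(round(mean_iou(truth, pred, 3), 4))  # prints 0.5
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, whose mean is 0.5. A benchmark evaluation would typically accumulate one confusion matrix over all test images before taking the per-class ratios.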
