Journal on Communications (通信学报)

Topic: 6G

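6G-Oriented cross-modal signal reconstruction technology

LI Ang (李昂)1,2, CHEN Jianxin (陈建新)1,2, WEI Xin (魏昕)1,2, ZHOU Liang (周亮)1,2

  1. College of Telecommunications & Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
  2. Key Laboratory of Broadband Wireless Communication and Sensor Network Technology (Ministry of Education), Nanjing University of Posts and Telecommunications, Nanjing 210003, China
  • About the authors:
    LI Ang (1995− ), male, born in Zhoukou, Henan, is a Ph.D. candidate at Nanjing University of Posts and Telecommunications. His research interests include multimedia communications and artificial intelligence.
    CHEN Jianxin (1973− ), male, born in Nantong, Jiangsu, Ph.D., is an associate professor and master's supervisor at Nanjing University of Posts and Telecommunications. His research interests include wireless communications and human-computer interaction.
    WEI Xin (1983− ), male, born in Nanjing, Jiangsu, Ph.D., is a professor and master's supervisor at Nanjing University of Posts and Telecommunications. His research interests include multimedia communications.
    ZHOU Liang (1981− ), male, born in Wuhu, Anhui, Ph.D., is a professor and doctoral supervisor at Nanjing University of Posts and Telecommunications. His research interests include multimedia communications.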

Abstract: In the 6G era, to reconcile multimedia users' demand for an immersive audio, video, and haptic experience with low-latency, high-reliability, and large-capacity communication, a cross-modal signal reconstruction framework and a deep learning model that reconstructs haptic signals from video signals were proposed. First, robots were controlled to touch various materials, and a large-scale dataset, VisTouch, containing audio, video, and haptic signals was constructed, laying the foundation for subsequent research on various cross-modal problems. Second, by exploiting the semantic correlations among multi-modal signals, a universal and robust end-to-end cross-modal signal reconstruction framework was designed. Then, taking the reconstruction of haptic signals from video signals as an example, a video-assisted haptic reconstruction model was established, comprising a 3D CNN-based video feature extraction sub-network, a fully convolutional network-based GAN generation sub-network, and a CNN-based GAN discrimination sub-network. Finally, experimental results verified the reliability of the cross-modal signal reconstruction framework and the accuracy of the proposed video-to-haptic reconstruction model.

Keywords: 6G, cross-modal signal reconstruction, multi-modal dataset, 3D convolutional neural network, generative adversarial network
