Journal on Communications, 2023, Vol. 44, Issue (5): 28-41. doi: 10.11959/j.issn.1000-436x.2023105
Zhijin QIN, Tantan ZHAO, Fan LI, Xiaoming TAO
Revised: 2023-05-06
Online: 2023-05-25
Published: 2023-05-01
About the author:
Zhijin QIN (1989- ), female, born in Taiyuan, Shanxi, Ph.D., associate professor and doctoral supervisor at Tsinghua University. Her main research interests include semantic communication.
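Abstract:
With the convergence of artificial intelligence and communications, multimodal data processing technologies for text, image, audio, and video are developing rapidly. The shared dimensions of modal semantics have been deeply explored, and the highly abstract and intelligently concise characteristics of multimodal semantic information have been fully exploited, bringing new ideas and methods to semantic communication. First, the basic theory and classification of semantic communication are introduced, and the state of research on single-modal semantic communication is reviewed for text, image, audio, and video respectively. Then, the state of research on multimodal semantic communication is reviewed, and research on multimodal data fusion techniques and secure semantic communication is introduced. Finally, the challenges facing multimodal semantic communication are summarized.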
Zhijin QIN, Tantan ZHAO, Fan LI, Xiaoming TAO. Survey of research on multimodal semantic communication[J]. Journal on Communications, 2023, 44(5): 28-41.