Journal on Communications (通信学报) ›› 2023, Vol. 44 ›› Issue (3): 105-116. doi: 10.11959/j.issn.1000-436x.2023043

• Academic Paper •

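Objective assessment of communication speech interference effect based on feature fusion

Yun LIN1, Huaitao XU1, Sen WANG1, Sicheng ZHANG1, Long ZHUANG2

  1. College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China
  2. School of Integrated Circuits, Anhui University, Hefei 230039, China
  • Revised: 2022-11-30 Online: 2023-03-25 Published: 2023-03-01
  • About the authors:
    Yun LIN (林云, 1980− ), male, born in Harbin, Heilongjiang, is a professor and doctoral supervisor at Harbin Engineering University. His main research interests include intelligent radio technology, artificial intelligence and machine learning, big data analysis and mining, software radio and cognitive radio, information security and countermeasures, and intelligent information processing.
    Huaitao XU (徐怀韬, 1998− ), male, born in Nanchang, Jiangxi, is a master's student at Harbin Engineering University. His main research interest is speech quality assessment under communication interference.
    Sen WANG (王森, 1994− ), male, born in Siping, Jilin, is a doctoral student at Harbin Engineering University. His main research interests include interference assessment, wireless communication, and signal processing.
    Sicheng ZHANG (张思成, 1996− ), male, born in Linyi, Shandong, is a doctoral student at Harbin Engineering University. His main research interest is intelligent electromagnetic signal processing based on deep learning.
    Long ZHUANG (庄龙, 1998− ), male, born in Xuzhou, Jiangsu, is a master's student at Anhui University. His main research interests include radar signal processing and computer vision.
  • Supported by:
    The National Natural Science Foundation of China (62201172); The Fundamental Research Funds for the Central Universities (3072022CF0804); The Fundamental Research Funds for the Central Universities (3072022CF0601)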

Abstract:

To address the objective assessment of communication speech interference effects, two assessment methods were proposed, one based on multi-measure fusion and one based on multimodal fusion. First, the interfered speech data were preprocessed with an endpoint detection algorithm and the dynamic time warping algorithm. Then, the speech content was extracted and compared against the standard speech to compute five measures, which were fused and passed to a random forest model to assess the quality level. Finally, drawing on multimodal fusion techniques, a neural network model based on a residual structure was designed to fuse the graph-domain and measure-domain features of the interfered speech data and assess the quality level. Experimental results show that the assessment accuracy of both methods exceeds 90%. Compared with existing methods, the multimodal assessment method improves accuracy by about 3.269%, which indicates that the proposed method performs better.

Key words: speech quality assessment, speech signal processing, multimodal fusion, deep neural network
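
The preprocessing stage named in the abstract (endpoint detection followed by dynamic time warping) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say which endpoint detection algorithm is used, so the short-time energy threshold below is an assumption, and the function names `detect_endpoints` and `dtw_distance` and the parameters `frame_len` and `threshold_ratio` are hypothetical. The dynamic time warping routine is the textbook dynamic-programming form with an absolute-difference local cost.

```python
import math


def detect_endpoints(samples, frame_len=4, threshold_ratio=0.1):
    """Return (start, end) sample indices of the active region, or None.

    Assumption (not from the paper): a frame counts as speech when its
    short-time energy exceeds threshold_ratio times the largest frame energy.
    """
    energies = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energies.append(sum(x * x for x in frame))
    if not energies or max(energies) == 0:
        return None  # too short, or completely silent
    limit = threshold_ratio * max(energies)
    active = [k for k, e in enumerate(energies) if e > limit]
    # Convert first/last active frame indices back to sample indices.
    return active[0] * frame_len, (active[-1] + 1) * frame_len


def dtw_distance(a, b):
    """Textbook dynamic time warping cost between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local cost: absolute difference
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]


if __name__ == "__main__":
    noisy = [0, 0, 0, 0, 0.5, -0.6, 0.7, -0.5, 0.4, 0.5, -0.4, 0.3, 0, 0, 0, 0]
    start, end = detect_endpoints(noisy)
    trimmed = noisy[start:end]
    reference = [0.5, -0.6, 0.7, -0.5, 0.4, 0.5, -0.4, 0.3]
    print(start, end, dtw_distance(trimmed, reference))
```

In a pipeline of the kind the abstract describes, trimming with the detected endpoints would come first, and the warping step would then align the trimmed interfered speech with the standard speech before any measures are computed. In practice the alignment would typically run on frame-level features rather than raw samples, which is a further assumption here.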
