Chinese Journal of Intelligent Science and Technology (智能科学与技术学报), 2021, Vol. 3, Issue (3): 351-358. doi: 10.11959/j.issn.2096-6652.202136

• Special Issue: Intelligent Target Detection and Recognition

End-to-end speech enhancement based on ultra-lightweight channel attention

Yi HONG¹, Chengli SUN¹, Yan LENG²

  1. School of Information Engineering, Nanchang Hangkong University, Nanchang 330063, China
  2. School of Physics and Electronic Science, Shandong Normal University, Jinan 250014, China
  • Revised: 2021-07-17  Online: 2021-09-15  Published: 2021-09-01
  • About the authors:
    Yi HONG (1997– ), female, is a master's student at the School of Information Engineering, Nanchang Hangkong University. Her research interests include signal processing, speech enhancement, and acoustic echo cancellation.
    Chengli SUN (1975– ), male, postdoctoral fellow, is a professor at the School of Information Engineering, Nanchang Hangkong University. His research interests include artificial intelligence, speech signal processing, speech recognition, and speech enhancement.
    Yan LENG (1982– ), female, Ph.D., is an associate professor at the School of Physics and Electronic Science, Shandong Normal University. Her research interests include audio information processing and audio classification and detection.
  • Supported by:
    The National Natural Science Foundation of China (61861033); The Key Project of the Natural Science Foundation of Jiangxi Province (20202ACBL202007); The Natural Science Foundation of Shandong Province (ZR2020MF020)

Abstract:

The fully convolutional time-domain audio separation network (Conv-TasNet) is a mainstream end-to-end speech separation model proposed in recent years. Conv-TasNet uses dilated convolutions to enlarge its receptive field so that it can fuse more speech features spatially, which greatly improves the network's separation performance; however, it ignores the differing importance of information across convolution channels. To address this, an end-to-end speech enhancement method based on ultra-lightweight channel attention is proposed. The method combines Conv-TasNet with channel attention and adds a group of filters to the Conv-TasNet encoder and decoder to strengthen the network's speech feature extraction, so that the convolutional neural network can combine spatial and channel information more effectively and achieve better speech enhancement. Experiments show that the proposed method effectively improves speech enhancement performance while increasing the model size by only about 0.02%.

Key words: speech enhancement, end-to-end speech separation network, channel attention
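The abstract credits Conv-TasNet's dilated convolutions with enlarging the receptive field. The sketch below makes that concrete under stated assumptions: it uses the hyperparameters reported for the original Conv-TasNet (kernel size P = 3, X = 8 dilated blocks per repeat with dilations 1, 2, ..., 128, R = 3 repeats, an encoder window of L = 16 samples with a hop of 8, 8 kHz audio), which may differ from the configuration used in this article. The function names are hypothetical.

```python
def tcn_receptive_field(kernel_size: int, num_blocks: int, num_repeats: int,
                        dilated: bool = True) -> int:
    """Receptive field, in encoder frames, of a Conv-TasNet-style TCN.

    Each repeat stacks num_blocks convolutions; block x (0-based) uses dilation
    2**x when dilated=True, else 1. A kernel of size P with dilation d widens
    the receptive field by (P - 1) * d frames; 1x1 convolutions add nothing.
    """
    per_repeat = sum((kernel_size - 1) * (2 ** x if dilated else 1)
                     for x in range(num_blocks))
    return 1 + num_repeats * per_repeat


def frames_to_samples(num_frames: int, win_len: int, hop: int) -> int:
    """Input samples spanned by num_frames consecutive encoder frames."""
    return (num_frames - 1) * hop + win_len


if __name__ == "__main__":
    # Assumed: original Conv-TasNet settings P=3, X=8, R=3, L=16 (hop L/2), 8 kHz.
    P, X, R, L, fs = 3, 8, 3, 16, 8000
    for dilated in (True, False):
        frames = tcn_receptive_field(P, X, R, dilated)
        samples = frames_to_samples(frames, L, L // 2)
        print(f"dilated={dilated}: {frames} frames, {samples} samples, "
              f"{samples / fs:.3f} s")
```

With these assumed values the dilated stack sees 1,531 encoder frames, or 12,256 samples (about 1.53 s at 8 kHz), whereas the same 24 blocks without dilation would cover only 49 frames (400 samples, 50 ms). Doubling the dilation per block is what lets the receptive field grow exponentially with depth while the parameter count grows only linearly.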

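The abstract does not spell out the design of its "ultra-lightweight channel attention". One widely used lightweight design, assumed here purely for illustration, is ECA-style efficient channel attention: global average pooling gives one descriptor per channel, a small 1-D convolution with an odd kernel of size k runs across neighbouring channels, a sigmoid turns the result into per-channel gates, and each channel is rescaled by its gate. The pure-Python sketch below implements that idea and compares its parameter count with an SE-style block. It is not the authors' module, and every name in it is hypothetical.

```python
import math
from typing import List


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive odd kernel size used by ECA-style attention."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def eca_attention(features: List[List[float]], weights: List[float]) -> List[List[float]]:
    """Reweight the channels of a C x T feature map with ECA-style gates.

    features: C channels, each a list of T time steps.
    weights:  odd-length kernel shared by all channels (the only parameters).
    """
    k = len(weights)
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    pad = k // 2
    num_channels = len(features)
    # 1) Squeeze: average each channel over time -> one descriptor per channel.
    z = [sum(ch) / len(ch) for ch in features]
    gates = []
    for i in range(num_channels):
        # 2) 1-D cross-correlation over neighbouring channels, zero padding, no bias.
        s = sum(weights[j] * z[i + j - pad]
                for j in range(k) if 0 <= i + j - pad < num_channels)
        # 3) Sigmoid maps the score to a gate in (0, 1).
        gates.append(_sigmoid(s))
    # 4) Rescale every time step of each channel by that channel's gate.
    return [[g * v for v in ch] for g, ch in zip(gates, features)]


def se_param_count(channels: int, reduction: int = 16) -> int:
    """Weights of an SE-style block (biases ignored): FC layers C -> C/r -> C."""
    hidden = channels // reduction
    return 2 * channels * hidden


def eca_param_count(channels: int) -> int:
    """An ECA-style block only learns its shared 1-D kernel."""
    return eca_kernel_size(channels)


if __name__ == "__main__":
    C = 512  # assumed channel width, matching the original Conv-TasNet's N = 512
    print("SE :", se_param_count(C), "parameters")
    print("ECA:", eca_param_count(C), "parameters")
```

Under these assumptions, gating a 512-channel feature map costs 5 parameters with the ECA-style block versus 32,768 with an SE block. As a rough scale check, if the base model has about 5.1M parameters (the size reported for the original Conv-TasNet, assumed here), a 0.02% increase corresponds to roughly 1,000 extra parameters, a budget that a single SE block at this width would already exceed.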
