Journal on Communications ›› 2024, Vol. 45 ›› Issue (2): 225-239. doi: 10.11959/j.issn.1000-436x.2024018

• Correspondences •

Speech enhancement method based on multi-domain fusion and neural architecture search

Rui ZHANG, Pengyun ZHANG, Chaoli SUN

  1. College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China
  • Revised: 2023-12-19 Online: 2024-02-01 Published: 2024-02-01
  • About the authors: Rui ZHANG (b. 1987), male, born in Taiyuan, Shanxi, Ph.D., associate professor and master's supervisor at Taiyuan University of Science and Technology. His main research interests include intelligent information processing and automated machine learning.
    Pengyun ZHANG (b. 1999), male, born in Anping, Hebei, master's student at Taiyuan University of Science and Technology. His main research interests include intelligent information processing and automated machine learning.
    Chaoli SUN (b. 1978), female, born in Zhuji, Zhejiang, Ph.D., professor and doctoral supervisor at Taiyuan University of Science and Technology. Her main research interests include computational intelligence and machine learning.
  • Supported by:
    The National Natural Science Foundation of China (62372319); Humanities and Social Science Research Project of Ministry of Education (23YJCZH299); The Key Research and Development Project of Shanxi Province (202102020101002); Basic Research Project of Shanxi Province (20210302123216); Project of Graduate Joint Training Demonstration Base of Taiyuan University of Science and Technology (JD2022004); Graduate Education Innovation Project of Taiyuan University of Science and Technology (SY2023040)

Abstract:

To further improve the self-learning and noise-reduction capabilities of speech enhancement models, a speech enhancement method based on multi-domain fusion and neural architecture search is proposed. A multi-spatial-domain mapping and fusion mechanism for speech signals is designed to mine the correlations between the real-valued and complex-valued representations of the signal. Building on the characteristics of the model's convolution and pooling operations, a complex neural architecture search mechanism is proposed; using the designed search space, search strategy, and evaluation strategy, the speech enhancement model is constructed efficiently and automatically. In comparison and generalization experiments against baseline models, the best speech enhancement model found by the search improves both perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) by up to 5.6% over the best baseline model, while having the fewest parameters.

Key words: speech enhancement model, complex spatial domain mapping, multi-domain fusion, complex neural architecture search, low-cost evaluation
