Telecommunications Science ›› 2023, Vol. 39 ›› Issue (11): 107-115. doi: 10.11959/j.issn.1000-0801.2023187

• Research and Development •

采用恒Q调制包络的合成语音伪装检测方法

徐嘉1, 简志华1,2, 金宏辉1, 吴超1   

  1. 1 杭州电子科技大学通信工程学院,浙江 杭州 310018
    2 浙江省数据存储传输及应用技术研究重点实验室,浙江 杭州 310018
  • 修回日期:2023-09-30 出版日期:2023-11-01 发布日期:2023-11-01
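  • About the authors: Jia XU (1998- ), female, is a master's student at the School of Communication Engineering, Hangzhou Dianzi University. Her main research interest is speech spoofing detection
    Zhihua JIAN (1978- ), male, Ph.D., is an associate professor and master's supervisor at the School of Communication Engineering, Hangzhou Dianzi University, and a faculty member of the Key Laboratory of Data Storage and Transmission Technology of Zhejiang Province. His main research interests include voice conversion, spoofed speech detection, and voiceprint recognition
    Honghui JIN (1999- ), male, is a master's student at the School of Communication Engineering, Hangzhou Dianzi University. His main research interests are voice conversion and spoofing detection
    Chao WU (1988- ), male, Ph.D., is a lecturer and master's supervisor at the School of Communication Engineering, Hangzhou Dianzi University. His main research interests are navigation signal processing and spoofing interference detection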
  • 基金资助:
    国家自然科学基金资助项目(61201301);国家自然科学基金资助项目(61772166);国家自然科学基金资助项目(61901154)

A method of synthetic speech spoofing detection using constant Q modulation envelope

Jia XU1, Zhihua JIAN1,2, Honghui JIN1, Chao WU1   

  1 School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China
  2 Key Laboratory of Data Storage and Transmission Technology of Zhejiang Province, Hangzhou 310018, China
  • Revised:2023-09-30 Online:2023-11-01 Published:2023-11-01
  • Supported by:
    The National Natural Science Foundation of China(61201301);The National Natural Science Foundation of China(61772166);The National Natural Science Foundation of China(61901154)

摘要:

针对传统的声学特征参数对合成语音伪装检测时存在的准确度低、未知类型合成语音检测效果较差、在噪声环境中表现欠佳的情况,提出了一种采用恒Q调制包络(constant Q modulation envelope,CQME)的合成伪装语音检测方法。该方法基于语音时域包络中包含的丰富信息,而合成语音与真实语音的包络在细节上存在较大差异,利用恒Q变换(constant Q transform,CQT)得到语音调制包络谱,并计算每个频率成分的均方根,获得CQME特征向量。再用该特征向量训练随机森林分类器,实现真伪语音的判别。实验结果表明,在ASVspoof 2019数据集上,CQME特征训练的随机森林具有较高的检测性能,对未知类型的合成语音也具有较好的检测效果。并且在多种噪声条件下,该方法仍表现出较高的检测性能,具有很好的噪声鲁棒性。

关键词: 合成语音, 伪装语音检测, 恒Q调制包络, 随机森林

Abstract:

To address the low accuracy of synthetic speech spoofing detection based on traditional acoustic feature parameters, its poor performance on unknown types of synthetic speech, and its degradation in noisy environments, a method for detecting spoofed synthetic speech using the constant Q modulation envelope (CQME) was proposed. The method was motivated by the fact that the temporal envelope of speech carries abundant information and that the envelopes of synthetic and genuine speech differ considerably in their details. The modulation envelope spectrum of speech was obtained with the constant Q transform (CQT), and the root mean square of each frequency component was calculated to derive the CQME feature vector. The CQME feature vector was then used to train a random forest classifier to discriminate genuine speech from spoofed synthetic speech. Experimental results demonstrate that the random forest trained with CQME features achieves high detection performance on the ASVspoof 2019 dataset and also detects unknown types of synthetic speech well. Furthermore, the proposed method maintains high detection performance under various noise conditions, indicating strong noise robustness.

Key words: synthetic speech, spoofing speech detection, constant Q modulation envelope, random forest
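
The feature pipeline summarized in the abstract (temporal envelope, constant Q analysis, per-frequency root mean square) can be illustrated with a minimal sketch. This is not the authors' implementation. Several choices below are assumptions made only for illustration: the envelope is taken as the magnitude of the FFT-based analytic signal, the constant Q analysis is a naive frame-wise kernel sum rather than an optimized CQT, and the function names and the parameters `f_min`, `n_bins`, `bins_per_octave` and `hop` are hypothetical.

```python
import numpy as np


def hilbert_envelope(x):
    """Temporal envelope as the magnitude of the analytic signal (assumed choice)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * h))


def naive_cqt_frame(frame, sr, freqs, q):
    """Constant Q magnitudes of one frame.

    Each bin k uses a Hann-windowed kernel of length N_k = ceil(Q * sr / f_k),
    so the ratio of centre frequency to bandwidth stays constant.
    Kernels are left-aligned in the frame for simplicity.
    """
    mags = np.empty(len(freqs))
    for k, f in enumerate(freqs):
        n_k = int(np.ceil(q * sr / f))
        t = np.arange(n_k)
        kernel = np.hanning(n_k) * np.exp(-2j * np.pi * f * t / sr)
        mags[k] = np.abs(np.dot(frame[:n_k], kernel)) / n_k
    return mags


def cqme_features(x, sr, f_min=20.0, n_bins=9, bins_per_octave=4, hop=None):
    """CQME-style vector: RMS over frames of each constant Q bin of the envelope."""
    env = hilbert_envelope(np.asarray(x, dtype=float))
    env = env - env.mean()  # drop the DC offset of the envelope
    freqs = f_min * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    frame_len = int(np.ceil(q * sr / f_min))  # longest kernel fits in one frame
    hop = hop or frame_len // 2
    if len(env) < frame_len:
        raise ValueError("signal is shorter than one analysis frame")
    starts = range(0, len(env) - frame_len + 1, hop)
    spec = np.array([naive_cqt_frame(env[s:s + frame_len], sr, freqs, q)
                     for s in starts])  # shape: (n_frames, n_bins)
    return np.sqrt(np.mean(spec ** 2, axis=0))  # one RMS value per frequency bin


# Toy check: a 200 Hz carrier amplitude-modulated at 40 Hz.
sr = 1000
t = np.arange(2000) / sr
x = (1.0 + 0.8 * np.cos(2 * np.pi * 40 * t)) * np.cos(2 * np.pi * 200 * t)
feat = cqme_features(x, sr)
print(feat.shape, int(np.argmax(feat)))
```

In this toy signal the 40 Hz modulation falls on the bin with centre frequency 20 · 2^(4/4) = 40 Hz, so that bin carries the largest RMS value. In the setting described by the abstract, one such vector per utterance would be stacked into a feature matrix and passed to a random forest classifier (for example scikit-learn's `RandomForestClassifier`, which is an assumption here and not stated in the source).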
