Telecommunications Science ›› 2024, Vol. 40 ›› Issue (2): 63-71. doi: 10.11959/j.issn.1000-0801.2024024

• Research and Development •

采用局部相位量化的合成语音检测方法

徐嘉1, 简志华1,2, 金宏辉1, 杨曼1   

  1 杭州电子科技大学通信工程学院,浙江 杭州 310018
  2 浙江省数据存储传输及应用技术研究重点实验室,浙江 杭州 310018
  • 修回日期:2024-01-07 出版日期:2024-02-01 发布日期:2024-02-01
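  • About the authors: Jia XU (1998- ), female, is a master's student at the School of Communication Engineering, Hangzhou Dianzi University. Her main research interest is spoofed speech detection.
    Zhihua JIAN (1978- ), male, Ph.D., is an associate professor and master's supervisor at the School of Communication Engineering, Hangzhou Dianzi University, and an associate professor at the Key Laboratory of Data Storage and Transmission Technology of Zhejiang Province. His main research interests include voice conversion, spoofed speech detection, voiceprint recognition, and speech privacy protection.
    Honghui JIN (1999- ), male, is a master's student at the School of Communication Engineering, Hangzhou Dianzi University. His main research interests are voice conversion and spoofing detection.
    Man YANG (2000- ), female, is a master's student at the School of Communication Engineering, Hangzhou Dianzi University. Her main research interest is spoofed speech detection.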
  • 基金资助:
    国家自然科学基金资助项目(61201301);国家自然科学基金资助项目(61772166)

A method for synthetic speech detection using local phase quantization

Jia XU1, Zhihua JIAN1,2, Honghui JIN1, Man YANG1   

  1 School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China
  2 Key Laboratory of Data Storage and Transmission Technology of Zhejiang Province, Hangzhou 310018, China
  • Revised: 2024-01-07 Online: 2024-02-01 Published: 2024-02-01
  • Supported by:
    The National Natural Science Foundation of China (61201301); The National Natural Science Foundation of China (61772166)

摘要:

由于语音合成的便利性,合成伪装语音对说话人认证系统的安全构成了很大的威胁。为了进一步提升说话人认证系统的伪装语音检测能力,提出了一种利用语谱图频域信息的合成语音检测方法,它通过局部相位量化算法对语谱图频域信息进行描述。首先,将语谱图分为若干子块,然后对每个子块进行局部相位量化,经直方图统计分析后获得局部相位量化特征向量并将该特征向量作为随机森林分类器的输入特征,实现合成语音检测。实验结果表明,该方法进一步降低了合成语音检测系统的串联检测代价数值,并且具有更强的泛化能力。

关键词: 说话人认证, 伪装攻击, 合成语音检测, 局部相位量化

Abstract:

Because speech is now easy to synthesize, synthetic spoofed speech poses a serious threat to the security of speaker verification systems. To further improve the ability of speaker verification systems to detect spoofed speech, a synthetic speech detection method was proposed that exploits the frequency-domain information of the speech spectrogram, which is described by the local phase quantization (LPQ) algorithm. First, the spectrogram was divided into several sub-blocks, and LPQ was performed on each sub-block. After histogram statistical analysis, the LPQ feature vector was obtained and used as the input feature of a random forest classifier to perform synthetic speech detection. The experimental results demonstrate that the proposed method further reduces the tandem detection cost function (t-DCF) value of the synthetic speech detection system and has stronger generalization ability.

Key words: speaker verification, spoofing attack, synthetic speech detection, local phase quantization (LPQ)
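
The feature-extraction steps summarized in the abstract (split the spectrogram into sub-blocks, apply LPQ to each sub-block, then concatenate the histograms) can be illustrated with a minimal sketch. It is not the authors' implementation. It follows the basic LPQ formulation: a local Fourier transform at four low frequencies, sign quantization of the real and imaginary parts into an 8-bit code, and a 256-bin histogram. The function names, the 2 x 2 block grid, the 3 x 3 window, the toy spectrogram, and the omission of any decorrelation step are all assumptions made for illustration; the abstract does not state these settings. The sketch uses only the standard library for clarity, and a practical version would likely be vectorized.

```python
import cmath
import math


def lpq_codes(block, win=3):
    """Return the 8-bit LPQ code of every interior pixel of a 2-D block."""
    r = win // 2
    a = 1.0 / win  # lowest non-zero frequency of the local window
    freqs = [(a, 0.0), (0.0, a), (a, a), (a, -a)]
    # One complex exponential kernel per frequency point (assumed basic LPQ).
    kernels = [
        {(dy, dx): cmath.exp(-2j * math.pi * (u * dy + v * dx))
         for dy in range(-r, r + 1) for dx in range(-r, r + 1)}
        for (u, v) in freqs
    ]
    rows, cols = len(block), len(block[0])
    codes = []
    for y in range(r, rows - r):
        for x in range(r, cols - r):
            code = 0
            for k, kernel in enumerate(kernels):
                coeff = sum(block[y + dy][x + dx] * w
                            for (dy, dx), w in kernel.items())
                # Keep only the signs of the real and imaginary parts.
                if coeff.real >= 0:
                    code |= 1 << (2 * k)
                if coeff.imag >= 0:
                    code |= 1 << (2 * k + 1)
            codes.append(code)
    return codes


def lpq_histogram(block, win=3):
    """Normalized 256-bin histogram of the LPQ codes of one sub-block."""
    codes = lpq_codes(block, win)
    if not codes:
        raise ValueError("sub-block is smaller than the LPQ window")
    hist = [0] * 256
    for c in codes:
        hist[c] += 1
    return [h / len(codes) for h in hist]


def lpq_feature(spectrogram, n_freq_blocks=2, n_time_blocks=2, win=3):
    """Concatenate the LPQ histograms of all sub-blocks (hypothetical layout)."""
    rows, cols = len(spectrogram), len(spectrogram[0])
    bh, bw = rows // n_freq_blocks, cols // n_time_blocks
    feature = []
    for i in range(n_freq_blocks):
        for j in range(n_time_blocks):
            block = [row[j * bw:(j + 1) * bw]
                     for row in spectrogram[i * bh:(i + 1) * bh]]
            feature.extend(lpq_histogram(block, win))
    return feature


# Toy stand-in for a spectrogram: 12 frequency bins by 16 time frames.
spec = [[math.sin(0.3 * f) * math.cos(0.2 * t) + 0.01 * ((7 * f + 3 * t) % 5)
         for t in range(16)] for f in range(12)]
feat = lpq_feature(spec)
print(len(feat))
```

Each utterance then yields one fixed-length vector (here 4 sub-blocks times 256 bins). Such vectors could be passed to a random forest implementation, for example scikit-learn's `RandomForestClassifier`, to separate genuine from synthetic speech. That step is left out here because the abstract gives no classifier settings.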

