A method for synthetic speech detection using local phase quantization

Telecommunications Science ›› 2024, Vol. 40 ›› Issue (2): 63-71. doi: 10.11959/j.issn.1000-0801.2024024

• Research and Development •

Jia XU¹, Zhihua JIAN¹,², Honghui JIN¹, Man YANG¹

¹ School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China
² Key Laboratory of Data Storage and Transmission Technology of Zhejiang Province, Hangzhou 310018, China

Revised: 2024-01-07 Online: 2024-02-01 Published: 2024-02-01
About the authors:
Jia XU (1998- ), female, is a master's student at the School of Communication Engineering, Hangzhou Dianzi University. Her main research interest is speech spoofing detection.
Zhihua JIAN (1978- ), male, PhD, is an associate professor and master's supervisor at the School of Communication Engineering, Hangzhou Dianzi University, and an associate professor at the Key Laboratory of Data Storage and Transmission Technology of Zhejiang Province. His main research interests include voice conversion, spoofed speech detection, voiceprint recognition, and speech privacy protection.
Honghui JIN (1999- ), male, is a master's student at the School of Communication Engineering, Hangzhou Dianzi University. His main research interests are voice conversion and spoofing detection.
Man YANG (2000- ), female, is a master's student at the School of Communication Engineering, Hangzhou Dianzi University. Her main research interest is speech spoofing detection.
Supported by:
The National Natural Science Foundation of China (61201301); The National Natural Science Foundation of China (61772166)

Abstract

Because speech synthesis is now easy to perform, synthetic spoofed speech poses a serious threat to the security of speaker verification systems. To further improve the spoofing detection capability of such systems, a synthetic speech detection method is proposed that uses the frequency-domain information of the speech spectrogram, with the local phase quantization (LPQ) algorithm describing that information. First, the spectrogram is divided into several sub-blocks, and LPQ is performed on each sub-block. After histogram statistical analysis, an LPQ feature vector is obtained and used as the input feature of a random forest classifier to perform synthetic speech detection. The experimental results show that the proposed method further reduces the tandem detection cost function (t-DCF) of the synthetic speech detection system and generalizes better.

Key words: speaker verification, spoofing attack, synthetic speech detection, local phase quantization (LPQ)
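
The pipeline summarized in the abstract (split the spectrogram into sub-blocks, apply LPQ to each, pool the codes into histograms) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 3×3 LPQ window, the 4×2 block grid, the frame length and hop, the omission of LPQ's optional decorrelation step, and every function name below are choices made here for illustration.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram, shape (freq bins, frames). Frame settings are assumed."""
    frames = np.lib.stride_tricks.sliding_window_view(signal, frame_len)[::hop]
    spec = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    return np.log(spec + 1e-8).T


def lpq_codes(block, win=3):
    """8-bit LPQ code per pixel of a 2-D block (no decorrelation, 'valid' borders)."""
    block = np.asarray(block, dtype=np.float64)
    r = (win - 1) // 2
    x = np.arange(-r, r + 1)
    a = 1.0 / win                        # lowest non-zero frequency of the window
    w0 = np.ones(win, dtype=complex)     # zero-frequency (DC) basis
    w1 = np.exp(-2j * np.pi * a * x)     # basis at frequency a
    w2 = np.conj(w1)                     # basis at frequency -a
    # Local Fourier kernels indexed [row, column] at four frequency points.
    kernels = [
        np.outer(w0, w1),   # frequency a along columns only
        np.outer(w1, w0),   # frequency a along rows only
        np.outer(w1, w1),   # diagonal (a, a)
        np.outer(w2, w1),   # anti-diagonal (a, -a)
    ]
    windows = np.lib.stride_tricks.sliding_window_view(block, (win, win))
    codes = np.zeros(windows.shape[:2], dtype=np.int64)
    bit = 0
    for kernel in kernels:
        resp = np.tensordot(windows, kernel, axes=([2, 3], [0, 1]))
        # Quantize the sign of the real and imaginary parts: one bit each.
        for part in (resp.real, resp.imag):
            codes |= (part >= 0).astype(np.int64) << bit
            bit += 1
    return codes


def lpq_feature(spectrogram, blocks=(4, 2), win=3):
    """Concatenate normalized 256-bin LPQ histograms computed per sub-block."""
    feats = []
    for rows in np.array_split(spectrogram, blocks[0], axis=0):
        for blk in np.array_split(rows, blocks[1], axis=1):
            hist = np.bincount(lpq_codes(blk, win).ravel(), minlength=256)
            feats.append(hist / hist.sum())
    return np.concatenate(feats)


# Demo on a synthetic signal (a noisy tone), standing in for a real utterance.
rng = np.random.default_rng(0)
t = np.arange(4096)
signal = np.sin(2 * np.pi * 0.05 * t) + 0.1 * rng.standard_normal(t.size)
feature = lpq_feature(log_spectrogram(signal))
print(feature.shape)
```

Each utterance then yields one fixed-length vector (here 4 × 2 blocks × 256 bins), which could be passed to a random forest, for example scikit-learn's `RandomForestClassifier`, to separate bona fide from synthetic speech.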

CLC number: TP391.42

Cite this article: Jia XU, Zhihua JIAN, Honghui JIN, Man YANG. A method for synthetic speech detection using local phase quantization[J]. Telecommunications Science, 2024, 40(2): 63-71.

Figures and tables: 10 (Figures 1-6, Tables 1-4)

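Table 1

Subset	Speakers (male)	Speakers (female)	Utterances (bona fide)	Utterances (converted)	Utterances (synthetic)
Training set	8	12	2580	22800	15200
Development set	4	6	2548	22296	14864
Evaluation set	21	27	7355	63882	49140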

Table 2

Feature	Development set (SVM)	Development set (RF)	Test set (SVM)	Test set (RF)
MFCC	0.3800	0.3571	0.7438	0.7313
LPCC	0.3494	0.2116	0.6311	0.4954
LFCC	0.1246	0.1228	0.1873	0.1367
CQCC	0.1590	0.1300	0.1729	0.1720
LBP	0.1756	0.1601	0.1856	0.1765
CSLBP	0.1224	0.1210	0.1236	0.1222
LPQ	0.0941	0.0825	0.0945	0.0827
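
To make the gap between LPQ and the other features in the table above easier to read, the relative reduction can be computed directly from the test-set RF column. The sketch below assumes the scores are costs for which lower is better (the abstract refers to t-DCF, but the table itself does not name the metric); the variable and function names are illustrative.

```python
# Test-set scores with the RF classifier, copied from the table above.
rf_test = {
    "MFCC": 0.7313,
    "LPCC": 0.4954,
    "LFCC": 0.1367,
    "CQCC": 0.1720,
    "LBP": 0.1765,
    "CSLBP": 0.1222,
    "LPQ": 0.0827,
}


def relative_reduction(baseline, proposed):
    """Fraction by which `proposed` lowers the score relative to `baseline`."""
    return (baseline - proposed) / baseline


# Relative reduction achieved by LPQ against every other feature.
reductions = {
    name: relative_reduction(score, rf_test["LPQ"])
    for name, score in rf_test.items()
    if name != "LPQ"
}

for name, red in reductions.items():
    print(f"LPQ vs {name}: {red:.1%} lower")
```

Under that assumption, the smallest margin is against CSLBP, the strongest of the compared features in this column.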

Table 3

Type	SVM (LBP)	SVM (CSLBP)	SVM (LPQ)	RF (LBP)	RF (CSLBP)	RF (LPQ)
A01	0.0807	0.1004	0.0332	0.0732	0.1053	0.0325
A02	0.3753	0.2046	0.2489	0.2690	0.1786	0.2064
A03	0.0884	0.0821	0.0124	0.1770	0.1052	0.0412
A04	0.1431	0.1027	0.0816	0.1204	0.0930	0.0502

Table 4

Type	SVM (LBP)	SVM (CSLBP)	SVM (LPQ)	RF (LBP)	RF (CSLBP)	RF (LPQ)
A07	0.1017	0.1236	0.0651	0.1074	0.1141	0.0623
A08	0.0923	0.1090	0.0677	0.1024	0.1105	0.0654
A09	0.1154	0.1111	0.0646	0.1387	0.1017	0.0601
A10	0.2069	0.1041	0.1182	0.1898	0.1110	0.0936
A11	0.1869	0.0909	0.0649	0.2577	0.1006	0.0935
A12	0.1814	0.1244	0.0765	0.1588	0.1142	0.0850
A13	0.1917	0.1223	0.1070	0.1280	0.1114	0.0819
A14	0.1793	0.1384	0.0774	0.1670	0.1440	0.0712
A15	0.4095	0.1646	0.1972	0.3525	0.1619	0.1346
A16	0.1726	0.1470	0.1011	0.1547	0.1443	0.0802
