Telecommunications Science ›› 2023, Vol. 39 ›› Issue (6): 85-95. doi: 10.11959/j.issn.1000-0801.2023121

• Research and Development •

• About the authors:
    JIN Honghui (1999- ), male, M.S. candidate at the School of Communication Engineering, Hangzhou Dianzi University; his main research interest is spoofed speech detection
    JIAN Zhihua (1978- ), male, associate professor and M.S. supervisor at the School of Communication Engineering, Hangzhou Dianzi University, and a faculty member of the Key Laboratory of Data Storage and Transmission Technology of Zhejiang Province; his main research interests include voice conversion, spoofed speech detection, and privacy protection in speech
    YANG Man (2000- ), female, M.S. candidate at the School of Communication Engineering, Hangzhou Dianzi University; her main research interest is spoofed speech detection
    WU Chao (1988- ), male, lecturer at the School of Communication Engineering, Hangzhou Dianzi University; his main research interests include navigation signal processing and spoofing-interference detection

Synthetic speech detection method using texture feature based on circumferential local ternary pattern

Honghui JIN1, Zhihua JIAN1,2, Man YANG1, Chao WU1   

1. School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China
    2. Key Laboratory of Data Storage and Transmission Technology of Zhejiang Province, Hangzhou 310018, China
  • Revised: 2023-06-05 Online: 2023-06-20 Published: 2023-06-01
  • Supported by:
    The National Natural Science Foundation of China (61201301, 61772166, 61901154)


Abstract:

In order to further improve the accuracy of synthetic speech detection, a synthetic speech detection method using texture features based on the circumferential local ternary pattern (CLTP) was proposed. The method extracts texture information from the speech spectrogram using the CLTP and uses it as the feature representation of the speech. A deep residual network is employed as the back-end classifier to decide whether an utterance is genuine or spoofed. Experimental results on the ASVspoof 2019 dataset demonstrate that the proposed method reduces the equal error rate (EER) by 54.29% and 2.15% compared with the traditional constant Q cepstral coefficient (CQCC) and linear predictive cepstral coefficient (LPCC) features respectively, and by 17.14% compared with the local ternary pattern (LTP) texture feature. Because the CLTP jointly considers the differences between the central pixel and its peripheral pixels as well as those among the peripheral pixels themselves within a neighborhood, it captures more complete texture information from the speech spectrogram and thereby improves the accuracy of synthetic speech detection.
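To make the feature-extraction idea concrete, the following is a minimal sketch, not the authors' implementation, of a local-ternary-pattern style encoder over a 2-D spectrogram. The threshold `t`, the clockwise neighbour ordering, and the `ring_codes` extension that adds the circumferential (neighbour-to-neighbour) comparisons described in the abstract are illustrative assumptions.

```python
import numpy as np

def ternary_code(diff, t):
    # Standard LTP quantization: map a difference to {-1, 0, +1} using threshold t.
    return np.where(diff >= t, 1, np.where(diff <= -t, -1, 0))

def cltp_like_codes(spec, t=0.1):
    """Compute ternary texture codes for every interior pixel of a spectrogram.

    center_codes compares each of the 8 neighbours with the centre pixel
    (classic LTP); ring_codes compares each neighbour with the next one
    around the circle, a rough stand-in for the circumferential comparisons
    the CLTP adds. Both have shape (H-2, W-2, 8).
    """
    H, W = spec.shape
    # Offsets of the 8 neighbours, ordered clockwise around the centre.
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    centre = spec[1:-1, 1:-1]
    # Shifted views of the spectrogram, one per neighbour direction.
    neigh = [spec[1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx] for dy, dx in offs]
    center_codes = np.stack(
        [ternary_code(neigh[k] - centre, t) for k in range(8)], axis=-1)
    ring_codes = np.stack(
        [ternary_code(neigh[(k + 1) % 8] - neigh[k], t) for k in range(8)], axis=-1)
    return center_codes, ring_codes
```

In a pipeline of the kind the abstract describes, histograms of such codes (or the code maps themselves) would then be passed to the deep residual network for genuine/spoofed classification.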

Key words: speaker verification, synthetic speech detection, CLTP, deep residual network

CLC number: 
