Synthetic spoofing speech detection method based on center-symmetric local binary pattern

Telecommunications Science, 2023, Vol. 39, Issue (1): 72-78. doi: 10.11959/j.issn.1000-0801.2023005

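Jia XU¹, Zhihua JIAN¹, Honghui JIN¹, Chao WU¹, Lin YOU², Yingxiao WU³

¹ School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China
² School of Cyberspace Security, Hangzhou Dianzi University, Hangzhou 310018, China
³ School of Computer, Hangzhou Dianzi University, Hangzhou 310018, China

Revised: 2022-12-15 Online: 2023-01-20 Published: 2023-01-01
About the authors:
Jia XU (1998- ), female, master's student at the School of Communication Engineering, Hangzhou Dianzi University. Her main research interest is spoofing speech detection.
Zhihua JIAN (1978- ), male, associate professor and master's supervisor at the School of Communication Engineering, Hangzhou Dianzi University. His main research interests include voice conversion, spoofing speech detection, and voiceprint recognition.
Honghui JIN (1999- ), male, master's student at the School of Communication Engineering, Hangzhou Dianzi University. His main research interests are voice conversion and spoofing speech detection.
Chao WU (1988- ), male, lecturer at the School of Communication Engineering, Hangzhou Dianzi University. His main research interests are navigation signal processing and spoofing interference detection.
Lin YOU (1966- ), male, professor and doctoral supervisor at the School of Cyberspace Security, Hangzhou Dianzi University. His main research interests include biometric information processing, information security, and cryptography.
Yingxiao WU (1980- ), female, distinguished professor at the School of Computer, Hangzhou Dianzi University. Her main research interests are millimeter-wave sensing for voiceprint recognition and authentication, radio-frequency information processing, and the industrial internet.
Supported by:
The National Natural Science Foundation of China (61201301); The National Natural Science Foundation of China (61772166); The National Natural Science Foundation of China (61901154)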

Abstract:

To address the low accuracy of local binary pattern (LBP) based spoofing speech detection on synthetic speech, a spoofing speech detection method based on the center-symmetric local binary pattern (CSLBP) was proposed. The spectrogram of the speech signal was obtained through the short-time Fourier transform (STFT), texture features were extracted from the spectrogram using CSLBP, and a random forest classifier was trained on these features to discriminate genuine speech from spoofed speech. The method considers both the values and the positional relationships of pixels in the spectrogram, so it captures more comprehensive texture information, and it reduces the feature dimension to 16, which lowers the computational cost. Experimental results on the ASVspoof 2019 dataset show that, compared with the conventional LBP-based spoofing detection method, the proposed method reduced the tandem detection cost function (t-DCF) for synthetic spoofing speech by 16.98% and increased the detection speed by 89.73%.

Key words: speaker verification, spoofing speech detection, CSLBP, random forest
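The feature extraction step described in the abstract can be illustrated with a minimal sketch. It assumes the common CSLBP formulation with 8 neighbours at radius 1, where each of the 4 center-symmetric pixel pairs contributes one bit when their difference exceeds a threshold, giving 16 possible codes. The neighbour ordering, the strict `>` comparison, the histogram normalisation, and the names `NEIGHBOURS`, `cslbp_code`, and `cslbp_histogram` are assumptions made for illustration, not details taken from the paper. The input is any 2D grid of spectrogram magnitudes (for example the output of an STFT).

```python
# Offsets (row, col) of the 8 neighbours at radius 1, ordered so that
# neighbour i and neighbour i + 4 lie opposite each other across the centre.
NEIGHBOURS = [(0, 1), (1, 1), (1, 0), (1, -1),
              (0, -1), (-1, -1), (-1, 0), (-1, 1)]


def cslbp_code(img, r, c, threshold=0.0):
    """Return the 4-bit CSLBP code (0..15) of the pixel at (r, c)."""
    code = 0
    for i in range(4):
        dr1, dc1 = NEIGHBOURS[i]
        dr2, dc2 = NEIGHBOURS[i + 4]
        # Compare the two pixels of one center-symmetric pair.
        diff = img[r + dr1][c + dc1] - img[r + dr2][c + dc2]
        if diff > threshold:  # assumption: strict comparison against T
            code |= 1 << i
    return code


def cslbp_histogram(img, threshold=0.0):
    """Return a normalised 16-bin histogram of CSLBP codes over img.

    img is a 2D sequence (list of lists or array) of spectrogram values;
    border pixels are skipped because they lack a full neighbourhood.
    """
    rows, cols = len(img), len(img[0])
    hist = [0] * 16
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            hist[cslbp_code(img, r, c, threshold)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * 16


if __name__ == "__main__":
    toy_spectrogram = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(cslbp_histogram(toy_spectrogram, threshold=0))
```

Each utterance would yield one 16-dimensional vector of this kind, which could then be passed to a classifier such as a random forest. Because only 4 comparisons are made per pixel instead of 8, the code space shrinks from 256 values to 16, which is consistent with the reduced feature dimension reported in the abstract.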

CLC number: TP391.42

Jia XU, Zhihua JIAN, Honghui JIN, Chao WU, Lin YOU, Yingxiao WU. Synthetic spoofing speech detection method based on center-symmetric local binary pattern[J]. Telecommunications Science, 2023, 39(1): 72-78.

Figures and tables: 9 (Figures 1-4, Tables 1-5)


Subset	Genuine (utterances)	Spoofed (utterances)	Synthetic (utterances)
Training	2,580	22,800	15,200
Development	2,548	22,296	14,864
Evaluation	7,355	63,882	49,140

T	0	1	2	3	4	5	6	7	8	9	10
SVM	0.2184	0.2029	0.1648	0.1344	0.1314	0.1231	0.1247	0.1285	0.1423	0.1546	0.1672
Random forest	0.1892	0.1906	0.1790	0.1440	0.1164	0.1017	0.1021	0.1063	0.1163	0.1306	0.1479
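The choice of T implied by this table can be reproduced with a short sketch. It assumes the entries are cost values where lower is better (the T = 5 column matches the CSLBP scores in the model comparison table) and simply picks the T with the smallest value for each classifier. The names `T_VALUES`, `SCORES`, and `best_threshold` are illustrative.

```python
# Values copied from the table above; assumed to be costs (lower is better).
T_VALUES = list(range(11))
SCORES = {
    "SVM": [0.2184, 0.2029, 0.1648, 0.1344, 0.1314, 0.1231,
            0.1247, 0.1285, 0.1423, 0.1546, 0.1672],
    "Random forest": [0.1892, 0.1906, 0.1790, 0.1440, 0.1164, 0.1017,
                      0.1021, 0.1063, 0.1163, 0.1306, 0.1479],
}


def best_threshold(scores):
    """Return (T, score) for the smallest score in the list."""
    idx = min(range(len(scores)), key=scores.__getitem__)
    return T_VALUES[idx], scores[idx]


if __name__ == "__main__":
    for model, scores in SCORES.items():
        t, s = best_threshold(scores)
        print(f"{model}: best T = {t} (score {s:.4f})")
```

Both classifiers reach their minimum at T = 5, and the scores rise again for larger thresholds.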

Model	MFCC	LPCC	LBP	CSLBP
SVM	0.2754	0.6468	0.1563	0.1231
Random forest	0.3918	0.5415	0.1225	0.1017
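As a worked check, the relative improvement of CSLBP over LBP can be computed from the two rightmost columns. The sketch assumes these entries are t-DCF values; under that assumption the random forest row reproduces the 16.98% reduction quoted in the abstract. The names `T_DCF` and `relative_reduction` are illustrative.

```python
# LBP and CSLBP columns copied from the table above (assumed to be t-DCF).
T_DCF = {
    "SVM": {"LBP": 0.1563, "CSLBP": 0.1231},
    "Random forest": {"LBP": 0.1225, "CSLBP": 0.1017},
}


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return (baseline - proposed) / baseline * 100.0


if __name__ == "__main__":
    for model, row in T_DCF.items():
        pct = relative_reduction(row["LBP"], row["CSLBP"])
        print(f"{model}: CSLBP is {pct:.2f}% lower than LBP")
```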

Feature	A07	A08	A09	A10	A11	A12	A13	A14	A15	A16	Average
MFCC	0.3635	0.2321	0.4440	0.4130	0.5188	0.5865	0.3184	0.3540	0.3323	0.3984	0.3961
LPCC	0.4819	0.7484	0.6042	0.4549	0.5299	0.5976	0.1843	0.7580	0.7036	0.3847	0.5448
LBP	0.0208	0.0142	0.0673	0.1338	0.2387	0.1000	0.0505	0.1046	0.3964	0.0936	0.1220
CSLBP	0.0509	0.0323	0.0557	0.0861	0.0708	0.0261	0.0504	0.2380	0.3365	0.0622	0.1009
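The per-attack results can be summarised with a small sketch that recomputes the row averages and lists the attack types on which CSLBP scores lower than LBP. It assumes lower values are better; the names `ATTACKS`, `PER_ATTACK`, `mean`, and `cslbp_wins` are illustrative.

```python
# Per-attack scores copied from the table above (lower assumed better).
ATTACKS = ["A07", "A08", "A09", "A10", "A11",
           "A12", "A13", "A14", "A15", "A16"]
PER_ATTACK = {
    "MFCC": [0.3635, 0.2321, 0.4440, 0.4130, 0.5188,
             0.5865, 0.3184, 0.3540, 0.3323, 0.3984],
    "LPCC": [0.4819, 0.7484, 0.6042, 0.4549, 0.5299,
             0.5976, 0.1843, 0.7580, 0.7036, 0.3847],
    "LBP": [0.0208, 0.0142, 0.0673, 0.1338, 0.2387,
            0.1000, 0.0505, 0.1046, 0.3964, 0.0936],
    "CSLBP": [0.0509, 0.0323, 0.0557, 0.0861, 0.0708,
              0.0261, 0.0504, 0.2380, 0.3365, 0.0622],
}


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Attack types where CSLBP scores strictly lower than LBP.
cslbp_wins = [
    attack
    for attack, lbp, cslbp in zip(ATTACKS, PER_ATTACK["LBP"], PER_ATTACK["CSLBP"])
    if cslbp < lbp
]

if __name__ == "__main__":
    for feature, scores in PER_ATTACK.items():
        print(f"{feature}: average = {mean(scores):.4f}")
    print("CSLBP lower than LBP on:", ", ".join(cslbp_wins))
```

On these figures CSLBP scores lower than LBP on seven of the ten attack types, while LBP remains lower on A07, A08, and A14.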


Detection time/s
Model	MFCC	LPCC	LBP	CSLBP
SVM	16	31	247	17
Random forest	65	42	516	53
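The timing figures can be turned into a time saving and a speed-up factor relative to LBP. In this sketch the random forest row gives a detection time about 89.73% shorter for CSLBP than for LBP, which appears to correspond to the speed figure quoted in the abstract; that correspondence is an inference from the numbers. The names `DETECTION_TIME_S`, `time_saving_percent`, and `speedup_factor` are illustrative.

```python
# Detection times in seconds, copied from the table above.
DETECTION_TIME_S = {
    "SVM": {"MFCC": 16, "LPCC": 31, "LBP": 247, "CSLBP": 17},
    "Random forest": {"MFCC": 65, "LPCC": 42, "LBP": 516, "CSLBP": 53},
}


def time_saving_percent(baseline_s, proposed_s):
    """Percentage of the baseline detection time that is saved."""
    return (baseline_s - proposed_s) / baseline_s * 100.0


def speedup_factor(baseline_s, proposed_s):
    """How many times faster the proposed feature is than the baseline."""
    return baseline_s / proposed_s


if __name__ == "__main__":
    for model, row in DETECTION_TIME_S.items():
        saving = time_saving_percent(row["LBP"], row["CSLBP"])
        factor = speedup_factor(row["LBP"], row["CSLBP"])
        print(f"{model}: {saving:.2f}% less time, {factor:.1f}x faster than LBP")
```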
