Spoof speech detection based on context information and attention feature

doi:10.11959/j.issn.1000-0801.2023006

Abstract:

With the rapid development of speech synthesis and voice conversion technology, existing spoof speech detection methods still suffer from low detection accuracy and poor generalization. To address this, an end-to-end spoof detection method based on context information and attention features is proposed. Built on a deep residual shrinkage network (DRSN), the method uses a dual-branch context information coordination fusion module (DCCM) to aggregate rich context information, and fuses features with coordinate time-frequency attention (CTFA) to obtain cross-dimensional interaction features that carry context information, so as to maximize the potential for capturing artifacts. On the ASVspoof 2019 LA dataset, the proposed method reduces the EER and t-DCF by 68% and 65%, respectively, compared with the best baseline system. On the ASVspoof 2021 LA dataset, it achieves an EER of 4.81% and a t-DCF of 0.3115, which are reductions of 48% and 10%, respectively. The experimental results show that the method effectively improves the accuracy and generalization ability of spoof speech detection.
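The "shrinkage" in DRSN refers to soft thresholding, which reference [17] applies inside residual blocks with a threshold that the network learns. A minimal Python sketch of that operation is given here for orientation. The names `soft_threshold`, `shrink_channel`, and `alpha` are illustrative assumptions, the fixed `alpha` stands in for a value that DRSN would learn, and none of this is taken from the authors' implementation.

```python
def soft_threshold(x, tau):
    """Shrink x toward zero by tau; inputs with |x| <= tau become 0."""
    if tau < 0:
        raise ValueError("tau must be non-negative")
    if abs(x) <= tau:
        return 0.0
    return x - tau if x > 0 else x + tau


def shrink_channel(values, alpha):
    """Soft-threshold one channel with tau = alpha * mean(|x|).

    In DRSN [17] the scaling factor is produced by a small sub-network;
    here alpha is a fixed, hypothetical stand-in in the range (0, 1).
    """
    tau = alpha * sum(abs(v) for v in values) / len(values)
    return [soft_threshold(v, tau) for v in values]


# Toy feature values: small-magnitude entries are zeroed out,
# large-magnitude entries are pulled toward zero by tau.
features = [0.9, -0.05, 0.3, -1.2, 0.02]
print(shrink_channel(features, alpha=0.5))
```

Small-magnitude activations are suppressed as noise while larger ones pass through with reduced magnitude, which is the property the residual shrinkage design relies on.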

Key words: spoof speech detection, context information, attention feature, end-to-end, artifacts

CLC Number: TN912.3

Jia CHEN, Jianwu ZHANG, Zheliang ZHANG. Spoof speech detection based on context information and attention feature[J]. Telecommunications Science, 2023, 39(2): 92-102.

Figures/Tables 9

References 32

[1]	KINNUNEN T, LI H. An overview of text-independent speaker recognition: from features to supervectors[J]. Speech Communication, 2010, 52(1): 12-40.
[2]	SINGH N, AGRAWAL A, KHAN R A. Voice biometric: a technology for voice based authentication[J]. Advanced Science, Engineering and Medicine, 2018, 10(7-8): 754-759.
[3]	MITTAL A, DUA M. Automatic speaker verification systems and spoof detection techniques: review and analysis[J]. International Journal of Speech Technology, 2021(25): 1-30.
[4]	XU J, JIAN Z H, YU J Q, et al. Completed local binary pattern based speech anti-spoofing[J]. Telecommunications Science, 2021, 37(5): 91-99. (in Chinese)
[5]	YU J Q, JIAN Z H, XU J, et al. Spoofing speech detection algorithm based on joint feature and random forest[J]. Telecommunications Science, 2022, 38(6): 91-99. (in Chinese)
[6]	TAK H, PATINO J, TODISCO M, et al. End-to-end anti-spoofing with RawNet2[C]// Proceedings of 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE Press, 2021: 6369-6373.
[7]	GE W Y, PATINO J, TODISCO M, et al. Raw differentiable architecture search for speech deepfake and spoofing detection[EB]. 2021.
[8]	KANG W H, ALAM J, FATHAN A, et al. Attentive activation function for improving end-to-end spoofing countermeasure systems[EB]. 2022.
[9]	CHEN D S, LI J, XU K, et al. AReLU: attention-based rectified linear unit[EB]. 2020.
[10]	WANG X, YAMAGISHI J, TODISCO M, et al. ASVspoof 2019: a large-scale public database of synthesized, converted and replayed speech[J]. Computer Speech & Language, 2020, 64: 101-114.
[11]	YAMAGISHI J, WANG X, TODISCO M, et al. ASVspoof 2021: accelerating progress in spoofed and deepfake speech detection[EB]. 2021.
[12]	LING H F, HUANG L C, HUANG J R, et al. Attention-based convolutional neural network for ASV spoofing detection[C]// Proceedings of 2021 INTERSPEECH. [S.l.: s.n.], 2021: 4289-4293.
[13]	ZHOU Y, ZHANG J W, ZHANG P G. Spoof speech detection based on raw cross-dimension interaction attention network[C]// Proceedings of 2022 Chinese Conference on Biometric Recognition. Cham: Springer, 2022: 621-629.
[14]	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 770-778.
[15]	HUA G, TEOH A B J, ZHANG H. Towards end-to-end synthetic speech detection[J]. IEEE Signal Processing Letters, 2021, 28: 1265-1269.
[16]	SZEGEDY C, LIU W, JIA Y, et al. Going deeper with convolutions[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 1-9.
[17]	ZHAO M H, ZHONG S S, FU X Y, et al. Deep residual shrinkage networks for fault diagnosis[J]. IEEE Transactions on Industrial Informatics, 2019, 16(7): 4681-4690.
[18]	ZHOU Y, ZHANG J W, CHENG J C. Speech anti-spoofing for complex acoustic environments[J]. Chinese Journal of Sensors and Actuators, 2022, 35(10): 1355-1362. (in Chinese)
[19]	WANG J H, YING N, ZHU C D, et al. Speech emotion recognition algorithm based on spectrogram feature extraction of deep space attention feature[J]. Telecommunications Science, 2019, 35(7): 100-108. (in Chinese)
[20]	LEI S, ZHOU Y X, CHEN L Y, et al. Towards expressive speaking style modelling with hierarchical context information for Mandarin speech synthesis[C]// Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE Press, 2022: 7922-7926.
[21]	HU J, SHEN L, ALBANIE S. Squeeze-and-excitation networks[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7132-7141.
[22]	WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// Proceedings of the 2018 European Conference on Computer Vision. [S.l.: s.n.], 2018: 3-19.
[23]	HOU Q B, ZHOU D Q, FENG J S. Coordinate attention for efficient mobile network design[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2021: 13713-13722.
[24]	DAI Y M, GIESEKE F, OEHMCKE S, et al. Attentional feature fusion[C]// Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. Piscataway: IEEE Press, 2021: 3560-3569.
[25]	LUO A W, LI E L, LIU Y L, et al. A capsule network based approach for detection of audio spoofing attacks[C]// Proceedings of 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE Press, 2021: 6359-6363.
[26]	LI X, WU X X, LU H, et al. Channel-wise gated Res2Net: towards robust detection of synthetic speech attacks[C]// Proceedings of 2021 INTERSPEECH. [S.l.: s.n.], 2021: 4314-4318.
[27]	ZHANG Y, JIANG F, DUAN Z Y. One-class learning towards synthetic voice spoofing detection[J]. IEEE Signal Processing Letters, 2021, 28: 937-941.
[28]	COHEN A, RIMON I, AFLALO E, et al. A study on data augmentation in voice anti-spoofing[J]. Speech Communication, 2022, 141: 56-67.
[29]	DAS R K. Known-unknown data augmentation strategies for detection of logical access, physical access and speech deepfake attacks: ASVspoof 2021[C]// Proceedings of 2021 Edition of the Automatic Speaker Verification and Spoofing Countermeasures Challenge. [S.l.: s.n.], 2021: 29-36.
[30]	TAK H, KAMBLE M, PATINO J, et al. RawBoost: a raw data boosting and augmentation method applied to automatic speaker verification anti-spoofing[C]// Proceedings of 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE Press, 2022: 6382-6386.
[31]	CÁCERES J, FONT R, GRAU T. The Biometric Vox system for the ASVspoof 2021 challenge[C]// Proceedings of 2021 Edition of the Automatic Speaker Verification and Spoofing Countermeasures Challenge. [S.l.: s.n.], 2021: 68-74.
[32]	PAL M, RAIKAR A, PANDA A, et al. Synthetic speech detection using meta-learning with prototypical loss[EB]. 2022.

ASVspoof 2019 LA	Speakers (male)	Speakers (female)	Utterances (bona fide)	Utterances (spoofed)	Utterances (total)
Training set	8	12	2,580	22,800	25,380
Development set	8	12	2,548	22,296	24,844
Test set	30	37	7,355	64,578	71,933

Model	t-DCF	EER (%)
CAFNet-CTFA	0.0447	1.48
CAFNet-LCIA	0.0450	1.55
CAFNet-CBAM	0.0518	1.78
CAFNet-CA	0.0541	1.82
CAFNet-SENet	0.0571	2.00

Model	t-DCF	EER (%)
CAFNet	0.0447	1.48
DRSN-convolution layer	0.0569	2.18
DRSN-pool layer	0.0620	2.23
without CTFA	0.0710	2.84
DRSN	0.0823	3.09

Model	t-DCF	EER (%)
CAFNet	0.0447	1.48
Res-TSSDNet [15]	0.0481	1.64
Raw CIANet-mul [13]	0.0527	1.76
Capsule network [25]	0.0538	1.76
Raw PC-DARTS Mel-F [7]	0.0517	1.77
MCG-Res2Net50 [26]	0.0520	1.78
ResNet-FCA [12]	0.0510	1.87
SE-ResNet-18-AReLU [8]	0.0500	2.02
ResNet18-OC-softmax [27]	0.0590	2.19
RawNet2 [6]	0.1294	4.66
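
The 68% and 65% reductions quoted in the abstract for ASVspoof 2019 LA can be reproduced from the table above if RawNet2 [6] is taken as the baseline. That pairing is an assumption inferred from the arithmetic; this page does not name the baseline explicitly. A short Python check, with `relative_reduction` as an illustrative helper name:

```python
def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


# Values copied from the table above; choosing RawNet2 [6] as the
# baseline is an assumption, not something stated on this page.
rawnet2 = {"t_dcf": 0.1294, "eer": 4.66}
cafnet = {"t_dcf": 0.0447, "eer": 1.48}

eer_drop = relative_reduction(rawnet2["eer"], cafnet["eer"])
tdcf_drop = relative_reduction(rawnet2["t_dcf"], cafnet["t_dcf"])

print(f"EER reduction:   {eer_drop:.1f}%")   # 68.2%
print(f"t-DCF reduction: {tdcf_drop:.1f}%")  # 65.5%
```

Against the strongest listed competitor, Res-TSSDNet [15], the same formula gives a much smaller margin, so the headline percentages should be read as relative to the baseline rather than to the best prior system.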

Model	t-DCF	EER (%)
CAFNet	0.3115	4.81
ResNet-LogSpec [28]	0.2930	5.18
LCNN [29]	0.3197	5.27
RawNet2-RawBoost [30]	0.3099	5.31
Lightweight TDNN-Focal [31]	0.3645	7.51
SE-ResNet34-avg-OC-Softmax [32]	0.3320	9.03
LFCC-LCNN [11]	0.3445	9.26
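
For ASVspoof 2021 LA, the abstract's 48% and 10% reductions match the table above if LFCC-LCNN [11] is the baseline, which is again an inference from the numbers rather than a statement on this page. The sketch below, using the hypothetical helpers `reduction_vs` and `rank_of`, checks those figures and ranks the systems on each metric.

```python
# (t-DCF, EER in %) per system, copied from the table above.
systems = {
    "CAFNet": (0.3115, 4.81),
    "ResNet-LogSpec [28]": (0.2930, 5.18),
    "LCNN [29]": (0.3197, 5.27),
    "RawNet2-RawBoost [30]": (0.3099, 5.31),
    "Lightweight TDNN-Focal [31]": (0.3645, 7.51),
    "SE-ResNet34-avg-OC-Softmax [32]": (0.3320, 9.03),
    "LFCC-LCNN [11]": (0.3445, 9.26),
}
T_DCF, EER = 0, 1  # column indices into each tuple


def reduction_vs(baseline, proposed, metric):
    """Relative reduction (in %) of `proposed` over `baseline` on one metric."""
    base, prop = systems[baseline][metric], systems[proposed][metric]
    return 100.0 * (base - prop) / base


def rank_of(name, metric):
    """1-based rank of `name` when systems are sorted ascending (lower is better)."""
    ordered = sorted(systems, key=lambda n: systems[n][metric])
    return ordered.index(name) + 1


# Assumed baseline: LFCC-LCNN [11].
print(f"EER reduction:   {reduction_vs('LFCC-LCNN [11]', 'CAFNet', EER):.1f}%")
print(f"t-DCF reduction: {reduction_vs('LFCC-LCNN [11]', 'CAFNet', T_DCF):.1f}%")
print("CAFNet rank by EER:  ", rank_of("CAFNet", EER))
print("CAFNet rank by t-DCF:", rank_of("CAFNet", T_DCF))
```

On these numbers CAFNet has the lowest EER in the table, while ResNet-LogSpec [28] and RawNet2-RawBoost [30] report lower t-DCF values.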

