Recapture voice replay detection based on phase spectrum

Telecommunications Science, 2017, Vol. 33, Issue (8): 145-154. doi: 10.11959/j.issn.1000-0801.2017126

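Author biographies:
Can LI (1992-), female, is a master's student at the College of Information Science and Engineering, Ningbo University. Her main research interests include multimedia communication and information security.
Rangding WANG (1962-), male, Ph.D., is a professor and doctoral supervisor at the Institute of Advanced Technology, Ningbo University. His main research interests include multimedia communication and forensics, information hiding and steganalysis, and smart meter reading and sensor network technology.
Diqun YAN (1979-), male, Ph.D., is an associate professor and master's supervisor at the College of Information Science and Engineering, Ningbo University. His main research interests include multimedia communication, information security, and deep-learning-based digital speech forensics.
Yanan CHEN (1990-), female, is a master's student at the College of Information Science and Engineering, Ningbo University. Her main research interests include multimedia communication and information security.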

Recapture voice replay detection based on phase spectrum

Can LI, Rangding WANG, Diqun YAN, Yanan CHEN

College of Information Science and Engineering, Ningbo University, Ningbo 315211, China

Revised: 2017-03-20 Online: 2017-08-01 Published: 2017-08-25
Supported by:
The National Natural Science Foundation of China (61672302); The National Natural Science Foundation of China (61300055); Natural Science Foundation of Zhejiang Province of China (LZ15F020010); Natural Science Foundation of Zhejiang Province of China (Y17F020051); The Scientific Research Foundation of Ningbo University (XKXL1405); The Scientific Research Foundation of Ningbo University (XKXL1420); The Scientific Research Foundation of Ningbo University (XKXL1509); The Scientific Research Foundation of Ningbo University (XKXL1503); K.C. Wong Magna Fund in Ningbo University

Abstract:

Because recaptured voice played back through high-fidelity equipment closely resembles the original voice, it is often used to attack automatic speaker verification (ASV) systems and gain unauthorized access. To make such systems more robust against recaptured-voice attacks, the processes that produce original and recaptured voice were studied, and a clear difference between the two was found in the frequency-domain phase. On this basis, a recaptured voice detection method based on the phase spectrum was proposed. The effects of the number of FFT points and of different covert recording and playback devices on the detection rate were analyzed. Experimental results show that the proposed method accurately determines whether a test utterance is recaptured, with a detection rate of 99.04%. Moreover, integrating the method into a speaker recognition system reduced the system's equal error rate (EER) by about 22%, which indicates that the system's resistance to recaptured-voice attacks is effectively improved.

Key words: ASV system, recaptured voice detection, phase spectrum

CLC number: TP391

Can LI, Rangding WANG, Diqun YAN, Yanan CHEN. Recapture voice replay detection based on phase spectrum[J]. Telecommunications Science, 2017, 33(8): 145-154.

References

[1] SHANG W, STEVENSON M. A playback attack detector for speaker verification systems[C]// 2008 IEEE International Symposium on Communications Control and Signal Processing (ISCCSP), March 12-14, 2008, Bordeaux, France. New Jersey: IEEE Press, 2008: 1144-1149.
[2] SHANG W, STEVENSON M. Score normalization in playback attack detection[C]// IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), March 14-19, 2010, Dallas, USA. New Jersey: IEEE Press, 2010: 1678-1681.
[3] GALKA J, GRZYWACZ M, SAMBORSKI R. Playback attack detection for text-dependent speaker verification over telephone channels[J]. Speech Communication, 2015(67): 143-153.
[4] WU Z, GAO S, CHNG E S, et al. A study on replay attack and anti-spoofing for text-dependent speaker verification[C]// IEEE 2014 Summit and Conference, Asia-Pacific Signal and Information Processing Association, December 9-12, 2014, Siem Reap, Cambodia. New Jersey: IEEE Press, 2014: 35-45.
[5] ZHANG L P, CAO J, XU M X. Prevention of impostors entering speaker recognition systems[J]. Journal of Tsinghua University (Science and Technology), 2008, 48(S1): 699-703.
[6] WANG Z F, HE Q H, ZHANG X Y, et al. Channel pattern noise based playback detection algorithm speaker recognition[J]. Journal of South China University of Technology (Natural Science Edition), 2011, 39(10): 7-12.
[7] WANG Z F, HE Q H, ZHANG X Y, et al. Channel pattern noise based playback detection algorithm speaker recognition[C]// IEEE International Conference on Machine Learning and Cybernetics (ICMLC), July 10-13, 2011, Guilin, China. New Jersey: IEEE Press, 2011: 1708-1713.
[8] VILLALBA J, LLEIDA E. Detecting replay attacks from far-field recordings on speaker verification systems[C]// COST 2011 European Conference on Biometrics and ID Management, March 8-10, 2011, Brandenburg, Germany. New York: ACM Press, 2011: 274-285.
[9] VILLALBA J, LLEIDA E. Preventing replay attacks on speaker verification systems[C]// IEEE International Carnahan Conference on Security Technology (ICCST), October 18-21, 2011, San Francisco, USA. New Jersey: IEEE Press, 2011: 1-8.
[10] CHEN Y N, WANG R D, YAN D Q, et al. Voice playback detection based on long-window scale-factors[J]. International Journal of Security and Its Applications, 2016, 10(12): 299-310.
[11] ZHENG Z B. Overview of mobile communication services security[J]. Telecommunications Science, 2009, 25(2): 28-34.
[12] WANG S, WANG L F, JIN H M, et al. Big data application in network security analysis[J]. Telecommunications Science, 2015, 31(7): 145-150.
[13] OPPENHEIM A V, LIM J S. The importance of phase in signals[J]. Proceedings of the IEEE, 1981, 69(5): 529-541.
[14] DUAN K B, RAJAPAKSE J C, WANG H Y, et al. Multiple SVM-RFE for gene selection in cancer classification with expression data[J]. IEEE Transactions on NanoBioscience, 2005, 4(3): 228-234.
[15] WANG T Q, LI A J, et al. The design of the continuous Chinese speech recognition corpus[C]// The sixth national conference on modern phonetics learning, October 18-20, 2003, Tianjin, China. Tianjin: Tianjin Renmin Chubanshe, 2003.
[16] YANG Z, XU M J, LIU Z F, et al. Study of audio frequency big data processing architecture and key technology[J]. Telecommunications Science, 2013, 29(11): 1-5.
[17] CHAKROBORTY S, ROY A, SAHA G. Improved closed set text-independent speaker identification by combining MFCC with evidence from flipped filter banks[J]. International Journal of Signal Processing, 2007, 4(2): 114-122.
[18] KANAGASUNDARAM A, DEAN D, SRIDHARAN S, et al. I-vector based speaker recognition using advanced channel compensation techniques[J]. Computer Speech and Language, 2014, 28(1): 121-140.

Rank	Feature index N	Average rank	Rank	Feature index N	Average rank
1	509	1	11	132	22.2
2	354	2.4	12	508	23.4
3	3	6.2	13	353	24
4	21	12	14	505	24.4
5	507	13.6	15	200	30.4
6	355	15.4	16	84	30.6
7	186	16.8	17	506	33
8	501	18	18	258	35.2
9	512	18.8	19	498	35.6
10	2	22.2	20	196	38
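
The average-rank column suggests that feature rankings from several runs were averaged; the values are all multiples of 0.2, which would be consistent with five runs, although this page does not say so. The sketch below shows that kind of rank averaging in plain Python. It assumes every run yields a complete ordering of the same feature indices, and `average_ranks` is a hypothetical helper name, not the authors' code.

```python
def average_ranks(rankings):
    """Average the 1-based rank position of each feature over several runs.

    rankings: list of lists; each inner list holds feature indices ordered
    from most to least important (assumed to contain the same features).
    Returns (feature, average_rank) pairs sorted by ascending average rank.
    """
    totals = {}
    for ranking in rankings:
        for position, feature in enumerate(ranking, start=1):
            totals[feature] = totals.get(feature, 0) + position
    runs = len(rankings)
    averaged = [(feature, total / runs) for feature, total in totals.items()]
    # Ties on average rank are broken by feature index for a stable order.
    return sorted(averaged, key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    # Two illustrative runs over three made-up feature indices.
    print(average_ranks([[509, 354, 3], [509, 3, 354]]))
```

Features that rank consistently near the top across runs end up with a low average rank, which is presumably how the top-20 list above was ordered.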

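	Acquisition device	Covert recording device	Covert recording device	Covert recording device	Playback device	Playback device	Playback device
Model	Aigo R6620	iPhone6	Mi4	Sony PX440	Huawei AM08	Philips DTM3115	Yamaha TSX-140
Audio format	wav	m4a	mp3	mp3	-	-	-
Sampling rate	16 kHz	44.1 kHz	44.1 kHz	44.1 kHz	-	-	-
Bit depth / bit rate	16 bit	64 kbit/s	128 kbit/s	192 kbit/s	-	-	-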

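Speech type	Playback device	Covert recording device	Number of samples
Original voice	-	-	1400
Recaptured voice	Huawei AM08	iPhone6	1400
Recaptured voice	Huawei AM08	Mi4	1400
Recaptured voice	Huawei AM08	Sony PX440	1400
Recaptured voice	Philips DTM3115	iPhone6	1400
Recaptured voice	Philips DTM3115	Mi4	1400
Recaptured voice	Philips DTM3115	Sony PX440	1400
Recaptured voice	Yamaha TSX-140	iPhone6	1400
Recaptured voice	Yamaha TSX-140	Mi4	1400
Recaptured voice	Yamaha TSX-140	Sony PX440	1400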

Classifier: LIBSVM
Number of FFT points	TPR	FPR	ACC	Recall	Precision	F-measure
128	92.20%	60.30%	92.21%	91.20%	92.20%	90.80%
256	95.80%	28.60%	95.78%	95.60%	95.80%	95.60%
512	98.10%	14.80%	98.14%	98.10%	98.10%	98.10%
1024	99.00%	6.66%	99.04%	99.00%	99.00%	99.00%
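
The table above varies the number of FFT points used when computing the phase spectrum. As a rough illustration of what an N-point phase spectrum of a single frame is, the sketch below uses a naive pure-Python DFT. It is not the authors' feature extraction, which this page does not describe; the function name `phase_spectrum`, the zero-padding rule, and the absence of windowing are all assumptions.

```python
import cmath
import math


def phase_spectrum(frame, n_fft):
    """Return the phase (radians, in (-pi, pi]) of each bin of an n_fft-point DFT.

    The frame is truncated or zero-padded to n_fft samples (an assumption).
    A naive O(N^2) DFT keeps the sketch dependency-free; real code would use
    an FFT routine.
    """
    samples = list(frame[:n_fft])
    samples += [0.0] * (n_fft - len(samples))
    phases = []
    for k in range(n_fft):
        acc = 0j
        for n, value in enumerate(samples):
            acc += value * cmath.exp(-2j * math.pi * k * n / n_fft)
        phases.append(cmath.phase(acc))
    return phases


if __name__ == "__main__":
    # One cycle of a cosine with a 0.5 rad phase offset, 8 samples long.
    n_fft = 8
    frame = [math.cos(2 * math.pi * n / n_fft + 0.5) for n in range(n_fft)]
    print(round(phase_spectrum(frame, n_fft)[1], 6))
```

For the test cosine, bin 1 carries the 0.5 rad offset, which is the kind of per-bin phase information a phase-spectrum feature is built from. A larger `n_fft` gives a finer frequency grid, at the cost of more bins per frame.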

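Columns are grouped by covert recording device (iPhone6, Mi4, Sony PX440).
Playback device	iPhone6 TPR	iPhone6 FPR	iPhone6 ACC	Mi4 TPR	Mi4 FPR	Mi4 ACC	Sony PX440 TPR	Sony PX440 FPR	Sony PX440 ACC
Huawei AM08	100%	0	100%	100%	0	100%	100%	0	100%
Philips DTM3115	99.80%	0.20%	99.78%	99.92%	0.10%	99.92%	99.80%	0.20%	99.78%
Yamaha TSX-140	100%	0	100%	99.71%	0.30%	99.71%	100%	0	100%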
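
For readers less familiar with the column names, the sketch below computes TPR, FPR, ACC, precision, and F-measure from confusion-matrix counts using their standard definitions. Which class counts as "positive", and whether the reported figures are per-class or averaged values, is not stated on this page, so the helper `detection_metrics` and its example counts are purely illustrative.

```python
def detection_metrics(tp, fn, fp, tn):
    """Standard binary-classification rates from confusion-matrix counts.

    tp/fn: positive samples classified correctly/incorrectly.
    fp/tn: negative samples classified incorrectly/correctly.
    Zero denominators are not handled in this sketch.
    """
    tpr = tp / (tp + fn)              # true positive rate (same as recall)
    fpr = fp / (fp + tn)              # false positive rate
    acc = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tpr / (precision + tpr)
    return {
        "TPR": tpr,
        "FPR": fpr,
        "ACC": acc,
        "precision": precision,
        "F": f_measure,
    }


if __name__ == "__main__":
    # Made-up counts, not taken from the paper.
    for name, value in detection_metrics(tp=90, fn=10, fp=5, tn=95).items():
        print(f"{name}: {value:.4f}")
```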

Algorithm	ACC	EER (GMM-UBM)	EER (i-vector)
Proposed algorithm	99.04%	6.48%	5.56%
Algorithm of reference [5]	73.70%	26.67%	18.51%
Algorithm of reference [7]	78.28%	23.33%	17.50%
Algorithm of reference [10]	98.91%	7.03%	6.40%
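
The EER columns report the operating point at which the false acceptance rate equals the false rejection rate. The sketch below estimates an EER from two lists of verification scores by sweeping every observed score as a threshold. The scoring convention (higher means "more likely genuine"), the helper name `equal_error_rate`, and the example scores are assumptions; the page does not describe how the authors computed EER.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping thresholds over all observed scores.

    A trial is accepted when score >= threshold. The EER is taken at the
    threshold where |FAR - FRR| is smallest, as the mean of the two rates.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap = float("inf")
    eer = None
    for threshold in thresholds:
        # False acceptance: impostor trials that pass the threshold.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        # False rejection: genuine trials that fail the threshold.
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2
    return eer


if __name__ == "__main__":
    genuine = [0.9, 0.8, 0.7, 0.4]
    impostor = [0.6, 0.3, 0.2, 0.1]
    print(equal_error_rate(genuine, impostor))
```

With so few scores the estimate is coarse; with realistic trial counts the sweep approaches the crossing point of the FAR and FRR curves.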

Decision 1	Decision 2	Result
0	0	0
0	1	0
1	0	0
1	1	1
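
The table above is a logical AND: the result is 1 only when both decisions are 1. This page does not say what the two decisions are. One plausible reading, offered only as an assumption, is that one is the speaker verification outcome and the other is the recaptured-voice detector's "original voice" outcome. A minimal sketch of that fusion rule, with the hypothetical helper name `fuse_decisions`:

```python
def fuse_decisions(decision_1, decision_2):
    """Return 1 only if both binary decisions are 1 (logical AND), else 0."""
    return 1 if (decision_1 == 1 and decision_2 == 1) else 0


if __name__ == "__main__":
    # Reproduce the four rows of the truth table.
    for d1 in (0, 1):
        for d2 in (0, 1):
            print(d1, d2, fuse_decisions(d1, d2))
```

Under that reading, a replayed recording of the true speaker would pass one check but fail the other, so the fused result would still be a rejection.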