A playback speech detection algorithm based on log inverse Mel-frequency spectral coefficient

doi: 10.11959/j.issn.1000-0801.2018020

Telecommunications Science, 2018, Vol. 34, Issue (5): 90-98.

Lang LIN, Rangding WANG, Diqun YAN, Can LI

Ningbo University, Ningbo 315211, Zhejiang, China

Revised: 2017-12-07 Online: 2018-05-01 Published: 2018-05-30
About the authors: Lang LIN (born 1994), male, is a master's student at the College of Information Science and Engineering, Ningbo University. His research interests include multimedia communication and information security. | Rangding WANG (born 1962), male, Ph.D., is a professor and doctoral supervisor at the College of Information Science and Engineering, Ningbo University. His research interests include multimedia communication and forensics, information hiding and steganalysis, and smart meter reading and sensor network technology. | Diqun YAN (born 1979), male, Ph.D., is an associate professor and master's supervisor at the College of Information Science and Engineering, Ningbo University. His research interests include multimedia communication, information security, and deep-learning-based digital speech forensics. | Can LI (born 1992), female, is a master's student at the College of Information Science and Engineering, Ningbo University. Her research interests include multimedia communication and information security.
Supported by:
The National Natural Science Foundation of China (61672302); The National Natural Science Foundation of China (61300055); The Natural Science Foundation of Zhejiang Province of China (LZ15F020002); The Natural Science Foundation of Zhejiang Province of China (LY17F020010); The Scientific Research Foundation of Ningbo University (XKXL1405); The Scientific Research Foundation of Ningbo University (XKXL1420); The Scientific Research Foundation of Ningbo University (XKXL1509); The Scientific Research Foundation of Ningbo University (XKXL1503); K.C. Wong Magna Fund in Ningbo University

Abstract:

High-fidelity recording and playback devices are now widespread and portable, which poses a serious challenge to the ability of speaker recognition systems to resist playback attacks. Spectrogram analysis reveals differences between original speech and playback speech in the high-frequency region. To target that region, the algorithm reverses the Mel filter bank used in Mel-frequency cepstral coefficient (MFCC) extraction and takes the log Mel spectral coefficients obtained before the DCT as its features. A support vector machine (SVM) then classifies each test utterance. Experimental results show that the algorithm detects playback speech effectively. In addition, integrating the algorithm into a GMM-UBM speaker recognition system significantly improves the system's resistance to playback attacks.

Key words: speaker recognition, playback speech detection, log Mel-frequency spectrum, inverse Mel filter bank
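
The feature extraction summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the inverse bank is obtained by flipping a standard triangular Mel filter bank along both the filter axis and the frequency axis (the flipped-filter-bank idea of reference [17]), and the function names, filter count, FFT size, and sampling rate used here are hypothetical choices for the example.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular Mel filters over the n_fft // 2 + 1 bins of a one-sided spectrum."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge frequencies, equally spaced on the Mel scale
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [e * n_fft / sample_rate for e in edges]  # fractional bin positions
    bank = []
    for i in range(1, n_filters + 1):
        lo, mid, hi = bins[i - 1], bins[i], bins[i + 1]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising edge
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def inverse_mel_filter_bank(n_filters, n_fft, sample_rate):
    """Assumed construction: flip the Mel bank along both axes, so the
    narrow, densely packed filters cover the high-frequency region."""
    bank = mel_filter_bank(n_filters, n_fft, sample_rate)
    return [row[::-1] for row in reversed(bank)]

def log_filter_bank_energies(power_spectrum, bank, floor=1e-10):
    """Log filter-bank outputs for one frame, taken before any DCT step."""
    return [math.log(max(sum(w * p for w, p in zip(row, power_spectrum)), floor))
            for row in bank]
```

Under these assumptions, `log_filter_bank_energies(frame_power, inverse_mel_filter_bank(24, 512, 16000))` yields one feature vector per frame, with most of the frequency resolution placed in the high-frequency region that the abstract singles out. Such vectors would then be passed to an SVM for classification.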

CLC number: TN912.3

Cite this article: Lang LIN, Rangding WANG, Diqun YAN, Can LI. A playback speech detection algorithm based on log inverse Mel-frequency spectral coefficient[J]. Telecommunications Science, 2018, 34(5): 90-98.

Figures/Tables (16): Figures 1-10, Tables 1-6

References (17)

[1]	ZHU D, MA B, LI H. Speaker verification with feature-space MAPLR parameters[J]. IEEE Transactions on Audio Speech & Language Processing, 2011, 19(3): 505-515.
[2]	YI K C, HU Z. A new speech synthesis method using vector quantization[J]. Telecommunications Science, 1987(11): 1-6.
[3]	GUO H. Authenticity verification and research of recording evidence[J]. Telecommunications Science, 2010, 26(Z2): 56-60.
[4]	LI C, WANG R D, YAN D Q, et al. Detection algorithm of riprap voice attack based on phase spectrum[J]. Telecommunications Science, 2017, 33(8): 145-154.
[5]	SHANG W, STEVENSON M. A playback attack detector for speaker verification systems[C]// IEEE International Symposium on Communications Control and Signal Processing (ISCCSP), March 12-14, 2008, St Julians, Malta. Piscataway: IEEE Press, 2008: 1144-1149.
[6]	SHANG W, STEVENSON M. Score normalization in playback attack detection[C]// IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), March 14-19, 2010, Dallas, USA. Piscataway: IEEE Press, 2010: 1678-1681.
[7]	ZHANG L P, CAO J, XU M X. Prevention of impostors entering speaker recognition systems[J]. Journal of Tsinghua University (Science and Technology), 2008, 48(S1): 699-703.
[8]	WANG Z F, HE Q H, ZHANG X Y, et al. Channel pattern noise based playback detection algorithm speaker recognition[J]. Journal of South China University of Technology (Natural Science Edition), 2011, 39(10): 7-12.
[9]	LI F Q, WAN H, HUANG J J. The display and analysis of sonogram based on MATLAB[J]. Control & Automation, 2005(20): 172-174.
[10]	BURILLO P, BUSTINCE H. Entropy on intuitionistic fuzzy sets and on interval-valued fuzzy sets[J]. Fuzzy Sets & Systems, 1996, 78(3): 305-316.
[11]	XIANG Y J, YANG J A, LI J H, et al. An improved Mel-frequency filter for speaker recognition[J]. Computer Engineering, 2013(11): 214-217.
[12]	TAO B R, GUO Q, MIAO F J, et al. SoC design of voiceprint features extraction based on improved Mel filter banks[J]. Microelectronics, 2015(6): 785-788.
[13]	HU Y G, WU Y, WANG H Z, et al. Discrete cosine transform in data dimensionality reduction[J]. Computer Engineering and Applications, 2006(32): 21-23.
[14]	MOHAMED A. Deep neural network acoustic models for ASR[J]. Doctoral, 2014.
[15]	CHANG C C, LIN C J. LIBSVM: a library for support vector machines[J]. ACM Transactions on Intelligent Systems & Technology, 2012, 2(3): 1-27.
[16]	WANG T Q, LI A J. The design of the continuous Chinese speech recognition corpus[C]// The Sixth National Conference on Modern Phonetics Learning, Oct 1, 2003, Tianjin, China. [S.l.: s.n.], 2003: 1-4.
[17]	CHAKROBORTY S, ROY A, SAHA G. Improved closed set text-independent speaker identification by combining MFCC with evidence from flipped filter banks[J]. International Journal of Signal Processing, 2007, 4(2): 114-122.

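Item	Original recording device: Aigo R6620	Covert recording device: iPhone6	Covert recording device: Mi4	Covert recording device: Sony PX440	Playback device: Huawei AM08	Playback device: Philips DTM3115
Audio format	wav	m4a	mp3	mp3	-	-
Sampling rate	16 kHz	44.1 kHz	44.1 kHz	44.1 kHz	-	-
Bit depth / bit rate	16 bit/s	64 kbit/s	128 kbit/s	192 kbit/s	-	-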

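Speech	Original recording device	Playback device	Covert recording device	Number of samples
Original speech	Aigo R6620	-	-	2,400
Playback speech	Aigo R6620	Huawei AM08	iPhone6, Mi4, Sony PX440	6,300
Playback speech	Aigo R6620	Philips DTM3115	iPhone6, Mi4, Sony PX440	6,300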

Feature	Philips DTM3115: FPR	Philips DTM3115: TPR	Philips DTM3115: ACC	Huawei AM08: FPR	Huawei AM08: TPR	Huawei AM08: ACC	Both devices (cross): FPR	Both devices (cross): TPR	Both devices (cross): ACC
MFCC	99.60%	1.30%	99.58%	96.90%	7.00%	96.92%	96.70%	16.90%	96.67%
I-MFCC	99.90%	0.20%	99.92%	98.20%	3.70%	98.16%	97.30%	14.00%	97.29%
MFSC	100%	0%	100%	99.30%	0.20%	99.33%	99.70%	0.30%	99.67%
I-MFSC	100%	0%	100%	100%	0%	100%	99.90%	0.90%	99.86%

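Training set: playback device	Training set: covert recording device	Test set: Huawei AM08 / iPhone	Test set: Huawei AM08 / Mi	Test set: Huawei AM08 / Sony	Test set: Philips DTM3115 / iPhone	Test set: Philips DTM3115 / Mi	Test set: Philips DTM3115 / Sony
Huawei AM08	iPhone	100%	100%	100%	100%	100%	99.86%
Huawei AM08	Mi	98.43%	100%	96.79%	97.86%	100%	82%
Huawei AM08	Sony	100%	100%	100%	99.07%	99.50%	99.79%
Philips DTM3115	iPhone	99.85%	100%	92.79%	100%	100%	99.85%
Philips DTM3115	Mi	96.70%	99%	77.29%	99.93%	100%	72.79%
Philips DTM3115	Sony	100%	99.14%	100%	100%	97.04%	100%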

Algorithm	Clean condition	30 dB noise	25 dB noise	20 dB noise	15 dB noise
MFCC	96.67%	95.66%	90.89%	87.74%	85.71%
I-MFCC	97.29%	96.57%	95.52%	90.89%	88.81%
MFSC	99.67%	98.62%	98.23%	97.57%	96.57%
I-MFSC	99.86%	99.35%	98.95%	98.21%	97.43%
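
The noise columns above correspond to different signal-to-noise ratios. As a rough sketch of how such test conditions can be produced, the snippet below scales noise so that the mixture reaches a target SNR. The source does not state the noise type or the mixing procedure, so the white Gaussian noise and the function name `add_noise_at_snr` are assumptions made only for illustration.

```python
import math
import random

def add_noise_at_snr(signal, snr_db, rng=None):
    """Return the signal plus white Gaussian noise scaled to snr_db (assumed noise type)."""
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    # Choose gain so that signal_power / (gain**2 * noise_power) == 10 ** (snr_db / 10)
    gain = math.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(signal, noise)]
```

Calling `add_noise_at_snr(samples, 15.0)` would give a 15 dB condition for a list of float samples. Lower SNR values mean stronger noise relative to the speech.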

Algorithm	ACC	EER
Algorithm of reference [4]	75.42%	25.45%
Algorithm of reference [5]	83.23%	19.09%
Proposed algorithm	99.86%	5.90%
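
The EER (equal error rate) column refers to the operating point at which the false acceptance rate equals the false rejection rate. The sketch below approximates it by sweeping a threshold over the observed scores. How the paper's scores were produced is not stated here, so the score convention (higher means more likely genuine) and the function name `equal_error_rate` are assumptions for illustration.

```python
def equal_error_rate(genuine_scores, attack_scores):
    """Approximate the EER by trying every observed score as a threshold.

    Assumed convention: a score at or above the threshold is accepted as genuine.
    """
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(genuine_scores) | set(attack_scores)):
        # False rejection: genuine trials scored below the threshold
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: attack trials scored at or above the threshold
        far = sum(s >= t for s in attack_scores) / len(attack_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer
```

With finite score lists the two error rates rarely cross exactly, so the function reports the average of the two rates at the threshold where they are closest.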