Adaptive Gaussian back-end based on LDOF criterion for language recognition

Journal on Communications, 2017, Vol. 38, Issue (4): 17-24. doi: 10.11959/j.issn.1000-436x.2017096

Zhong-fu YE¹,²,³, Ting QI¹,², Sai-feng LI¹,², Yan SONG¹,²

¹ School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China
² National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei 230027, China
³ State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi 214125, China

Revised: 2017-02-09 Online: 2017-04-01 Published: 2017-07-20
About the authors: Zhong-fu YE (1959-), male, born in Tongcheng, Anhui, Ph.D., is a professor and doctoral supervisor at the University of Science and Technology of China. His main research interests are speech signal processing, array signal processing, radar signal processing, and image analysis and processing. | Ting QI (1993-), female, born in Huainan, Anhui, is a master's student at the University of Science and Technology of China. Her main research interest is language recognition. | Sai-feng LI (1980-), male, born in Pingxiang, Jiangxi, is a doctoral student at the University of Science and Technology of China. His main research interests are communication signal processing and speech signal processing. | Yan SONG (1972-), male, born in Hefei, Anhui, Ph.D., is an associate professor at the University of Science and Technology of China. His main research interests are language recognition and content-based audio/video analysis and retrieval.
Supported by: The Open Project Program of the State Key Laboratory of Mathematical Engineering and Advanced Computing (2015A15)

Abstract

To address the mismatch between test samples and trained models caused by intra-language variations, an adaptive Gaussian back-end language recognition method based on the local distance-based outlier factor (LDOF) criterion is proposed. The LDOF criterion is defined to drive an effective parameter search and to dynamically select, from the multi-class language training set, the training samples whose characteristics are close to those of the test sample. The original Gaussian back-end is then adjusted accordingly, yielding a recognition model that is better matched to the test sample. Experiments on the NIST LRE 2009 task set of six easily confused languages show that the proposed method clearly improves both the equal error rate (EER) and the average detection cost (Cavg).

Key words: language recognition, intra-language variations, adaptive Gaussian back-end, LDOF
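
The abstract does not spell out the LDOF formula, so the following is only a minimal sketch of the commonly used definition of the local distance-based outlier factor: the average distance from a point to its k nearest neighbours, divided by the average pairwise distance among those neighbours. The function name `ldof`, the Euclidean metric, and the toy data are assumptions made for illustration. How the paper uses the score to tune parameters and adapt the Gaussian back-end is not shown here.

```python
import numpy as np


def ldof(x, train, k):
    """Return (LDOF score of x, indices of its k nearest rows in train).

    Assumes Euclidean distance and k >= 2; rows of `train` are vectors.
    """
    dists = np.linalg.norm(train - x, axis=1)
    nn_idx = np.argsort(dists)[:k]
    # Average distance from x to its k nearest neighbours.
    knn_dist = dists[nn_idx].mean()
    # Average distance between distinct neighbours (ordered pairs i != j).
    nbrs = train[nn_idx]
    pairwise = np.linalg.norm(nbrs[:, None, :] - nbrs[None, :, :], axis=2)
    inner_dist = pairwise.sum() / (k * (k - 1))
    return knn_dist / inner_dist, nn_idx


# Toy data (hypothetical): four training vectors on the unit square.
train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
inside_score, _ = ldof(np.array([0.5, 0.5]), train, k=4)
far_score, _ = ldof(np.array([10.0, 10.0]), train, k=4)
print(f"inside: {inside_score:.3f}, far: {far_score:.3f}")
```

A score well below 1 indicates that the point sits inside its neighbourhood, while a score well above 1 marks it as an outlier relative to those neighbours. The returned indices are the kind of neighbouring training subset that a selection step could work with.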

CLC number: TN912.34

Cite this article: Zhong-fu YE, Ting QI, Sai-feng LI, Yan SONG. Adaptive Gaussian back-end based on LDOF criterion for language recognition[J]. Journal on Communications, 2017, 38(4): 17-24.

Figures/Tables (8): Figures 1-6, Tables 1-2


EER performance by test-segment duration
Method	30 s	10 s	3 s
GB	6.07%	8.08%	15.23%
CDS	5.84%	8.02%	15.27%
KNN-AGB (k=600)	5.61%	8.01%	14.46%
LDOF-AGB	5.01%	7.19%	14.42%
G-LDOF-AGB	4.87%	7.12%	14.38%
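
To make the size of the improvement in the EER table above concrete, the short sketch below computes the relative EER reduction of G-LDOF-AGB over the GB baseline at each duration. The EER values are copied from the table. The helper name `relative_reduction` and the choice of GB as the baseline are assumptions made for illustration.

```python
# EER values in percent, copied from the EER table above.
eer = {
    "GB": {"30 s": 6.07, "10 s": 8.08, "3 s": 15.23},
    "G-LDOF-AGB": {"30 s": 4.87, "10 s": 7.12, "3 s": 14.38},
}


def relative_reduction(baseline, system):
    """Relative reduction in percent of `system` with respect to `baseline`."""
    return 100.0 * (baseline - system) / baseline


reductions = {
    dur: relative_reduction(eer["GB"][dur], eer["G-LDOF-AGB"][dur])
    for dur in eer["GB"]
}
for dur, red in reductions.items():
    print(f"{dur}: {red:.1f}% relative EER reduction")
```

The gain shrinks as the test segments get shorter, which matches the pattern visible in the table.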

Cavg performance by test-segment duration
Method	30 s	10 s	3 s
GB	183%	248%	475%
CDS	180%	244%	477%
KNN-AGB (k=600)	176%	240%	471%
LDOF-AGB	157%	215%	476%
G-LDOF-AGB	154%	217%	468%
