[1] |
ZHENG F, LI L T, ZHANG H, et al. Overview of voiceprint recognition technology and applications[J]. Journal of Information Security Research, 2016, 2(1): 44-57.
|
[2] |
ZHANG B, ZHU J, SU H. Toward the third generation of artificial intelligence[J]. Scientia Sinica (Informationis), 2020, 50(9): 1281-1302.
|
[3] |
DEHAK N, KENNY P J, DEHAK R, et al. Front-end factor analysis for speaker verification[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2011, 19(4): 788-798.
|
[4] |
VESTMAN V, KINNUNEN T. Supervector compression strategies to speed up I-vector system development[C]// Odyssey 2018 The Speaker and Language Recognition Workshop. [S.l.: s.n.], 2018: 357-364.
|
[5] |
MA J B, SETHU V, AMBIKAIRAJAH E, et al. Generalized variability model for speaker verification[J]. IEEE Signal Processing Letters, 2018, 25(12): 1775-1779.
|
[6] |
CHEN C, HAN J Q. TDMF: task-driven multilevel framework for end-to-end speaker verification[C]// 2020 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE Press, 2020: 6809-6813.
|
[7] |
GAO R C, HAN J Q, ZHANG L. Channel compensation of speaker identification based on maximum a posteriori[J]. Journal on Communications, 2009, 30(3): 99-103.
|
[8] |
WANG H B, GUO J Y, MAO C L, et al. Speaker recognition based on universal background-joint estimation (UB-JE)[J]. Acta Automatica Sinica, 2018, 44(10): 1888-1895.
|
[9] |
VARIANI E, LEI X, MCDERMOTT E, et al. Deep neural networks for small footprint text-dependent speaker verification[C]// 2014 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE Press, 2014: 4052-4056.
|
[10] |
SNYDER D, GARCIA-ROMERO D, POVEY D, et al. Deep neural network embeddings for text-independent speaker verification[C]// Interspeech 2017. Piscataway: IEEE Press, 2017: 999-1003.
|
[11] |
SNYDER D, GARCIA-ROMERO D, SELL G, et al. X-vectors: robust DNN embeddings for speaker recognition[C]// 2018 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE Press, 2018: 5329-5333.
|
[12] |
LECUN Y, BOSER B, DENKER J S, et al. Backpropagation applied to handwritten zip code recognition[J]. Neural Computation, 1989, 1(4): 541-551.
|
[13] |
VILLALBA J, CHEN N, SNYDER D, et al. State-of-the-art speaker recognition for telephone and video speech[C]// Proceedings of the Twentieth Annual Conference of the International Speech Communication Association. Piscataway: IEEE Press, 2019: 1488-1492.
|
[14] |
SNYDER D, GARCIA-ROMERO D, SELL G, et al. Speaker recognition for multi-speaker conversations using X-vectors[C]// 2019 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE Press, 2019: 5796-5800.
|
[15] |
ZHANG R T, WEI J G, LU W H, et al. ARET: aggregated residual extended time-delay neural networks for speaker verification[C]// Interspeech 2020. Piscataway: IEEE Press, 2020: 946-950.
|
[16] |
YU Y Q, LI W J. Densely connected time delay neural network for speaker verification[C]// Interspeech 2020. Piscataway: IEEE Press, 2020: 921-925.
|
[17] |
NAGRANI A, CHUNG J S, ZISSERMAN A. VoxCeleb: a large-scale speaker identification dataset[C]// Interspeech 2017. Piscataway: IEEE Press, 2017: 2616-2620.
|
[18] |
BHATTACHARYA G, ALAM M J, GUPTA V, et al. Deeply fused speaker embeddings for text-independent speaker verification[C]// Interspeech 2018. Piscataway: IEEE Press, 2018: 3588-3592.
|
[19] |
ZHANG C L, KOISHIDA K, HANSEN J H L. Text-independent speaker verification based on triplet convolutional neural network embeddings[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018, 26(9): 1633-1644.
|
[20] |
CHEN Y, CHEN H K. Speaker recognition based on multimodal generative adversarial nets with triplet-loss[J]. Journal of Electronics & Information Technology, 2020, 42(2): 379-385.
|
[21] |
HUANG Z L, WANG S, YU K. Angular softmax for short-duration text-independent speaker verification[C]// Interspeech 2018. Piscataway: IEEE Press, 2018: 3623-3627.
|
[22] |
NOVOSELOV S, SHULIPA A, KREMNEV I, et al. On deep speaker embeddings for text-independent speaker recognition[C]// Odyssey 2018 The Speaker and Language Recognition Workshop. Piscataway: IEEE Press, 2018: 378-385.
|
[23] |
YU Y Q, FAN L, LI W J. Ensemble additive margin softmax for speaker verification[C]// 2019 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE Press, 2019: 6046-6050.
|
[24] |
WEI Y H, DU J Z, LIU H. Angular margin centroid loss for text-independent speaker recognition[C]// Interspeech 2020. Piscataway: IEEE Press, 2020: 3820-3824.
|
[25] |
KULLBACK S, LEIBLER R A. On information and sufficiency[J]. The Annals of Mathematical Statistics, 1951, 22(1): 79-86.
|
[26] |
BELGHAZI M I, BARATIN A, RAJESHWAR S, et al. Mutual information neural estimation[C]// Proceedings of the Thirty-Fifth International Conference on Machine Learning. Piscataway: IEEE Press, 2018: 531-540.
|
[27] |
REYNOLDS D A, QUATIERI T F, DUNN R B. Speaker verification using adapted Gaussian mixture models[J]. Digital Signal Processing, 2000, 10(1/2/3): 19-41.
|
[28] |
LONG H, YANG M L, SHAO Y B. Noisy voice detection algorithm based on feature stream fusion[J]. Journal on Communications, 2020, 41(4): 134-142.
|
[29] |
VAN DER MAATEN L, HINTON G. Visualizing data using t-SNE[J]. Journal of Machine Learning Research, 2008, 9(11): 2579-2605.
|