通信学报 (Journal on Communications) ›› 2021, Vol. 42 ›› Issue (7): 231-237. doi: 10.11959/j.issn.1000-436x.2021133

• Correspondences •

基于深层信息散度最大化的说话人确认方法

陈晨1,2, 肜娅峰1, 季超群1, 陈德运1,2, 何勇军1   

  1. 1 哈尔滨理工大学计算机科学与技术学院,黑龙江 哈尔滨 150080
    2 哈尔滨理工大学计算机科学与技术博士后流动站,黑龙江 哈尔滨 150080
  • 修回日期:2021-06-15 出版日期:2021-07-25 发布日期:2021-07-01
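  • About the authors: Chen CHEN (1990− ), female, born in Harbin, Heilongjiang, Ph.D., is a lecturer and master's supervisor at Harbin University of Science and Technology. Her main research interests include speech signal processing, audio information analysis, and speaker recognition.
    Yafeng RONG (1997− ), female, born in Nanyang, Henan, is a master's student at Harbin University of Science and Technology. Her main research interests include speaker recognition and speech signal processing.
    Chaoqun JI (1995− ), male, born in Suihua, Heilongjiang, is a master's student at Harbin University of Science and Technology. His main research interests include speaker recognition and speech signal processing.
    Deyun CHEN (1962− ), male, born in Harbin, Heilongjiang, Ph.D., is a professor and doctoral supervisor at Harbin University of Science and Technology. His main research interests include pattern recognition and machine learning.
    Yongjun HE (1980− ), male, born in Nanchong, Sichuan, Ph.D., is a professor and doctoral supervisor at Harbin University of Science and Technology. His main research interests include speech signal processing and image processing.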
  • 基金资助:
    国家自然科学基金资助项目(61673142);黑龙江省自然科学基金资助项目(JJ2019JQ0013);黑龙江省博士后专项基金资助项目(LBH-Z20020);黑龙江省普通高校基本科研业务费专项资金资助项目(2020-KYYWF-0341)

Speaker verification method based on deep information divergence maximization

Chen CHEN1,2, Yafeng RONG1, Chaoqun JI1, Deyun CHEN1,2, Yongjun HE1   

  1. 1 School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China
    2 Postdoctoral Research Station of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China
  • Revised: 2021-06-15 Online: 2021-07-25 Published: 2021-07-01
  • Supported by:
    The National Natural Science Foundation of China (61673142); The Natural Science Foundation of Heilongjiang Province (JJ2019JQ0013); Heilongjiang Postdoctoral Fund (LBH-Z20020); The Fundamental Research Funds for the Central Universities of Heilongjiang Province (2020-KYYWF-0341)

摘要:

针对说话人确认中无法准确捕获特征间非线性关系的问题,提出了一种基于深层信息散度最大化的目标函数表示方法。该方法能通过计算特征所在分布之间相似度,来对特征间的非线性关系进行隐性表示,并在最大化这种统计相关性的优化目标指导下,使深度神经网络向着同类数据更紧凑、异类数据更分散的方向优化,最终达到提升深层特征空间区分性的目标。实验结果表明,相对于其他深度学习方法,所提方法的相对等错误率(EER)最多降低了15.80%,显著提升了系统性能。

关键词: 说话人确认, 目标函数, 深层信息散度, 特征表示学习

Abstract:

To address the problem that the nonlinear relationship between speaker representations cannot be accurately captured in speaker verification, an objective function based on deep information divergence maximization is proposed. It implicitly represents the nonlinear relationship between speaker representations by computing the similarity between the distributions from which they are drawn. Guided by the optimization goal of maximizing this statistical correlation, the deep neural network is optimized so that within-class data become more compact and between-class data become more dispersed, which improves the discriminability of the deep speaker representation space. Experimental results show that, compared with other deep learning methods, the proposed method reduces the equal error rate (EER) by up to 15.80% in relative terms, which significantly improves system performance.

Key words: speaker verification, objective function, deep information divergence, representation learning
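
The abstract reports its result as a relative reduction in equal error rate (EER). As a reading aid, the sketch below shows one common way to approximate EER from verification trial scores and to compute a relative reduction between two systems. It is not the authors' evaluation code: the function names `equal_error_rate` and `relative_reduction`, the threshold sweep over observed scores, and the toy scores are all assumptions made for illustration.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Hypothetical helper for illustration; higher scores mean "same speaker".
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    # False rejection rate: fraction of target trials scored below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance rate: fraction of non-target trials at or above the threshold.
    far = np.array([(nontarget >= t).mean() for t in thresholds])
    # The EER is taken where the two error rates are closest to each other.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


def relative_reduction(baseline_eer, proposed_eer):
    """Relative EER reduction, in percent, of a proposed system over a baseline."""
    return (baseline_eer - proposed_eer) / baseline_eer * 100.0


if __name__ == "__main__":
    # Toy scores invented for this example; they are not data from the paper.
    toy_target = [0.9, 0.8, 0.7, 0.4]
    toy_nontarget = [0.6, 0.3, 0.2, 0.1]
    print(equal_error_rate(toy_target, toy_nontarget))
```

A relative reduction is measured against the baseline's own EER, so a 15.80% relative reduction does not mean the EER dropped by 15.80 percentage points.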

