Telecommunications Science ›› 2023, Vol. 39 ›› Issue (2): 92-102. doi: 10.11959/j.issn.1000-0801.2023006

• Research and Development •

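Spoof speech detection based on context information and attention feature

Jia CHEN¹, Jianwu ZHANG¹, Zheliang ZHANG²

  1. Hangzhou Dianzi University, Hangzhou 310018, China
  2. Zhejiang Uniview Technologies Co., Ltd., Hangzhou 310051, China
  • Revised: 2023-01-05 Online: 2023-02-20 Published: 2023-02-01
  • About the authors:
    Jia CHEN (2000- ), female, is a master's student at the School of Communication Engineering, Hangzhou Dianzi University. Her main research interests include speech detection and artificial intelligence.
    Jianwu ZHANG (1961- ), male, Ph.D., is a professor and doctoral supervisor at the School of Communication Engineering, Hangzhou Dianzi University, a senior member of the Chinese Institute of Electronics, and an executive council member of the Zhejiang Institute of Communications. His main research interests include mobile communications, multimedia signal processing and artificial intelligence, and communication networks and information security.
    Zheliang ZHANG (1969- ), male, Ph.D., is vice president of Zhejiang Uniview Technologies Co., Ltd. His main research interests include artificial intelligence and human resources.
  • Supported by:
    The National Natural Science Foundation of China (U1866209); The National Natural Science Foundation of China (61772162)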

Abstract:

As speech synthesis and voice conversion technologies develop rapidly, existing spoof speech detection methods still suffer from low detection accuracy and poor generalization. To address this, an end-to-end spoof detection method based on context information and attention features is proposed. Built on a deep residual shrinkage network (DRSN), the method uses a dual-branch context information coordination fusion module (DCCM) to aggregate rich context information, and fuses features produced by a coordinate time-frequency attention (CTFA) mechanism to obtain cross-dimensional interaction features that carry context information, thereby maximizing the potential for capturing artifacts. Compared with the best baseline system, the proposed method reduces the equal error rate (EER) and the tandem detection cost function (t-DCF) by 68% and 65%, respectively, on the ASVspoof 2019 LA dataset; on the ASVspoof 2021 LA dataset, it achieves an EER of 4.81 and a t-DCF of 0.3115, reductions of 48% and 10%, respectively. The experimental results show that the proposed method effectively improves the accuracy and generalization ability of spoof speech detection.

Key words: spoof speech detection, context information, attention feature, end-to-end, artifacts
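
The abstract reports results as EER values and as relative reductions against a baseline. As a rough illustration of how such figures are typically obtained, the sketch below estimates the EER from two sets of detection scores and computes a relative reduction. It is not the authors' evaluation code or the official ASVspoof scoring tool: the function names `compute_eer` and `relative_reduction`, the convention that a higher score means "more likely bona fide", and all numbers in the usage lines are assumptions made for illustration. The t-DCF, which additionally depends on the speaker verification system's errors, is not covered.

```python
import numpy as np


def compute_eer(bonafide_scores, spoof_scores):
    """Estimate the equal error rate from detection scores.

    Assumption: a higher score means the utterance is more likely bona fide.
    Returns (eer, threshold) at the candidate threshold where the false
    acceptance and false rejection rates are closest.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    # Every observed score is a candidate decision threshold.
    thresholds = np.sort(np.concatenate([bonafide, spoof]))
    # False rejection rate: bona fide utterances scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoofed utterances scored at or above it.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


def relative_reduction(baseline, proposed):
    """Fractional reduction of a lower-is-better metric relative to a baseline."""
    return (baseline - proposed) / baseline


# Hypothetical scores, not data from the paper.
eer, threshold = compute_eer([0.6, 0.4], [0.5, 0.3])
print(f"EER = {eer:.2f} at threshold {threshold:.2f}")

# Hypothetical metric values, not the paper's baseline figures.
print(f"relative reduction = {relative_reduction(10.0, 5.0):.0%}")
```

A threshold sweep over the observed scores is the simplest estimator; evaluation toolkits usually interpolate along the detection error trade-off curve instead, so values can differ slightly on small score sets.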
