Journal on Communications (通信学报) ›› 2022, Vol. 43 ›› Issue (11): 65-79. doi: 10.11959/j.issn.1000-436x.2022222

• Academic paper •

A text proofreading model for speech recognition based on Chinese semantic-phonological information

Meiyu ZHONG¹, Peiliang WU¹,², Yan DOU¹,³, Yi LIU¹, Lingfu KONG¹,²

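  1. School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China
  2. The Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao 066004, Hebei, China
  3. The Key Laboratory of Software Engineering of Hebei Province, Qinhuangdao 066004, Hebei, China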
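  • Revised: 2022-10-24; Online: 2022-11-25; Published: 2022-11-01
  • About the authors: Meiyu ZHONG (born 1993), female, from Xingtai, Hebei, is a Ph.D. candidate at Yanshan University. Her main research interest is intelligent information processing.
    Peiliang WU (born 1981), male, from Shijiazhuang, Hebei, Ph.D., is a professor and doctoral supervisor at Yanshan University. His main research interests are natural language processing, deep reinforcement learning, and robot manipulation skill learning.
    Yan DOU (born 1968), female, from Xi'an, Shaanxi, Ph.D., is a professor and master's supervisor at Yanshan University. Her main research interests are intelligent information processing, machine vision, and pattern recognition.
    Yi LIU (born 1998), male, from Shijiazhuang, Hebei, is a master's student at Yanshan University. His main research interests are intelligent information processing and machine vision.
    Lingfu KONG (born 1957), male, from Gongzhuling, Jilin, Ph.D., is a professor and doctoral supervisor at Yanshan University. His main research interests are intelligent control, intelligent information processing, and robot vision.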
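  • Funding:
    The National Key Research and Development Program of China (2018YFB1308300); The National Natural Science Foundation of China (62276028); The National Natural Science Foundation of China (U20A20167); Beijing Natural Science Foundation (4202026); The Natural Science Foundation of Hebei Province (F202103079); The Innovation Capability Improvement Plan Project of Hebei Province (22567626H); The Project of the Key Laboratory of Software Engineering of Hebei Province (22567637H)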

Chinese semantic and phonological information-based text proofreading model for speech recognition

Meiyu ZHONG¹, Peiliang WU¹,², Yan DOU¹,³, Yi LIU¹, Lingfu KONG¹,²

  1. School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
  2. The Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao 066004, China
  3. The Key Laboratory of Software Engineering of Hebei Province, Qinhuangdao 066004, China
  • Revised: 2022-10-24; Online: 2022-11-25; Published: 2022-11-01
  • Supported by:
    The National Key Research and Development Program of China (2018YFB1308300); The National Natural Science Foundation of China (62276028); The National Natural Science Foundation of China (U20A20167); Beijing Natural Science Foundation (4202026); The Natural Science Foundation of Hebei Province (F202103079); The Innovation Capability Improvement Plan Project of Hebei Province (22567626H); The Project of the Key Laboratory of Software Engineering of Hebei Province (22567637H)

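Abstract:

To study the effect of Pinyin on detecting and correcting errors in speech recognition text, a text proofreading model based on Chinese semantic-phonological information is proposed. Five Pinyin encoding methods are defined to construct character-phonological embedding vectors, which serve as the input to a GRU-based Seq2Seq model, and an attention mechanism is applied to extract the semantic-phonological information of each sentence in order to proofread errors in speech recognition text. To address the shortage of annotated corpora, a data augmentation method based on swapping Pinyin initials and finals is proposed. Experimental results on the public AISHELL-3 dataset show that the phonological information carried by Pinyin benefits the proofreading of speech recognition text errors, and that the proposed method improves the model's error detection performance.

Keywords: text proofreading, speech recognition, Pinyin, attention mechanism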

Abstract:

To study the influence of Chinese Pinyin on detecting and correcting text errors in speech recognition, a text proofreading model based on Chinese semantic and phonological information was proposed. Five Pinyin encoding methods were designed to construct the character-Pinyin embedding vector, which was used as the input of a Seq2Seq model based on the gated recurrent unit (GRU). An attention mechanism was adopted to extract the semantic and phonological information of sentences in order to correct speech recognition errors. To address the shortage of labeled corpora, a data augmentation method was introduced that automatically obtains annotated corpora by exchanging the initials or finals of Chinese Pinyin. Experimental results on the public AISHELL-3 dataset show that phonological information helps the text proofreading model detect and correct text errors after speech recognition, and that the proposed data augmentation method improves the error detection performance of the model.

Key words: text proofreading, speech recognition, Pinyin, attention mechanism
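
The data augmentation idea in the abstract, exchanging the initials or finals of Pinyin to obtain annotated corpora automatically, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the confusion pairs in `INITIAL_SWAPS` and `FINAL_SWAPS`, the toy lexicon `TOY_LEXICON`, and the helper names `split_pinyin`, `confusable_syllables` and `make_noisy_pair` are all hypothetical. The sketch takes a clean sentence with its toneless Pinyin, replaces one character by a sound-alike character, and returns the (noisy, clean) pair that could serve as a training example.

```python
import random

# Hypothetical confusion pairs (commonly confused Mandarin sounds); the
# abstract does not list the pairs the authors actually used.
INITIAL_SWAPS = {"zh": "z", "z": "zh", "ch": "c", "c": "ch",
                 "sh": "s", "s": "sh", "n": "l", "l": "n"}
FINAL_SWAPS = {"an": "ang", "ang": "an", "en": "eng", "eng": "en",
               "in": "ing", "ing": "in"}

# Toy lexicon (assumption): toneless Pinyin syllable -> one character.
TOY_LEXICON = {"san": "三", "shan": "山", "xin": "新", "xing": "星"}


def split_pinyin(syllable):
    """Split a non-empty toneless syllable into (initial, final).

    Syllables without a consonant initial (here also those starting with
    y or w) are returned with an empty initial.
    """
    for initial in ("zh", "ch", "sh"):
        if syllable.startswith(initial):
            return initial, syllable[2:]
    if syllable[0] in "bpmfdtnlgkhjqxrzcs":
        return syllable[0], syllable[1:]
    return "", syllable


def confusable_syllables(syllable):
    """Return syllables obtained by swapping the initial or the final."""
    initial, final = split_pinyin(syllable)
    variants = []
    if initial in INITIAL_SWAPS:
        variants.append(INITIAL_SWAPS[initial] + final)
    if final in FINAL_SWAPS:
        variants.append(initial + FINAL_SWAPS[final])
    return variants


def make_noisy_pair(chars, syllables, lexicon, rng):
    """Replace one character with a sound-alike one.

    Returns (noisy_text, clean_text), or None if no swap is possible.
    """
    candidates = []
    for i, syllable in enumerate(syllables):
        for variant in confusable_syllables(syllable):
            if variant in lexicon and lexicon[variant] != chars[i]:
                candidates.append((i, lexicon[variant]))
    if not candidates:
        return None
    i, new_char = rng.choice(candidates)
    return chars[:i] + new_char + chars[i + 1:], chars


if __name__ == "__main__":
    rng = random.Random(0)
    print(make_noisy_pair("新山", ["xin", "shan"], TOY_LEXICON, rng))
```

In practice the Pinyin for each character would come from a pronunciation dictionary or a conversion library, and the lexicon would map each syllable to many candidate characters; both are simplified here so the sketch stays self-contained.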
