Telecommunications Science, 2022, Vol. 38, Issue (12): 56-64. doi: 10.11959/j.issn.1000-0801.2022294

• Research and Development •

Deep learning Chinese input method with incremental vocabulary selection

Huajian REN, Xiulan HAO, Wenjing XU

  1. Zhejiang Province Key Laboratory of Smart Management and Application of Modern Agricultural Resources, Huzhou University, Huzhou 313000, China
  • Revised: 2022-12-07 Online: 2022-12-20 Published: 2022-12-01
  • About the authors: Huajian REN (1994- ), male, master's student at Huzhou University; main research interests: natural language processing and Chinese input methods
    Xiulan HAO (1970- ), female, Ph.D., associate professor and master's supervisor at Huzhou University; main research interests: intelligent information processing, data and knowledge engineering, and natural language understanding
    Wenjing XU (1998- ), female, master's student at Huzhou University; main research interests: natural language processing and fake news detection
  • Supported by:
    The Foundation of Zhejiang Province Key Laboratory of Smart Management and Application of Modern Agricultural Resources (2020E10017)

Abstract:

The core task of an input method is to convert the keystroke sequences typed by users into Chinese character sequences. Input methods based on deep learning have advantages in learning long-range dependencies and in alleviating data sparsity. However, existing methods still have two shortcomings: first, a structure that separates pinyin segmentation from conversion leads to error propagation; second, the models are too complex to meet the real-time requirements of an input method. To address these shortcomings, a deep learning input method model incorporating an incremental vocabulary selection algorithm was proposed, and several softmax optimization methods were compared. Experiments on People's Daily data and Chinese Wikipedia data show that the model improves conversion accuracy by 15% over the current state of the art, and that incorporating incremental vocabulary selection makes the model 130 times faster without losing conversion accuracy.

Key words: Chinese input method, long short-term memory, vocabulary selection
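
The abstract does not spell out the selection algorithm, so the following is only a generic sketch of vocabulary selection at the output layer, not the authors' method. It assumes a hypothetical toy lexicon (`PINYIN_TO_IDS`), a made-up output weight matrix, and made-up hidden states; the point is that scoring only the characters compatible with the typed syllable picks the same character as scoring the whole vocabulary, while computing far fewer dot products.

```python
import math

# Hypothetical toy lexicon (an assumption for illustration): each pinyin
# syllable maps to the ids of the characters that can be written with it.
PINYIN_TO_IDS = {
    "ni": [0, 1],
    "hao": [2, 3],
}


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def full_softmax_best(hidden, out_weights, allowed):
    """Score every vocabulary row, then pick the best allowed id."""
    probs = softmax([dot(row, hidden) for row in out_weights])
    return max(allowed, key=lambda i: probs[i])


def selected_softmax_best(hidden, out_weights, allowed):
    """Score only the rows of the allowed candidates."""
    probs = softmax([dot(out_weights[i], hidden) for i in allowed])
    best_pos = max(range(len(allowed)), key=lambda p: probs[p])
    return allowed[best_pos]


def decode(syllables, hiddens, out_weights, best_fn):
    """Pick one character id per typed syllable using the given scorer."""
    return [
        best_fn(h, out_weights, PINYIN_TO_IDS[s])
        for s, h in zip(syllables, hiddens)
    ]
```

Because softmax preserves the ordering of the scores, both scorers choose the same candidate; the restricted version computes `len(allowed)` dot products per step instead of one per vocabulary entry, which is where a speed-up of this kind would come from when the vocabulary is large.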

CLC number:
