Journal on Communications, 2023, Vol. 44, Issue (5): 64-78. doi: 10.11959/j.issn.1000-436x.2023070

• Academic Paper •

End-to-end scene text detection and recognition algorithm based on Transformer decoding

Jinzhi ZHENG1,2, Ruyi JI1,2, Libo ZHANG1,3, Chen ZHAO1,3

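  1 Intelligent Software Research Center, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
    2 University of Chinese Academy of Sciences, Beijing 100190, China
    3 State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China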
  • Revised: 2023-01-31  Publication date: 2023-05-25  Release date: 2023-05-01
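  • About the authors: Jinzhi ZHENG (1989- ), male, born in Zhoukou, Henan, is a Ph.D. candidate at the University of Chinese Academy of Sciences. His research interests include machine vision and natural language processing.
    Ruyi JI (1988- ), male, born in Rizhao, Shandong, Ph.D., is an assistant research professor at the Institute of Software, Chinese Academy of Sciences. His research interests include machine learning, computer vision, image processing, and pattern recognition.
    Libo ZHANG (1989- ), male, born in Fuyang, Anhui, Ph.D., is an associate research professor and master's supervisor at the Institute of Software, Chinese Academy of Sciences. His research interests include image processing and pattern recognition.
    Chen ZHAO (1967- ), male, born in Pu'er, Yunnan, Ph.D., is a research professor and doctoral supervisor at the Institute of Software, Chinese Academy of Sciences. His research interests include compilation techniques, operating systems, and network software.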

End-to-end scene text detection and recognition algorithm based on Transformer decoders

Jinzhi ZHENG1,2, Ruyi JI1,2, Libo ZHANG1,3, Chen ZHAO1,3   

  1 Intelligent Software Research Center, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
    2 University of Chinese Academy of Sciences, Beijing 100190, China
    3 State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
  • Revised: 2023-01-31  Online: 2023-05-25  Published: 2023-05-01

Abstract:

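To address the detection and recognition of arbitrarily shaped scene text, a new end-to-end scene text detection and recognition algorithm is proposed. First, a segmentation-based detection branch incorporating a text-aware module detects scene text from the visual features extracted by a convolutional network. Then, a recognition branch consisting of a Transformer-based vision module and a Transformer language module encodes the text features of the detection results. Finally, a fusion gate in the recognition branch fuses the encoded text features and outputs the scene text. Experimental results on the Total-Text, ICDAR2013, and ICDAR2015 benchmark datasets show that the proposed algorithm performs well in recall, precision, and F-score, and has certain advantages in time efficiency.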
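The abstract does not give the fusion gate's equations. A common formulation for fusing vision and language features (for example, the gated fusion used in ABINet) computes a per-channel gate g = σ(W[v; l] + b) from the concatenated features and outputs g⊙v + (1 − g)⊙l. The sketch below assumes this formulation; it is not the authors' implementation, and the function name `gated_fusion`, the weight shape (d × 2d), and the bias term are illustrative assumptions.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function, mapping any real number into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(vision: Sequence[float], language: Sequence[float],
                 weight: Sequence[Sequence[float]], bias: Sequence[float]) -> List[float]:
    """Hypothetical gated fusion of one vision feature and one language feature.

    gate = sigmoid(W [v; l] + b), fused = gate * v + (1 - gate) * l, per channel.
    weight: d rows of length 2d; bias: length d (assumed shapes, not from the paper).
    """
    d = len(vision)
    if len(language) != d or len(weight) != d or len(bias) != d:
        raise ValueError("vision, language, weight rows and bias must share dimension d")
    concat = list(vision) + list(language)  # [v; l], length 2d
    fused = []
    for i in range(d):
        if len(weight[i]) != 2 * d:
            raise ValueError("each weight row must have length 2d")
        # Per-channel gate in (0, 1): near 1 trusts vision, near 0 trusts language.
        gate = sigmoid(sum(w * x for w, x in zip(weight[i], concat)) + bias[i])
        fused.append(gate * vision[i] + (1.0 - gate) * language[i])
    return fused
```

With all-zero weights and bias the gate is 0.5 in every channel, so `gated_fusion([1.0, 3.0], [3.0, 1.0], [[0.0] * 4] * 2, [0.0, 0.0])` returns `[2.0, 2.0]`, the element-wise average. A learned per-channel gate lets the model lean on visual evidence where the image is clear and on the language prior where it is not.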

Keywords: text detection, text recognition, end-to-end, Transformer

Abstract:

Aiming at the detection and recognition of arbitrarily shaped text in natural scenes, a novel scene text detection and recognition algorithm that can be trained end to end was proposed. Firstly, a segmentation-based detection branch built on a text-aware module was introduced to detect scene text from the visual features extracted by a convolutional network. Then, a recognition branch composed of a Transformer vision module and a Transformer language module encoded the text features of the detection results. Finally, a fusion gate in the recognition branch fused the encoded text features to output the scene text. Experimental results on three benchmark datasets, Total-Text, ICDAR2013, and ICDAR2015, show that the proposed algorithm achieves excellent recall, precision, and F-score, and has certain advantages in efficiency.
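The three metrics named above have standard definitions. Given TP detections matched to ground truth, out of D predicted instances and G ground-truth instances, precision is P = TP/D, recall is R = TP/G, and the F-score is their harmonic mean F = 2PR/(P + R). The sketch below shows only this arithmetic. How a detection is matched to ground truth (and, for end-to-end evaluation, whether its transcription must also be correct) differs between benchmarks and is assumed to be computed elsewhere; the names `detection_scores` and `DetectionScores` are illustrative.

```python
from typing import NamedTuple


class DetectionScores(NamedTuple):
    precision: float
    recall: float
    f_score: float


def detection_scores(true_positives: int, num_detections: int,
                     num_ground_truth: int) -> DetectionScores:
    """Precision, recall and F-score (harmonic mean) from match counts.

    true_positives: detections matched to a ground-truth instance; the
    matching rule itself is benchmark-specific and assumed to be given.
    """
    if not 0 <= true_positives <= min(num_detections, num_ground_truth):
        raise ValueError("true_positives must lie in [0, min(detections, ground truth)]")
    # Empty prediction or ground-truth sets yield 0.0 instead of dividing by zero.
    precision = true_positives / num_detections if num_detections else 0.0
    recall = true_positives / num_ground_truth if num_ground_truth else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return DetectionScores(precision, recall, f_score)
```

With hypothetical counts of 80 matched detections out of 100 predictions and 90 ground-truth instances, P = 0.8, R ≈ 0.889, and F ≈ 0.842, which equals 2TP/(D + G) = 160/190.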

Key words: text detection, text recognition, end-to-end, Transformer

