Journal on Communications, 2023, Vol. 44, Issue (5): 64-78. doi: 10.11959/j.issn.1000-436x.2023070
Jinzhi ZHENG1,2, Ruyi JI1,2, Libo ZHANG1,3, Chen ZHAO1,3
Revised: 2023-01-31
Published: 2023-05-25
Released: 2023-05-01
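Author biography:
Jinzhi ZHENG (1989- ), male, born in Zhoukou, Henan, is a Ph.D. candidate at the University of Chinese Academy of Sciences. His research interests include machine vision and natural language processing.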
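Abstract:
To detect and recognize scene text of arbitrary shape, a new end-to-end scene text detection and recognition algorithm is proposed. First, a segmentation-based detection branch that incorporates a text-aware module detects scene text from the visual features extracted by a convolutional network. Then, a recognition branch composed of a Transformer-based vision module and a Transformer-based language module encodes the text features of the detected regions. Finally, a fusion gate in the recognition branch fuses the encoded text features and outputs the recognized scene text. Experimental results on the Total-Text, ICDAR2013 and ICDAR2015 benchmark datasets show that the proposed algorithm performs well in recall, precision and F-measure, and that it also has an advantage in running speed.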
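The abstract does not give the formula of the fusion gate. One common form, used by ABINet (Fang et al., CVPR 2021), computes a per-channel gate from the concatenated visual and linguistic features and mixes the two with it. The minimal sketch below assumes that ABINet-style form with a bias term, g = sigmoid(W[v; l] + b) and f = g * v + (1 - g) * l, for a single character position; the function names and argument layout are hypothetical and are not taken from the paper. A real model would apply the same operation to all positions at once with tensor operations.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # x < 0, so this cannot overflow
    return e / (1.0 + e)


def gated_fusion(visual: Sequence[float],
                 linguistic: Sequence[float],
                 weight: Sequence[Sequence[float]],
                 bias: Sequence[float]) -> List[float]:
    """Fuse the visual and linguistic feature vectors of one character position.

    Assumed ABINet-style gate (not confirmed by the abstract):
        g = sigmoid(W [v; l] + b)    W is d x 2d, b has length d
        f = g * v + (1 - g) * l      element-wise
    """
    d = len(visual)
    if len(linguistic) != d or len(bias) != d or len(weight) != d:
        raise ValueError("visual, linguistic, bias and weight rows must all have size d")
    if any(len(row) != 2 * d for row in weight):
        raise ValueError("each row of weight must have length 2 * d")

    concat = list(visual) + list(linguistic)  # [v; l], length 2d
    fused = []
    for i in range(d):
        z = sum(w * x for w, x in zip(weight[i], concat)) + bias[i]
        g = sigmoid(z)  # gate in [0, 1]: 1 keeps the visual feature, 0 the linguistic one
        fused.append(g * visual[i] + (1.0 - g) * linguistic[i])
    return fused
```

With an all-zero W and b every gate equals 0.5, so the fused vector is the average of the two inputs: `gated_fusion([1.0, 2.0], [3.0, 4.0], [[0.0] * 4, [0.0] * 4], [0.0, 0.0])` gives `[2.0, 3.0]`. In training, W and b are learned, letting the model decide per channel how much to trust the visual evidence versus the language prior.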
Jinzhi ZHENG, Ruyi JI, Libo ZHANG, Chen ZHAO. End-to-end scene text detection and recognition algorithm based on Transformer decoders[J]. Journal on Communications, 2023, 44(5): 64-78.
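Table 3
End-to-end scene text detection and recognition F-measure of the proposed algorithm and other algorithms on the ICDAR2015 dataset (G, W and S denote the generic, weak and strong lexicon constraints)
Algorithm | G lexicon | W lexicon | S lexicon | No lexicon | Frame rate/(frame·s⁻¹) |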
Mask TextSpotter v1 | 62.4% | 73.0% | 79.3% | — | — |
CharNet R-50 | 62.2% | 74.5% | 80.2% | 60.72% | 0.8 |
TextBoxes++ | 51.9% | 65.9% | 73.3% | — | — |
TextDragon | 65.2% | 78.3% | 82.5% | — | — |
Text Perceptron | 65.1% | 76.6% | 80.5% | — | — |
Boundary TextSpotter | 64.1% | 75.2% | 79.7% | — | — |
PGNet | 63.5% | 78.3% | 83.3% | — | — |
MANGO | 67.3% | 78.9% | 81.8% | — | — |
ABCNet v2 | 73.0% | 78.5% | 82.7% | — | — |
TOSS | 52.4% | 59.6% | 65.9% | — | — |
SPTS | 65.8% | 70.2% | 77.5% | — | 1.5 |
SPTS v2 | 70.3% | 75.6% | 81.7% | — | — |
Proposed algorithm | 73.7% | 76.8% | 80.5% | 69.2% | 4.4 |
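In the lexicon-constrained columns of Table 3 (and the full-lexicon column of Table 4), methods commonly replace each recognized word with the closest entry of the supplied lexicon before scoring. For ICDAR2015, the strong (S) lexicon provides 100 words per image, the weak (W) lexicon contains all words of the test set, and the generic (G) lexicon is a 90k-word vocabulary. The sketch below shows that matching step with plain Levenshtein distance, assuming case-insensitive comparison; the function names are illustrative, and the paper's own post-processing and the official evaluation rules (box matching, special-character handling) are not reproduced here.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a single rolling DP row."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # substitute or match
        prev = cur
    return prev[-1]


def match_lexicon(prediction: str, lexicon: Iterable[str]) -> str:
    """Replace a recognized word with the closest lexicon entry.

    An empty lexicon corresponds to the unconstrained ("no lexicon") setting,
    so the raw prediction is returned. Comparison is case-insensitive (an
    assumption); ties go to the entry that appears first in the lexicon.
    """
    words = list(lexicon)
    if not words:
        return prediction
    target = prediction.lower()
    return min(words, key=lambda w: edit_distance(target, w.lower()))
```

For example, a recognizer output of `"H0TEL"` matched against the hypothetical lexicon `["HOSTEL", "HOTEL", "MOTEL"]` becomes `"HOTEL"` (distance 1). This is why F-measures rise from the no-lexicon column to the S-lexicon column: a smaller, image-specific lexicon corrects more near-miss transcriptions.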
Table 4
End-to-end scene text detection and recognition F-measure of each algorithm on the Total-Text dataset
Algorithm | No lexicon | Full lexicon | Frame rate/(frame·s⁻¹) |
Mask TextSpotter v1 | 52.9% | 71.8% | — |
FOTS | 32.2% | — | — |
CharNet H-88 | 66.6% | — | 0.5 |
TextDragon | 48.8% | 74.8% | — |
Mask TextSpotter v2 | 65.3% | 77.4% | 3.1 |
Unconstrained | 67.8% | — | — |
ABCNet | 64.2% | 75.7% | — |
Boundary TextSpotter | 65.0% | 76.1% | — |
PGNet | 63.1% | — | — |
ABCNet v2 | 70.4% | 78.1% | 3.5 |
TOSS | 65.1% | 74.8% | — |
Proposed algorithm | 70.9% | 78.1% | 6.4 |
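The F-measure reported in both tables is the harmonic mean of precision P and recall R, F = 2PR / (P + R). In end-to-end evaluation a prediction usually counts as correct only when its box matches a ground-truth box (commonly IoU > 0.5) and its transcription matches the ground-truth word. The sketch below assumes those match counts have already been computed and turns them into the three scores; the function name is illustrative.

```python
from typing import Tuple


def precision_recall_f(num_matched: int, num_predicted: int,
                       num_ground_truth: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) from end-to-end match counts."""
    if num_matched > min(num_predicted, num_ground_truth):
        raise ValueError("matches cannot exceed predictions or ground-truth words")
    precision = num_matched / num_predicted if num_predicted else 0.0
    recall = num_matched / num_ground_truth if num_ground_truth else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0  # avoid 0 / 0 when nothing is matched
    f_measure = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

As an illustrative (made-up) case, 60 correct words out of 80 predictions and 100 ground-truth words give P = 0.75, R = 0.60 and F ≈ 0.667. The harmonic mean penalizes imbalance, so a method cannot raise F much by trading many false positives for a small gain in recall.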