Journal on Communications, 2023, Vol. 44, Issue (5): 64-78. doi: 10.11959/j.issn.1000-436x.2023070
Jinzhi ZHENG1,2, Ruyi JI1,2, Libo ZHANG1,3, Chen ZHAO1,3
Revised: 2023-01-31
Online: 2023-05-25
Published: 2023-05-01
Jinzhi ZHENG, Ruyi JI, Libo ZHANG, Chen ZHAO. End-to-end scene text detection and recognition algorithm based on Transformer decoders[J]. Journal on Communications, 2023, 44(5): 64-78.
Algorithm | End-to-end, G lexicon | End-to-end, W lexicon | End-to-end, S lexicon | End-to-end, no lexicon | Frame rate/(frame·s⁻¹) |
Mask TextSpotter v1 | 62.4% | 73.0% | 79.3% | - | - |
CharNet R-50 | 62.2% | 74.5% | 80.2% | 60.72% | 0.8 |
TextBoxes++ | 51.9% | 65.9% | 73.3% | - | - |
TextDragon | 65.2% | 78.3% | 82.5% | - | - |
Text Perceptron | 65.1% | 76.6% | 80.5% | - | - |
Boundary TextSpotter | 64.1% | 75.2% | 79.7% | - | - |
PGNet | 63.5% | 78.3% | 83.3% | - | - |
MANGO | 67.3% | 78.9% | 81.8% | - | - |
ABCNet v2 | 73.0% | 78.5% | 82.7% | - | - |
TOSS | 52.4% | 59.6% | 65.9% | - | - |
SPTS | 65.8% | 70.2% | 77.5% | - | 1.5 |
SPTS v2 | 70.3% | 75.6% | 81.7% | - | - |
Proposed algorithm | 73.7% | 76.8% | 80.5% | 69.2% | 4.4 |
Algorithm | End-to-end, no lexicon | End-to-end, full lexicon | Frame rate/(frame·s⁻¹) |
Mask TextSpotter v1 | 52.9% | 71.8% | - |
FOTS | 32.2% | - | - |
CharNet H-88 | 66.6% | - | 0.5 |
TextDragon | 48.8% | 74.8% | - |
Mask TextSpotter v2 | 65.3% | 77.4% | 3.1 |
Unconstrained | 67.8% | - | - |
ABCNet | 64.2% | 75.7% | - |
Boundary TextSpotter | 65.0% | 76.1% | - |
PGNet | 63.1% | - | - |
ABCNet v2 | 70.4% | 78.1% | 3.5 |
TOSS | 65.1% | 74.8% | - |
Proposed algorithm | 70.9% | 78.1% | 6.4 |