Feature enhancement and bilinear feature vector fusion for text detection of mobile industrial containers

doi:10.11959/j.issn.1000-0801.2022139

Abstract

In real factory environments, text detection is challenging because of dim lighting, irregular text, and limited device resources. To address this problem, a feature vector fusion module based on the bilinear operation was designed and combined with feature enhancement and semi-convolution to form a lightweight text detection network, RGFFD (ResNet18 + Ghost module + FPEM (feature pyramid enhancement module) + FFM (feature fusion module) + DB (differentiable binarization)). In this network, a feature enhancement module was embedded in the Ghost module to improve feature extraction, the bilinear feature vector fusion module fused multi-scale information, and an adaptive threshold segmentation algorithm was added to improve the segmentation capability of the DB module. In a real industrial environment, RGFFD reached a detection speed of 6.5 f/s when detecting container-number text on the UP2 board embedded device. On the public datasets ICDAR2015 and Total-text, the detection speed reached 39.6 f/s and 49.6 f/s, respectively. On the custom dataset, the precision reached 88.9% and the detection speed was 30.7 f/s.
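
Two of the building blocks named in the abstract can be illustrated with a minimal sketch. The first function is a generic bilinear (outer-product) fusion of two feature vectors, followed by the signed square root and L2 normalization commonly used with bilinear pooling; it is an assumption about what a bilinear fusion step might look like, not the paper's FFM. The second is the approximate step function from the DB paper [9], with the amplification factor `k = 50` used there. The function names are hypothetical.

```python
import math


def bilinear_fuse(u, v):
    """Generic bilinear fusion of two feature vectors (illustrative only)."""
    # The outer product captures every pairwise interaction u[i] * v[j].
    outer = [ui * vj for ui in u for vj in v]
    # The signed square root damps large responses while keeping the sign.
    rooted = [math.copysign(math.sqrt(abs(x)), x) for x in outer]
    # L2-normalize so the fused vector has unit length (guard against zero).
    norm = math.sqrt(sum(x * x for x in rooted)) or 1.0
    return [x / norm for x in rooted]


def differentiable_binarization(p, t, k=50.0):
    """Approximate binarization 1 / (1 + exp(-k * (p - t))) from DB [9].

    p is a probability-map value and t a threshold-map value, both in [0, 1].
    """
    return 1.0 / (1.0 + math.exp(-k * (p - t)))


fused = bilinear_fuse([1.0, 2.0], [3.0, 4.0])
text_pixel = differentiable_binarization(0.9, 0.3)
background_pixel = differentiable_binarization(0.1, 0.3)
```

Because the step function is smooth, the threshold `t` can be learned per pixel together with the probability map, which is what makes an adaptive threshold possible in a DB-style head.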

Key words: text detection, semi-convolution, feature vector fusion, feature enhancement, feature fusion

CLC Number:

TN929.5

Haiyang HU, Zepin LI, Zhongjin LI. Feature enhancement and bilinear feature vector fusion for text detection of mobile industrial containers[J]. Telecommunications Science, 2022, 38(7): 75-87.

Figures/Tables 18

Method	Precision	Recall	Detection speed/(f·s^-1)	Detection speed - 20/(f·s^-1)
PAN^[15]	85.5%	81.9%	37.5	17.5
EAST^[28]	83.6%	73.5%	13.2	-6.8
SegLink^[26]	73.1%	76.8%	13.9	-6.1
TextSnake^[34]	84.9%	80.4%	1.1	-18.9
TextFuseNet^[35]	91.3%*	88.9%*	8.3	-11.7
PSENet^[30]	86.9%	84.5%	1.6	-18.4
CTPN^[25]	74.2%	51.6%	7.1	-12.9
Proposed method	85.9%	81.4%	39.6*	19.6*
Note: * marks the best result in each column.
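
The last column of the table above is consistent with the detection speed minus a fixed 20 f/s reference. A small sketch of that arithmetic follows; reading the 20 f/s value as a real-time reference rate is an assumption, and `speed_margin` is a hypothetical helper name.

```python
REFERENCE_FPS = 20.0  # assumed reference rate, inferred from the column header


def speed_margin(fps, reference=REFERENCE_FPS):
    """Return how far a detector's speed is above (+) or below (-) the reference."""
    return round(fps - reference, 1)


# Detection speeds (f/s) taken from three rows of the table above.
speeds = {"PAN": 37.5, "EAST": 13.2, "Proposed method": 39.6}
margins = {name: speed_margin(fps) for name, fps in speeds.items()}
```

A positive margin means the method runs faster than the reference rate; a negative one means it falls short.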

Method	Precision	Recall	Detection speed/(f·s^-1)	Detection speed - 20/(f·s^-1)
PAN^[15]	87.3%	81.5%	39.9	19.9
EAST^[28]	50.1%	36.2%	19.8	-0.2
SegLink^[26]	30.3%	23.8%	9.1	-10.9
TextSnake^[34]	82.7%	74.5%	4.7	-15.3
ATRR^[36]	80.9%	76.2%	25.2	5.2
TextFuseNet^[35]	87.5%*	83.2%*	7.1	-12.9
PSENet^[30]	84.8%	79.7%	3.9	-16.1
Proposed method	86.9%	78.1%	45.6*	25.6*
Note: * marks the best result in each column.

Method	Precision	Recall	Detection speed/(f·s^-1)	Detection speed - 20/(f·s^-1)
PAN^[15]	86.7%	80.6%	25.4	5.4
EAST^[28]	79.2%	61.3%	9.5	-10.5
SegLink^[26]	70.4%	56.8%	4.2	-15.8
TextSnake^[34]	81.8%	75.2%	0.6	-19.4
PSENet^[30]	87.5%	79.4%	2.1	-17.9
CTPN^[25]	65.3%	49.5%	2.5	-17.5
TextFuseNet^[35]	89.5%*	81.4%*	7.5	-12.5
Proposed method	88.9%	80.5%	30.7*	10.7*
Note: * marks the best result in each column.

Method	Detection speed/(f·s^-1)
PAN^[15]	4.3
MobileNetV3^[21]+DB^[9]	5.1
Proposed method	6.5*
Note: * marks the best result in each column.

Method	Precision	Recall	Detection speed/(f·s^-1)
None	87.5%	79.2%	25.6
Embedded improvement method 1	88.9%*	80.5%*	30.7*
Embedded improvement method 2	88.1%	78.7%	29.9
Note: * marks the best result in each column.

Method	Precision	Recall	Detection speed/(f·s^-1)
None	87.8%	80.2%	32.3*
Feature vector fusion module	88.9%*	80.5%*	30.7
Note: * marks the best result in each column.

References 36

[1]	HUANG W L, LIN Z, YANG J C, et al. Text localization in natural images using stroke feature transform and text covariance descriptors[C]// Proceedings of 2013 IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2013: 1241-1248.
[2]	NEUMANN L, MATAS J. Real-time lexicon-free scene text localization and recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(9): 1872-1885.
[3]	MATAS J, CHUM O, URBAN M, et al. Robust wide-baseline stereo from maximally stable extremal regions[J]. Image and Vision Computing, 2004, 22(10): 761-767.
[4]	MINETTO R, THOME N, CORD M, et al. T-HOG: an effective gradient-based descriptor for single line text regions[J]. Pattern Recognition, 2013, 46(3): 1078-1090.
[5]	KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90.
[6]	LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[M]// Computer Vision – ECCV 2016. Cham: Springer International Publishing, 2016: 21-37.
[7]	ZHONG Z Y, SUN L, HUO Q. An anchor-free region proposal network for Faster R-CNN-based text detection approaches[J]. International Journal on Document Analysis and Recognition (IJDAR), 2019, 22(3): 315-327.
[8]	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 770-778.
[9]	LIAO M, WAN Z, YAO C, et al. Real-time scene text detection with differentiable binarization[C]// Proceedings of the AAAI Conference on Artificial Intelligence. Piscataway: IEEE Press, 2020: 11474-11481.
[10]	SU F, LV Q, LUO R Z. Review of image classification based on deep learning[J]. Telecommunications Science, 2019, 35(11): 58-74.
[11]	HAN K, WANG Y H, TIAN Q, et al. GhostNet: more features from cheap operations[C]// Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2020: 1577-1586.
[12]	HOWARD A G, ZHU M, CHEN B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.
[13]	ZHANG X Y, ZHOU X Y, LIN M X, et al. ShuffleNet: an extremely efficient convolutional neural network for mobile devices[C]// Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 6848-6856.
[14]	HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7132-7141.
[15]	WANG W H, XIE E Z, SONG X G, et al. Efficient and accurate arbitrary-shaped text detection with pixel aggregation network[C]// Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE Press, 2019: 8439-8448.
[16]	MILLETARI F, NAVAB N, AHMADI S A. V-net: fully convolutional neural networks for volumetric medical image segmentation[C]// Proceedings of 2016 Fourth International Conference on 3D Vision (3DV). Piscataway: IEEE Press, 2016: 565-571.
[17]	SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J]. arXiv preprint arXiv:1409.1556, 2014.
[18]	SZEGEDY C, LIU W, JIA Y Q, et al. Going deeper with convolutions[C]// Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 1-9.
[19]	IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift[J]. arXiv preprint arXiv:1502.03167, 2015.
[20]	SANDLER M, HOWARD A, ZHU M L, et al. MobileNetV2: inverted residuals and linear bottlenecks[C]// Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 4510-4520.
[21]	HOWARD A, SANDLER M, CHEN B, et al. Searching for MobileNetV3[C]// Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE Press, 2019: 1314-1324.
[22]	LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]// Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 936-944.
[23]	LIAO M, SHI B, BAI X, et al. TextBoxes: a fast text detector with a single deep neural network[C]// Thirty-first AAAI Conference on Artificial Intelligence. Piscataway: IEEE Press, 2017.
[24]	LIAO M H, SHI B G, BAI X. TextBoxes++: a single-shot oriented scene text detector[J]. IEEE Transactions on Image Processing: a Publication of the IEEE Signal Processing Society, 2018, 27(8): 3676-3690.
[25]	TIAN Z, HUANG W, HE T, et al. Detecting text in natural image with connectionist text proposal network[C]// Proceedings European Conference on Computer Vision. Heidelberg: Springer, 2016: 56-72.
[26]	SHI B G, BAI X, BELONGIE S. Detecting oriented text in natural images by linking segments[C]// Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 3482-3490.
[27]	LIAO M H, ZHU Z, SHI B G, et al. Rotation-sensitive regression for oriented scene text detection[C]// Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 5909-5918.
[28]	ZHOU X Y, YAO C, WEN H, et al. EAST: an efficient and accurate scene text detector[C]// Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 2642-2651.
[29]	DENG D, LIU H, LI X, et al. PixelLink: detecting scene text via instance segmentation[C]// Proceedings of the AAAI Conference on Artificial Intelligence. Piscataway: IEEE Press, 2018.
[30]	WANG W H, XIE E Z, LI X, et al. Shape robust text detection with progressive scale expansion network[C]// Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2019: 9328-9337.
[31]	TIAN Z T, SHU M, LYU P Y, et al. Learning shape-aware embedding for scene text detection[C]// Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2019: 4229-4238.
[32]	CHOLLET F. Xception: deep learning with depthwise separable convolutions[C]// Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 1800-1807.
[33]	SHI X J, CHEN Z, WANG H, et al. Convolutional LSTM network: a machine learning approach for precipitation nowcasting[C]// Advances in Neural Information Processing Systems. [S.l.:s.n.], 2015: 802-810.
[34]	LONG S, RUAN J, ZHANG W, et al. TextSnake: a flexible representation for detecting text of arbitrary shapes[C]// Proceedings of the European Conference on Computer Vision (ECCV). Piscataway: IEEE Press, 2018: 20-36.
[35]	YE J, CHEN Z, LIU J H, et al. TextFuseNet: scene text detection with richer fused features[C]// Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2020: 516-522.
[36]	WANG X B, JIANG Y Y, LUO Z B, et al. Arbitrary shape scene text detection with adaptive text region representation[C]// Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2019: 6442-6451.

Related Articles 2

[1]	Feng LIN, Liujing XU, Xiaohua CHEN, Weiqiang QI, Ke CHEN, Tiantian ZHU. Method of Webshell detection based on multi-view feature fusion[J]. Telecommunications Science, 2020, 36(6): 125-132.
[2]	Shuainan CUI, Zongju PENG, Wenhui ZOU, Fen CHEN, Hua CHEN. Quality assessment of synthetic viewpoint stereo image with multi-feature fusion[J]. Telecommunications Science, 2019, 35(5): 104-112.