Feature enhancement and bilinear feature vector fusion for text detection of mobile industrial containers

Telecommunications Science, 2022, Vol. 38, Issue (7): 75-87. doi: 10.11959/j.issn.1000-0801.2022139

Haiyang HU¹,², Zepin LI¹,², Zhongjin LI¹,²

¹ School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
² Key Laboratory of Brain Machine Collaborative Intelligence of Zhejiang Province, Hangzhou 310018, China

Revised: 2022-06-10  Online: 2022-07-20  Published: 2022-07-01
About the authors: Haiyang HU (1977- ), male, professor at Hangzhou Dianzi University. His main research interests are machine vision and intelligent manufacturing.
Zepin LI (1997- ), male, master's student at Hangzhou Dianzi University. His main research interests are computer vision and text detection and recognition.
Zhongjin LI (1988- ), male, lecturer at Hangzhou Dianzi University. His main research interests are computer vision and mobile edge computing.
Supported by:
The National Natural Science Foundation of China (61572162); The National Natural Science Foundation of China (61802095); The Zhejiang Provincial Key Science and Technology Project (2018C01012); The Zhejiang Provincial Natural Science Foundation of China (LQ17F020003)

Abstract

In real industrial environments, factors such as dim lighting, irregular text, and limited device resources make text detection a challenging task. To address this problem, a feature vector fusion module based on bilinear operations was designed and combined with feature enhancement and semi-convolution to form a lightweight text detection network, RGFFD (ResNet18 + Ghost module + feature pyramid enhancement module (FPEM) + feature fusion module (FFM) + differentiable binarization (DB)). A feature enhancement module was embedded in the Ghost module to improve feature extraction, the bilinear feature vector fusion module fused multi-scale information, and an adaptive threshold segmentation algorithm was added to improve the segmentation capability of the DB module. In a real factory environment, RGFFD reached a detection speed of 6.5 f/s when detecting container numbers on the embedded device UP2 board. On the public datasets ICDAR2015 and Total-text, the detection speed reached 39.6 f/s and 49.6 f/s, respectively. On the custom dataset, the accuracy reached 88.9% and the detection speed was 30.7 f/s.

Key words: text detection, semi-convolution, feature vector fusion, feature enhancement, feature fusion
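
The abstract names differentiable binarization (DB) as the final stage of RGFFD. As a rough illustration only, the sketch below implements the approximate step function from the original DB formulation in reference [9], B = 1 / (1 + exp(-k (P - T))), where P is the probability map and T the threshold map. The amplification factor k = 50 follows that reference; the function name and the toy maps are assumptions made for this example, and the adaptive threshold segmentation added in this paper is not reproduced here.

```python
import math


def differentiable_binarization(prob_map, thresh_map, k=50.0):
    """Approximate binary map B = 1 / (1 + exp(-k * (P - T))).

    prob_map and thresh_map are equally sized 2-D lists with values in [0, 1].
    k is the amplification factor (50 in the original DB formulation).
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


# Toy 2x2 maps (hypothetical values, not taken from the paper).
P = [[0.9, 0.2], [0.6, 0.3]]
T = [[0.3, 0.3], [0.5, 0.5]]
B = differentiable_binarization(P, T)
for row in B:
    print([round(v, 3) for v in row])
```

Pixels whose probability exceeds the local threshold are pushed towards 1 and the rest towards 0, while the function stays differentiable so that the threshold map can be learned jointly with the probability map.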

CLC number: TN929.5

Haiyang HU, Zepin LI, Zhongjin LI. Feature enhancement and bilinear feature vector fusion for text detection of mobile industrial containers[J]. Telecommunications Science, 2022, 38(7): 75-87.

Figures and tables (18)

Figures 1-9

Table 1

Comparison of the proposed method with other text detection methods on the ICDAR2015 dataset

Method	Precision	Recall	Detection speed/(f·s^-1)	Detection speed - 20/(f·s^-1)
PAN^[15]	85.5%	81.9%	37.5	17.5
EAST^[28]	83.6%	73.5%	13.2	-6.8
SegLink^[26]	73.1%	76.8%	13.9	-6.1
TextSnake^[34]	84.9%	80.4%	1.1	-18.9
TextFuseNet^[35]	91.3%*	88.9%*	8.3	-11.7
PSENet^[30]	86.9%	84.5%	1.6	-18.4
CTPN^[25]	74.2%	51.6%	7.1	-12.9
Proposed method	85.9%	81.4%	39.6*	19.6*
Note: * marks the best result in each column (shown in bold in the original table).
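
Table 1 reports precision and recall separately. One common way to compare detectors with a single number is the F-measure, the harmonic mean of the two. The sketch below computes it for a few rows of Table 1 and also recomputes the "detection speed - 20" column, which from the listed values appears to be the measured speed minus 20 f/s. The F-measure values are derived here for illustration and are not reported in the table; the function names are assumptions.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def speed_margin(speed_fps, baseline_fps=20.0):
    """Measured speed minus a fixed baseline, rounded to one decimal place."""
    return round(speed_fps - baseline_fps, 1)


# (method, precision %, recall %, speed in f/s), copied from Table 1.
rows = [
    ("PAN", 85.5, 81.9, 37.5),
    ("EAST", 83.6, 73.5, 13.2),
    ("Proposed method", 85.9, 81.4, 39.6),
]

for name, p, r, fps in rows:
    print(f"{name}: F-measure = {f_measure(p, r):.1f}%, "
          f"speed - 20 = {speed_margin(fps)} f/s")
```

A negative margin means the method runs below 20 f/s on the test hardware; only PAN and the proposed method stay above that level on ICDAR2015.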

Table 2

Comparison of the proposed method with other text detection methods on the Total-text dataset

Method	Precision	Recall	Detection speed/(f·s^-1)	Detection speed - 20/(f·s^-1)
PAN^[15]	87.3%	81.5%	39.9	19.9
EAST^[28]	50.1%	36.2%	19.8	-0.2
SegLink^[26]	30.3%	23.8%	9.1	-10.9
TextSnake^[34]	82.7%	74.5%	4.7	-15.3
ATRR^[36]	80.9%	76.2%	25.2	5.2
TextFuseNet^[35]	87.5%*	83.2%*	7.1	-12.9
PSENet^[30]	84.8%	79.7%	3.9	-16.1
Proposed method	86.9%	78.1%	45.6*	25.6*
Note: * marks the best result in each column (shown in bold in the original table).

Table 3

Comparison of the proposed method with other text detection methods on the custom dataset

Method	Precision	Recall	Detection speed/(f·s^-1)	Detection speed - 20/(f·s^-1)
PAN^[15]	86.7%	80.6%	25.4	5.4
EAST^[28]	79.2%	61.3%	9.5	-10.5
SegLink^[26]	70.4%	56.8%	4.2	-15.8
TextSnake^[34]	81.8%	75.2%	0.6	-19.4
PSENet^[30]	87.5%	79.4%	2.1	-17.9
CTPN^[25]	65.3%	49.5%	2.5	-17.5
TextFuseNet^[35]	89.5%*	81.4%*	7.5	-12.5
Proposed method	88.9%	80.5%	30.7*	10.7*
Note: * marks the best result in each column (shown in bold in the original table).

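Table 4

Comparison of different methods on the development board

Method	Detection speed/(f·s^-1)
PAN^[15]	4.3
MobileNetV3^[21]+DB^[9]	5.1
Proposed method	6.5*
Note: * marks the best result in each column (shown in bold in the original table).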
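
The throughput figures in Table 4 can also be read as per-frame latency, which is often the more useful number when sizing an embedded deployment. The short sketch below converts frames per second to milliseconds per frame for the three methods measured on the development board. The conversion is plain arithmetic on the table values; the function name is an assumption.

```python
def latency_ms(fps):
    """Average time per frame in milliseconds for a given throughput in f/s."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Detection speeds on the development board, copied from Table 4.
board_speeds = {"PAN": 4.3, "MobileNetV3 + DB": 5.1, "Proposed method": 6.5}

for name, fps in board_speeds.items():
    print(f"{name}: {fps} f/s, about {latency_ms(fps):.0f} ms per frame")
```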

Figure 10

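Table 5

Comparison of experimental results with different embedded modules

Method	Precision	Recall	Detection speed/(f·s^-1)
None	87.5%	79.2%	25.6
Embedded improvement method 1	88.9%*	80.5%*	30.7*
Embedded improvement method 2	88.1%	78.7%	29.9
Note: * marks the best result in each column (shown in bold in the original table).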

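Table 6

Effect of the feature vector fusion module

Method	Precision	Recall	Detection speed/(f·s^-1)
None	87.8%	80.2%	32.3*
Feature vector fusion module	88.9%*	80.5%*	30.7
Note: * marks the best result in each column (shown in bold in the original table).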

Figures 11-12

References (36)

[1]	HUANG W L, LIN Z, YANG J C, et al. Text localization in natural images using stroke feature transform and text covariance descriptors[C]// Proceedings of 2013 IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2013: 1241-1248.
[2]	NEUMANN L, MATAS J. Real-time lexicon-free scene text localization and recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(9): 1872-1885.
[3]	MATAS J, CHUM O, URBAN M, et al. Robust wide-baseline stereo from maximally stable extremal regions[J]. Image and Vision Computing, 2004, 22(10): 761-767.
[4]	MINETTO R, THOME N, CORD M, et al. T-HOG: an effective gradient-based descriptor for single line text regions[J]. Pattern Recognition, 2013, 46(3): 1078-1090.
[5]	KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90.
[6]	LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[M]// Computer Vision–ECCV 2016. Cham: Springer International Publishing, 2016: 21-37.
[7]	ZHONG Z Y, SUN L, HUO Q. An anchor-free region proposal network for Faster R-CNN-based text detection approaches[J]. International Journal on Document Analysis and Recognition (IJDAR), 2019, 22(3): 315-327.
[8]	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 770-778.
[9]	LIAO M, WAN Z, YAO C, et al. Real-time scene text detection with differentiable binarization[C]// Proceedings of the AAAI Conference on Artificial Intelligence. Piscataway: IEEE Press, 2020: 11474-11481.
[10]	SU F, LV Q, LUO R Z. Review of image classification based on deep learning[J]. Telecommunications Science, 2019, 35(11): 58-74.
[11]	HAN K, WANG Y H, TIAN Q, et al. GhostNet: more features from cheap operations[C]// Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2020: 1577-1586.
[12]	HOWARD A G, ZHU M, CHEN B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.
[13]	ZHANG X Y, ZHOU X Y, LIN M X, et al. ShuffleNet: an extremely efficient convolutional neural network for mobile devices[C]// Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 6848-6856.
[14]	HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7132-7141.
[15]	WANG W H, XIE E Z, SONG X G, et al. Efficient and accurate arbitrary-shaped text detection with pixel aggregation network[C]// Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE Press, 2019: 8439-8448.
[16]	MILLETARI F, NAVAB N, AHMADI S A. V-net: fully convolutional neural networks for volumetric medical image segmentation[C]// Proceedings of 2016 Fourth International Conference on 3D Vision (3DV). Piscataway: IEEE Press, 2016: 565-571.
[17]	SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J]. arXiv preprint arXiv:1409.1556, 2014.
[18]	SZEGEDY C, LIU W, JIA Y Q, et al. Going deeper with convolutions[C]// Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2015: 1-9.
[19]	IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift[J]. arXiv preprint arXiv:1502.03167, 2015.
[20]	SANDLER M, HOWARD A, ZHU M L, et al. MobileNetV2: inverted residuals and linear bottlenecks[C]// Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 4510-4520.
[21]	HOWARD A, SANDLER M, CHEN B, et al. Searching for MobileNetV3[C]// Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE Press, 2019: 1314-1324.
[22]	LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]// Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 936-944.
[23]	LIAO M, SHI B, BAI X, et al. TextBoxes: a fast text detector with a single deep neural network[C]// Thirty-first AAAI Conference on Artificial Intelligence. Piscataway: IEEE Press, 2017.
[24]	LIAO M H, SHI B G, BAI X. TextBoxes++: a single-shot oriented scene text detector[J]. IEEE Transactions on Image Processing: a Publication of the IEEE Signal Processing Society, 2018, 27(8): 3676-3690.
[25]	TIAN Z, HUANG W, HE T, et al. Detecting text in natural image with connectionist text proposal network[C]// Proceedings European Conference on Computer Vision. Heidelberg: Springer, 2016: 56-72.
[26]	SHI B G, BAI X, BELONGIE S. Detecting oriented text in natural images by linking segments[C]// Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 3482-3490.
[27]	LIAO M H, ZHU Z, SHI B G, et al. Rotation-sensitive regression for oriented scene text detection[C]// Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 5909-5918.
[28]	ZHOU X Y, YAO C, WEN H, et al. EAST: an efficient and accurate scene text detector[C]// Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 2642-2651.
[29]	DENG D, LIU H, LI X, et al. PixelLink: detecting scene text via instance segmentation[C]// Proceedings of the AAAI Conference on Artificial Intelligence. Piscataway: IEEE Press, 2018.
[30]	WANG W H, XIE E Z, LI X, et al. Shape robust text detection with progressive scale expansion network[C]// Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2019: 9328-9337.
[31]	TIAN Z T, SHU M, LYU P Y, et al. Learning shape-aware embedding for scene text detection[C]// Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2019: 4229-4238.
[32]	CHOLLET F. Xception: deep learning with depthwise separable convolutions[C]// Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 1800-1807.
[33]	SHI X J, CHEN Z, WANG H, et al. Convolutional LSTM network: a machine learning approach for precipitation nowcasting[C]// Advances in Neural Information Processing Systems. [S.l.:s.n.], 2015: 802-810.
[34]	LONG S, RUAN J, ZHANG W, et al. TextSnake: a flexible representation for detecting text of arbitrary shapes[C]// Proceedings of the European Conference on Computer Vision (ECCV). Piscataway: IEEE Press, 2018: 20-36.
[35]	YE J, CHEN Z, LIU J H, et al. TextFuseNet: scene text detection with richer fused features[C]// Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2020: 516-522.
[36]	WANG X B, JIANG Y Y, LUO Z B, et al. Arbitrary shape scene text detection with adaptive text region representation[C]// Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2019: 6442-6451.

