Rapider-YOLOX: an efficient lightweight object detection network

doi:10.11959/j.issn.2096-6652.202303

Abstract

As a lightweight network, YOLOX-Nano runs fast, but in practical applications its feature extraction ability is weak and its detection accuracy is insufficient. To address this, Rapider-YOLOX, an efficient object detection network that balances detection speed and detection accuracy, is proposed. First, a highly efficient bottleneck (HEB) module is designed to improve the feature extraction capability of the depthwise convolutional blocks in the original YOLOX-Nano model. Second, a soft-SPP (SSPP) module is designed to avoid the loss of important information in the original SPP module and to further improve multi-scale information fusion and information exchange between channels. Finally, CIoU is introduced to improve the localization accuracy of the predicted box by using the center distance and the aspect ratio between the predicted box and the ground-truth box. Experimental results on the PASCAL VOC2007 dataset show that the mAP of Rapider-YOLOX reaches 77.92%, which is 3.79 percentage points higher than that of the original YOLOX-Nano. In addition, on a GT1030 with only 384 CUDA cores, the proposed method reaches 45.40 FPS, and it reaches 23.94 FPS on a CPU. The method thus improves the detection accuracy and generalization performance of the network while preserving its lightweight character.
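To make the CIoU term concrete, the following is a minimal sketch of the commonly used CIoU formulation (IoU minus a normalized center-distance penalty and an aspect-ratio penalty). It is an illustration only: the function name `ciou`, the `(x1, y1, x2, y2)` box layout, and the `eps` stabilizer are assumptions, and the paper's own implementation is not shown in this excerpt.

```python
import math


def ciou(box_a, box_b, eps=1e-9):
    """Complete IoU between two boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration; not the authors' code.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    aw, ah = ax2 - ax1, ay2 - ay1
    bw, bh = bx2 - bx1, by2 - by1

    # Plain IoU: overlap area divided by union area.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    iou = inter / (union + eps)

    # Center-distance term: squared distance between the box centers,
    # normalized by the squared diagonal of the smallest enclosing box.
    rho2 = ((ax1 + ax2 - bx1 - bx2) ** 2 + (ay1 + ay2 - by1 - by2) ** 2) / 4.0
    enc_w = max(ax2, bx2) - min(ax1, bx1)
    enc_h = max(ay2, by2) - min(ay1, by1)
    c2 = enc_w ** 2 + enc_h ** 2 + eps

    # Aspect-ratio term: penalizes a mismatch between the two width/height ratios.
    v = (4.0 / math.pi ** 2) * (
        math.atan(bw / (bh + eps)) - math.atan(aw / (ah + eps))
    ) ** 2
    alpha = v / (1.0 - iou + v + eps)

    return iou - rho2 / c2 - alpha * v


if __name__ == "__main__":
    predicted = (0.0, 0.0, 4.0, 2.0)
    ground_truth = (1.0, -1.0, 3.0, 3.0)
    print(f"CIoU = {ciou(predicted, ground_truth):.4f}")
    print(f"CIoU loss = {1.0 - ciou(predicted, ground_truth):.4f}")
```

A regression loss is then usually taken as `1 - ciou(pred, target)`. Unlike plain IoU, this value still changes when two boxes do not overlap, because the center-distance term remains informative.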

Key words: object detection, efficient convolutional neural network, YOLOX-Nano, lightweight, high precision

CLC Number: TP391

Zhouyu GU, Yuecheng YU, Tiantian ZHE. Rapider-YOLOX: lightweight object detection network with high precision[J]. Chinese Journal of Intelligent Science and Technology, 2023, 5(1): 92-103.

References

[1]	VOULODIMOS A , DOULAMIS N , DOULAMIS A ,et al. Deep learning for computer vision:a brief review[J]. Computational Intelligence and Neuroscience, 2018: 1-13.
[2]	ZHANG H M , LI P P , FANG X B ,et al. Human abnormal behavior detection method based on improved YOLOv3 network model[J]. Computer Science, 2022,49(4): 233-238.
[3]	TIAN Q , HU R , LI Z Y ,et al. Insulator detection based on SE-YOLOv5s[J]. Chinese Journal of Intelligent Science and Technology, 2021,3(3): 312-321.
[4]	VIOLA P , JONES M . Robust real-time face detection[C]// Proceedings of the 8th IEEE International Conference on Computer Vision. Piscataway:IEEE Press, 2002.
[5]	VEDALDI A , GULSHAN V , VARMA M ,et al. Multiple kernels for object detection[C]// Proceedings of 2009 IEEE 12th International Conference on Computer Vision. Piscataway:IEEE Press, 2010: 606-613.
[6]	HARZALLAH H , JURIE F , SCHMID C . Combining efficient object localization and image classification[C]// Proceedings of 2009 IEEE 12th International Conference on Computer Vision. Piscataway:IEEE Press, 2010: 237-244.
[7]	OJALA T , PIETIKAINEN M , MAENPAA T . Multiresolution gray-scale and rotation invariant texture classification with local binary patterns[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002,24(7): 971-987.
[8]	GIRSHICK R , DONAHUE J , DARRELL T ,et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]// Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE Press, 2014: 580-587.
[9]	GIRSHICK R . Fast R-CNN[C]// Proceedings of 2015 IEEE International Conference on Computer Vision. Piscataway:IEEE Press, 2016: 1440-1448.
[10]	REN S Q , HE K M , GIRSHICK R ,et al. Faster R-CNN:towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017,39(6): 1137-1149.
[11]	HE K M , GKIOXARI G , DOLLÁR P ,et al. Mask R-CNN[C]// Proceedings of 2017 IEEE International Conference on Computer Vision. Piscataway:IEEE Press, 2017: 2980-2988.
[12]	DAI J F , LI Y , HE K M ,et al. R-FCN:object detection via region-based fully convolutional networks[C]// Proceedings of the 30th International Conference on Neural Information Processing Systems. New York:ACM Press, 2016: 379-387.
[13]	ZHU Y S , ZHAO C Y , WANG J Q ,et al. CoupleNet:coupling global structure with local parts for object detection[C]// Proceedings of 2017 IEEE International Conference on Computer Vision. Piscataway:IEEE Press, 2017: 4146-4154.
[14]	REDMON J , DIVVALA S , GIRSHICK R ,et al. You only look once:unified,real-time object detection[C]// Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE Press, 2016: 779-788.
[15]	REDMON J , FARHADI A . YOLO9000:better,faster,stronger[C]// Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE Press, 2017: 6517-6525.
[16]	REDMON J , FARHADI A . YOLOv3:an incremental improvement[J]. arXiv preprint, 2018,arXiv:1804.02767.
[17]	BOCHKOVSKIY A , WANG C Y , LIAO H Y M . YOLOv4:optimal speed and accuracy of object detection[J]. arXiv preprint, 2020,arXiv:2004.10934.
[18]	LIU W , ANGUELOV D , ERHAN D ,et al. SSD:single shot multibox detector[C]// Proceedings of European Conference on Computer Vision. Cham:Springer, 2016: 21-37.
[19]	LIN T Y , GOYAL P , GIRSHICK R ,et al. Focal loss for dense object detection[C]// Proceedings of 2017 IEEE International Conference on Computer Vision. Piscataway:IEEE Press, 2017: 2999-3007.
[20]	GE Z , LIU S T , WANG F ,et al. YOLOX:exceeding YOLO series in 2021[J]. arXiv preprint, 2021,arXiv:2107.08430.
[21]	LIU Z , MAO H Z , WU C Y ,et al. A ConvNet for the 2020s[C]// Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE Press, 2022: 11966-11976.
[22]	RADOSAVOVIC I , KOSARAJU R P , GIRSHICK R ,et al. Designing network design spaces[C]// Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE Press, 2020: 10425-10433.
[23]	STERGIOU A , POPPE R , KALLIATAKIS G . Refining activation downsampling with SoftPool[C]// Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Piscataway:IEEE Press, 2022: 10337-10346.
[24]	WANG Q L , WU B G , ZHU P F ,et al. ECA-net:efficient channel attention for deep convolutional neural networks[C]// Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE Press, 2020: 11531-11539.
[25]	YU J H , JIANG Y N , WANG Z Y ,et al. UnitBox:an advanced object detection network[C]// Proceedings of the 24th ACM International Conference on Multimedia. New York:ACM Press, 2016: 516-520.
[26]	LIU Z , LI J G , SHEN Z Q ,et al. Learning efficient convolutional networks through network slimming[C]// Proceedings of 2017 IEEE International Conference on Computer Vision. Piscataway:IEEE Press, 2017: 2755-2763.
[27]	ZHANG D Q , YANG J L , YE D ,et al. LQ-nets:learned quantization for highly accurate and compact deep neural networks[C]// Proceedings of 2018 European Conference on Computer Vision. New York:ACM Press, 2018: 373-390.
[28]	HINTON G , VINYALS O , DEAN J . Distilling the knowledge in a neural network[J]. arXiv preprint, 2015,arXiv:1503.02531.
[29]	HOWARD A G , ZHU M L , CHEN B ,et al. MobileNets:efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint, 2017,arXiv:1704.04861.
[30]	SANDLER M , HOWARD A , ZHU M L ,et al. MobileNetV2:inverted residuals and linear bottlenecks[C]// Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE Press, 2018: 4510-4520.
[31]	HOWARD A , SANDLER M , CHEN B ,et al. Searching for MobileNetV3[C]// Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Piscataway:IEEE Press, 2020: 1314-1324.
[32]	ZHANG X Y , ZHOU X Y , LIN M X ,et al. ShuffleNet:an extremely efficient convolutional neural network for mobile devices[C]// Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE Press, 2018: 6848-6856.
[33]	MA N N , ZHANG X Y , ZHENG H T ,et al. ShuffleNet V2:practical guidelines for efficient CNN architecture design[C]// Proceedings of 2018 European Conference on Computer Vision. New York:ACM Press, 2018: 122-138.
[34]	HAN K , WANG Y H , TIAN Q ,et al. GhostNet:more features from cheap operations[C]// Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE Press, 2020: 1577-1586.
[35]	WONG A , SHAFIEE M J , LI F ,et al. Tiny SSD:a tiny single-shot detection deep convolutional neural network for real-time embedded object detection[C]// Proceedings of 2018 15th Conference on Computer and Robot Vision. Piscataway:IEEE Press, 2018: 95-101.
[36]	IANDOLA F N , HAN S , MOSKEWICZ M W ,et al. SqueezeNet:AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size[J]. arXiv preprint, 2016,arXiv:1602.07360.
[37]	FANG W , WANG L , REN P M . Tinier-YOLO:a real-time object detection method for constrained environments[J]. IEEE Access, 2019,8: 1935-1944.
[38]	DOSOVITSKIY A , BEYER L , KOLESNIKOV A ,et al. An image is worth 16×16 words:transformers for image recognition at scale[J]. arXiv preprint, 2020,arXiv:2010.11929.
[39]	LI Y T , HUANG H S , XIE Q S ,et al. Research on a surface defect detection algorithm based on MobileNet-SSD[J]. Applied Sciences, 2018,8(9): 1678.
[40]	WONG A , FAMUORI M , SHAFIEE M J ,et al. YOLO nano:a highly compact you only look once convolutional neural network for object detection[C]// Proceedings of 2019 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS Edition. Piscataway:IEEE Press, 2021: 22-25.
[41]	YU G H , CHANG Q Y , LYU W Y ,et al. PP-PicoDet:a better real-time object detector on mobile devices[J]. arXiv preprint, 2021,arXiv:2111.00902.
[42]	HUANG X , WANG X X , LYU W Y ,et al. PP-YOLOv2:a practical object detector[J]. arXiv preprint, 2021,arXiv:2104.10419.
[43]	XU S L , WANG X X , LYU W Y ,et al. PP-YOLOE:an evolved version of YOLO[J]. arXiv preprint, 2022,arXiv:2203.16250.

Item	Specification
Operating system	Windows 10
Processor	Intel(R) Xeon(R) Silver 4214 CPU @ 2.20 GHz
GPU (training)	GeForce GTX 2080TI
GPU (testing)	GeForce GT 1030
Deep learning framework	PyTorch 1.9.0
Library versions	CUDA 11.6; cuDNN 7.4.1

Class	YOLOX-Nano				Rapider-YOLOX
Class	P	R	AP	F1	P	R	AP	F1
aeroplane	87.27%	80.00%	87.64%	0.83	89.38%	84.17%	90.78%	0.87
bicycle	90.68%	75.35%	82.65%	0.82	90.16%	77.46%	84.68%	0.83
bird	88.98%	64.02%	76.07%	0.74	89.76%	69.51%	80.47%	0.78
boat	77.27%	46.79%	61.75%	0.58	77.03%	52.29%	67.75%	0.62
bottle	77.14%	36.82%	49.12%	0.50	80.51%	43.18%	57.99%	0.56
bus	84.55%	75.00%	86.30%	0.79	89.29%	80.65%	89.08%	0.85
car	87.82%	69.98%	81.84%	0.78	88.40%	72.23%	84.06%	0.80
cat	81.25%	78.14%	81.56%	0.80	85.12%	78.14%	85.99%	0.81
chair	77.73%	39.90%	55.84%	0.53	82.74%	45.50%	61.47%	0.59
cow	71.64%	60.76%	63.15%	0.66	77.27%	64.56%	78.41%	0.70
dining table	73.97%	50.00%	56.98%	0.60	73.24%	48.15%	57.40%	0.58
dog	86.45%	63.36%	79.36%	0.73	86.49%	65.75%	82.25%	0.75
horse	88.50%	75.19%	87.64%	0.81	90.68%	80.45%	89.13%	0.85
motorbike	92.71%	68.99%	79.32%	0.79	93.88%	71.32%	84.15%	0.81
person	90.56%	72.53%	84.47%	0.81	91.08%	74.84%	86.16%	0.82
pottedplant	75.26%	39.89%	51.51%	0.52	86.67%	42.62%	58.74%	0.57
sheep	81.65%	70.49%	83.46%	0.76	86.71%	74.86%	85.97%	0.80
sofa	81.48%	64.71%	72.63%	0.72	73.68%	54.90%	69.85%	0.63
train	86.27%	72.73%	80.81%	0.79	91.00%	75.21%	82.60%	0.82
tvmonitor	86.79%	69.17%	80.52%	0.77	82.50%	74.44%	81.46%	0.78
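The per-class figures can be cross-checked against the headline numbers. The sketch below averages the 20 per-class AP values from the table to obtain mAP and recomputes one F1 score from precision and recall. The constant and function names are illustrative assumptions; the values themselves are copied from the table.

```python
# Per-class AP values (%) copied from the table above,
# in table order from aeroplane to tvmonitor.
YOLOX_NANO_AP = [
    87.64, 82.65, 76.07, 61.75, 49.12, 86.30, 81.84, 81.56, 55.84, 63.15,
    56.98, 79.36, 87.64, 79.32, 84.47, 51.51, 83.46, 72.63, 80.81, 80.52,
]
RAPIDER_YOLOX_AP = [
    90.78, 84.68, 80.47, 67.75, 57.99, 89.08, 84.06, 85.99, 61.47, 78.41,
    57.40, 82.25, 89.13, 84.15, 86.16, 58.74, 85.97, 69.85, 82.60, 81.46,
]


def mean_ap(per_class_ap):
    """mAP as the unweighted mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


def f1_score(precision, recall):
    """F1 as the harmonic mean of precision and recall (both in [0, 1])."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    baseline = mean_ap(YOLOX_NANO_AP)
    proposed = mean_ap(RAPIDER_YOLOX_AP)
    print(f"YOLOX-Nano mAP: {baseline:.2f}%")        # 74.13%
    print(f"Rapider-YOLOX mAP: {proposed:.2f}%")     # 77.92%
    print(f"Gain: {proposed - baseline:.2f} percentage points")  # 3.79
    # aeroplane row, YOLOX-Nano: P = 87.27%, R = 80.00%
    print(f"F1 (aeroplane, YOLOX-Nano): {f1_score(0.8727, 0.8000):.2f}")  # 0.83
```

Both averages reproduce the reported mAP values of 74.13% and 77.92%, so the 3.79 gain quoted in the abstract is a difference in percentage points.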

Inference GPU	Memory (GB)	CUDA cores
GT1030	2	384
Titan X	12	3584
Tesla V100	16	5120

Model	Model parameters (MB)	Model complexity (BFLOPs)	mAP	Inference latency (ms)	Inference GPU
Tiny YOLOv2	60.5	6.97	57.1%	45.26	Tesla V100
Tiny YOLOv3	33.4	5.47	58.4%	34.30	Tesla V100
MobileNet-SSD[39]	22.2	4.29	72.7%	—	—
Tinier-YOLO	8.9	2.56	65.7%	39.84	Titan X
YOLO-Nano[40]	4.0	4.51	69.1%	29.74	Tesla V100
YOLOv5n	1.9	2.09	73.9%	27.62	GT1030
PP-PicoDet-S[41]	0.99	1.24	77.7%	—	GT1030
PPYOLO-Tiny[42]	4.20	4.95	71.3%	29.93	GT1030
PP-YOLOE-S[43]	7.90	7.56	78.4%	42.76	GT1030
YOLOX-Nano	0.91	1.07	74.1%	21.85	GT1030
Rapider-YOLOX	0.96	1.10	77.9%	22.02	GT1030

Model	Inference latency (ms)	FPS	Inference hardware
Rapider-YOLOX	41.80	23.94	i5-12600KF
Tiny YOLOv2	45.26	22.09	Tesla V100
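The latency and FPS columns are related by FPS = 1000 / latency (in ms). A short sketch of that conversion, applied to latency and FPS pairs reported in this document, is given below. The names `fps_from_latency_ms` and `REPORTED` are illustrative assumptions, and the small mismatches are presumably due to rounding or averaging in the reported figures.

```python
def fps_from_latency_ms(latency_ms):
    """Frames per second implied by a per-image latency in milliseconds."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# (model, hardware, reported latency in ms, reported FPS), copied from the tables.
REPORTED = [
    ("Rapider-YOLOX", "i5-12600KF", 41.80, 23.94),
    ("Tiny YOLOv2", "Tesla V100", 45.26, 22.09),
    ("Rapider-YOLOX", "GT1030", 22.02, 45.40),
    ("YOLOX-Nano", "GT1030", 21.85, 45.76),
]

if __name__ == "__main__":
    for model, hardware, latency_ms, reported_fps in REPORTED:
        implied = fps_from_latency_ms(latency_ms)
        print(
            f"{model} on {hardware}: {latency_ms:.2f} ms "
            f"implies {implied:.2f} FPS (reported {reported_fps:.2f})"
        )
```

On these figures, Rapider-YOLOX on the CPU is slightly faster than Tiny YOLOv2 on a Tesla V100, which is the comparison the table is making.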

Model	mAP	FPS
YOLOX-Nano	74.13%	45.76
YOLOX-Nano + HEB	76.12%	45.16
YOLOX-Nano + SSPP	75.54%	45.96
YOLOX-Nano + CIoU	75.12%	—
Rapider-YOLOX	77.92%	45.40
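The ablation rows are easier to read as gains over the YOLOX-Nano baseline. The sketch below computes each gain in percentage points from the mAP values in the table; the dictionary and function names are illustrative assumptions.

```python
BASELINE_MAP = 74.13  # YOLOX-Nano mAP (%), from the table above

# mAP (%) after adding each component alone, and for the full model.
ABLATION_MAP = {
    "HEB": 76.12,
    "SSPP": 75.54,
    "CIoU": 75.12,
    "Rapider-YOLOX (all three)": 77.92,
}


def gain_over_baseline(map_percent, baseline=BASELINE_MAP):
    """mAP gain in percentage points, rounded to two decimals."""
    return round(map_percent - baseline, 2)


if __name__ == "__main__":
    for name, value in ABLATION_MAP.items():
        print(f"{name}: +{gain_over_baseline(value):.2f} points")
    single = sum(gain_over_baseline(ABLATION_MAP[k]) for k in ("HEB", "SSPP", "CIoU"))
    print(f"Sum of single-component gains: +{single:.2f} points")
```

The single-component gains (1.99, 1.41, and 0.99 points) add up to 4.39 points, more than the 3.79 points of the full model, so the three changes are complementary but not fully additive.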
