Telecommunications Science ›› 2021, Vol. 37 ›› Issue (8): 46-56. doi: 10.11959/j.issn.1000-0801.2021198

• Research and Development •

Pedestrian detection method for complex visual scenes based on an improved YOLOv4 algorithm

康帅1, 章坚武1, 朱尊杰1, 童国锋2   

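  1. 1 Hangzhou Dianzi University, Hangzhou, Zhejiang 310018
    2 Keqiao Power Supply Branch, Shaoxing Power Supply Company, Shaoxing, Zhejiang 330600
  • Revised: 2021-08-13 Online: 2021-08-20 Published: 2021-08-01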
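  • About the authors: Shuai KANG (康帅; 1996- ), female, is a master's student at the School of Communication Engineering, Hangzhou Dianzi University. Her main research interests include computer vision and artificial intelligence.
    Jianwu ZHANG (章坚武; 1961- ), male, Ph.D., is a professor and doctoral supervisor at the School of Communication Engineering, Hangzhou Dianzi University. His main research interests include mobile communications, multimedia signal processing and artificial intelligence, and communication networks and information security.
    Zunjie ZHU (朱尊杰; 1994- ), male, is a lecturer at the School of Communication Engineering, Hangzhou Dianzi University. His main research interests include robot navigation and localization, 3D scene reconstruction, and semantic understanding of and interaction with 3D models.
    Guofeng TONG (童国锋; 1968- ), male, is general manager and senior engineer at the Keqiao Power Supply Branch of Shaoxing Power Supply Company. His main research interests include relay protection and distribution networks.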
  • Funding:
    National Natural Science Foundation of China (U1866209); National Natural Science Foundation of China (61772162)

An improved YOLOv4 algorithm for pedestrian detection in complex visual scenes

Shuai KANG1, Jianwu ZHANG1, Zunjie ZHU1, Guofeng TONG2   

  1. 1 Hangzhou Dianzi University, Hangzhou 310018, China
    2 Keqiao Branch, Shaoxing Power Supply Company, Shaoxing 330600, China
  • Revised: 2021-08-13 Online: 2021-08-20 Published: 2021-08-01
  • Supported by:
    The National Natural Science Foundation of China (U1866209); The National Natural Science Foundation of China (61772162)

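Abstract:

Complex visual scenes involve lighting that is too dark or overexposed, bad weather, severe occlusion, large differences in pedestrian size, and image blur, all of which make pedestrian detection much harder. To address the low detection accuracy and frequent missed detections in such scenes, an improved YOLOv4 algorithm is proposed to enhance pedestrian detection in complex visual scenes. First, a pedestrian data set for complex visual scenes is constructed. Then, hybrid dilated convolution is added to the backbone network to improve its ability to extract pedestrian features. Finally, a spatial jagged dilated convolution structure is proposed to replace the spatial pyramid pooling structure and capture more detailed features. Experiments show that, on the pedestrian data set constructed in this paper, the improved YOLOv4 algorithm reaches an average precision (AP) of 90.08%, 7.2% higher than the original YOLOv4 algorithm, and lowers the log-average miss rate (LAMR) by 13.69%.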

Keywords: complex visual scenes, YOLOv4, hybrid dilated convolution, spatial jagged dilated convolution
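The abstract does not give the dilation rates used in the backbone, so the following is only a minimal sketch of why mixing rates can help. It assumes 3-tap, stride-1 kernels and the illustrative rates (1, 2, 5); both are assumptions and are not taken from the paper, and the function names are hypothetical. It counts, in one dimension, which input offsets can reach a single output position after the layers are stacked. Stacking equal rates leaves gaps (often called the gridding effect), whereas mixed rates can cover the receptive field without holes.

```python
from itertools import product


def covered_offsets(dilations, kernel_size=3):
    """Return the sorted 1-D input offsets that contribute to one output
    position after stacking stride-1 convolutions with these dilation rates."""
    half = kernel_size // 2
    # Each layer with dilation d samples taps at -half*d, ..., 0, ..., +half*d.
    per_layer = [[d * t for t in range(-half, half + 1)] for d in dilations]
    # A path through the stack picks one tap per layer; offsets add up.
    return sorted({sum(combo) for combo in product(*per_layer)})


def coverage_report(dilations, kernel_size=3):
    """Return (receptive field width, number of positions actually sampled)."""
    offsets = covered_offsets(dilations, kernel_size)
    width = offsets[-1] - offsets[0] + 1
    return width, len(offsets)


if __name__ == "__main__":
    # Illustrative rates only; the paper's actual configuration is not stated here.
    for rates in [(1, 2, 5), (2, 2, 2)]:
        width, sampled = coverage_report(rates)
        print(f"rates={rates}: width={width}, sampled={sampled}")
```

Under these assumed rates, the mixed stack spans 17 positions and samples all 17, while the uniform stack spans 13 positions and samples only 7.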

Abstract:

Pedestrian detection in complex visual scenes is made considerably harder by under- or overexposed lighting, bad weather, severe occlusion, large variation in pedestrian size, and blurred images. To address the resulting low accuracy and high miss rate, an improved YOLOv4 algorithm is proposed to strengthen pedestrian detection in such scenes. First, a self-annotated pedestrian data set covering complex visual scenes was constructed. Second, hybrid dilated convolution (HDC) was added to the backbone network to improve its ability to extract pedestrian features. Finally, a spatial jagged dilated convolution (SJDC) structure was proposed to replace the spatial pyramid pooling structure and capture more detailed features. Experiments on the constructed data set show that the improved algorithm achieves an average precision (AP) of 90.08%, which is 7.2% higher than the original YOLOv4 algorithm, while the log-average miss rate (LAMR) is reduced by 13.69%.

Key words: complex visual scenes, YOLOv4, hybrid dilated convolution, spatial jagged dilated convolution
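The abstract reports LAMR without saying how it was computed. As a hedged illustration, the sketch below follows a convention that is common in pedestrian detection benchmarks: sample the miss rate at nine false-positives-per-image (FPPI) values evenly spaced in log-space between 10^-2 and 10^0, then take the geometric mean. Whether the authors used exactly this procedure is an assumption, and the function name and the input curve are hypothetical.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of the miss rate sampled at log-spaced FPPI references.

    fppi and miss_rate describe one curve, ordered by increasing FPPI
    (for example, obtained by sweeping the detection score threshold).
    """
    # Reference FPPI values evenly spaced in log-space over [1e-2, 1e0].
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]
    sampled = []
    for ref in refs:
        # Take the miss rate at the last curve point whose FPPI does not
        # exceed the reference; if the curve starts to the right of the
        # reference, fall back to 1.0 (every pedestrian missed).
        mr = 1.0
        for f, m in zip(fppi, miss_rate):
            if f <= ref:
                mr = m
        sampled.append(max(mr, 1e-10))  # clamp so the logarithm stays finite
    return math.exp(sum(math.log(m) for m in sampled) / len(sampled))


if __name__ == "__main__":
    # Made-up curve for demonstration only; not data from the paper.
    demo_fppi = [0.005, 0.05, 0.5]
    demo_miss = [0.60, 0.35, 0.20]
    print(f"LAMR = {log_average_miss_rate(demo_fppi, demo_miss):.4f}")
```

A lower value is better, since it means fewer pedestrians are missed across the sampled range of false-positive rates.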

