Semantic guidance attention network for occluded person re-identification

Journal on Communications, 2021, Vol. 42, Issue (10): 106-116. doi: 10.11959/j.issn.1000-436x.2021184

Xuena REN¹,², Dongming ZHANG¹,³, Xiuguo BAO¹,³, Bing LI⁴

¹ Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
² School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100093, China
³ National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China
⁴ School of Aeronautic Science and Engineering, Beijing University of Aeronautics and Astronautics, Beijing 100191, China

Revised: 2021-04-08 Online: 2021-10-25 Published: 2021-10-01
About the authors:
Xuena REN (1989- ), female, born in Shijiazhuang, Hebei, is a PhD candidate at the Institute of Information Engineering, Chinese Academy of Sciences. Her main research interest is person re-identification (occluded person re-identification, clothes-changing person re-identification, etc.).
Dongming ZHANG (1977- ), male, born in Yancheng, Jiangsu, PhD, is a research professor and doctoral supervisor at the National Computer Network Emergency Response Technical Team/Coordination Center of China. His main research interests are multimedia content retrieval, pattern recognition, and video coding.
Xiuguo BAO (1963- ), male, born in Rugao, Jiangsu, PhD, is a professor-level senior engineer and doctoral supervisor at the National Computer Network Emergency Response Technical Team/Coordination Center of China. His main research interest is network and information security.
Bing LI (1990- ), male, born in Shenyang, Liaoning, is a PhD candidate at Beijing University of Aeronautics and Astronautics. His main research interest is action recognition in compressed video.
Supported by:
The National Key Research and Development Program of China (2018YFB0804704); The National Natural Science Foundation of China (61672495); The National Natural Science Foundation of China (U1736218)

Abstract

To address feature misalignment and mismatching in occluded person re-identification (Re-ID), a semantic guidance attention network (SGAN) is proposed to align the different parts of a pedestrian. SGAN uses the semantic masks of pedestrians as supervision and learns whole-body (global) and local features through global and local semantic guidance in its attention modules. The training process is adjusted dynamically according to the visibility of each body part. In the inference stage, the visibility of each local region is obtained from the masks learned by the attention modules, and a part-to-part matching strategy restricted to the visible body parts shared by the two images adaptively computes feature similarity. Experimental results show that SGAN tolerates a certain degree of occlusion: its accuracy is better than that of most advanced models on the holistic datasets, and it also outperforms existing person Re-ID methods on two larger-scale complex occlusion datasets, Occluded-DukeMTMC and P-DukeMTMC-reID.

Key words: deep learning, occluded person re-identification, attention network, semantic guidance, feature alignment
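
The matching strategy described in the abstract compares two images only on body parts that are visible in both. The following is a minimal sketch of that idea and not the formulation used in the paper: the function name `shared_part_similarity`, the use of cosine similarity, and the choice to add the global score to the mean score of the shared parts are all illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def shared_part_similarity(q_global, g_global, q_parts, g_parts, q_vis, g_vis):
    """Global similarity plus the mean similarity over parts visible in BOTH images.

    q_parts / g_parts: one feature vector per body part (hypothetical layout).
    q_vis / g_vis: 0/1 visibility flags per part, e.g. derived from attention masks.
    The combination rule is an assumption made for this sketch.
    """
    sim = cosine(q_global, g_global)
    # Keep only the parts that are visible in the query AND in the gallery image.
    shared = [i for i, (vq, vg) in enumerate(zip(q_vis, g_vis)) if vq and vg]
    if shared:
        sim += sum(cosine(q_parts[i], g_parts[i]) for i in shared) / len(shared)
    return sim


# Toy example: the last part is hidden in the query, so it is ignored
# even though the two feature vectors for that part disagree.
q_parts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2]]
g_parts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 3.0]]
score = shared_part_similarity(
    [1.0, 2.0], [1.0, 2.0], q_parts, g_parts, [1, 1, 1, 0], [1, 1, 1, 1]
)
print(round(score, 4))
```

Because the hidden part is excluded before any comparison, an occluder covering that region in one image cannot lower the score of an otherwise matching pair.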

CLC number: TN92

Xuena REN, Dongming ZHANG, Xiuguo BAO, Bing LI. Semantic guidance attention network for occluded person re-identification[J]. Journal on Communications, 2021, 42(10): 106-116.

Figures and tables (15)

Table 2. Comparison results on the Occluded-DukeMTMC dataset

Method	Rank-1	Rank-5	Rank-10	mAP
DIM [30]	21.5%	36.1%	42.8%	14.4%
LOMO+XQDA [31]	8.1%	17%	22.0%	5.0%
PCB [15]	42.6%	57.1%	62.9%	33.7%
Random Erasing [32]	40.5%	59.6%	66.8%	30.0%
HACNN [33]	34.4%	51.9%	59.4%	26.0%
DSR [19]	40.8%	58.2%	65.2%	30.4%
SFR [20]	42.3%	60.3%	67.3%	32.0%
Part Aligned [34]	28.8%	44.6%	51.0%	20.2%
FD-GAN [35]	40.8%	-	-	-
AdverOccluded [36]	44.5%	-	-	32.2%
Part Bilinear [37]	36.9%	-	-	-
PGFA [10]	51.4%	68.6%	74.9%	37.3%
HONet [25]	55.1%	-	-	43.8%
SGAM [38]	55.1%	68.7%	74%	35.3%
SGAN	58.0%	72.2%	78.7%	45.4%
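
The Rank-k and mAP columns in these comparison tables are standard retrieval metrics. The sketch below shows one common way to compute them from per-query ranked match lists. The function name `rank_k_and_map` and the input layout are assumptions made for illustration, and the benchmark-specific step of discarding gallery images taken by the same camera as the query is deliberately left out.

```python
def rank_k_and_map(ranked_matches, ks=(1, 5, 10)):
    """Compute Rank-k accuracy and mean average precision (mAP).

    ranked_matches: one list per query, ordered by decreasing similarity;
    each entry is True if that gallery image shares the query's identity.
    """
    hits = {k: 0 for k in ks}
    ap_sum = 0.0
    valid = 0
    for matches in ranked_matches:
        if not any(matches):
            continue  # a query with no true match cannot be scored
        valid += 1
        for k in ks:
            # Rank-k counts a hit if any of the top-k results is correct.
            hits[k] += int(any(matches[:k]))
        correct = 0
        precisions = []
        for position, is_match in enumerate(matches, start=1):
            if is_match:
                correct += 1
                precisions.append(correct / position)
        # Average precision: mean of the precision values at each correct hit.
        ap_sum += sum(precisions) / len(precisions)
    if valid == 0:
        raise ValueError("no query has a true match in the gallery")
    cmc = {k: hits[k] / valid for k in ks}
    return cmc, ap_sum / valid


# Two toy queries: the first is matched at rank 1, the second at rank 2.
ranked = [[True, False, True], [False, True, False]]
cmc, mean_ap = rank_k_and_map(ranked)
print(cmc, round(mean_ap, 4))
```

In this toy case Rank-1 is 0.5 because only the first query has a correct top result, while mAP also rewards the second query for placing its match at rank 2.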

Table 3. Comparison results on the P-DukeMTMC-reID dataset

Method	Rank-1	Rank-5	Rank-10	mAP
Teacher-S [21]	51.4%	50.9%	-	-
PCB [15]	79.4%	87.1%	91.0%	63.9%
IDE [9]	82.9%	89.4%	91.5%	65.9%
PVPM [24]	85.1%	91.3%	93.3%	69.9%
SGAN	85.3%	92.6%	94.3%	72.1%

Table 4. Comparison results on the Market-1501 and DukeMTMC-reID datasets

Method	Market-1501 Rank-1	Market-1501 mAP	DukeMTMC-reID Rank-1	DukeMTMC-reID mAP
BoW+kissme [8]	44.4%	20.8%	25.1%	12.2%
SVDNet [39]	82.3%	62.1%	76.7%	56.8%
PAN [40]	82.8%	63.4%	71.7%	51.5%
PAR [34]	81%	63.4%	-	-
DSR [19]	83.5%	64.2%	-	-
MultiLoss [41]	83.9%	64.4%	-	-
TripletLoss [25]	84.9%	69.1%	-	-
AdverOccluded [36]	86.5%	78.3%	79.1%	62.1%
APR [42]	87%	66.9%	73.9%	55.6%
MultiScale [43]	88.9%	73.1%	79.2%	60.6%
MLFN [44]	90%	74.3%	81%	62.8%
PCB [15]	92.4%	77.3%	81.9%	65.3%
PGFA [10]	91.2%	76.8%	82.6%	65.5%
VPM [23]	93%	80.8%	83.6%	72.6%
SGAM [38]	91.4%	77.6%	83.5%	67.3%
SGAN	93.3%	82.3%	85.5%	71.6%

References (45)

[1]	GHEISSARI N , SEBASTIAN T B , HARTLEY R . Person reidentification using spatiotemporal appearance[C]// Proceedings of 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06). Piscataway:IEEE Press, 2006: 1528-1535.
[2]	GRAY D , TAO H . Viewpoint invariant pedestrian recognition with an ensemble of localized features[M]. Berlin: Springer, 2008.
[3]	LOWE D G . Object recognition from local scale-invariant features[C]// Proceedings of the Seventh IEEE International Conference on Computer Vision. Piscataway:IEEE Press, 1999: 1150-1157.
[4]	RISTANI E , SOLERA F , ZOU R ,et al. Performance measures and a data set for multi-target,multi-camera tracking[C]// European Conference on Computer Vision. Berlin:Springer, 2016: 17-35.
[5]	LUO H , JIANG W , FAN X ,et al. A survey on deep learning based person Re-identification[J]. Acta Automatica Sinica, 2019,45(11): 2032-2049.
[6]	SONG W R , ZHAO Q Q , CHEN C H ,et al. Survey on pedestrian re-identification research[J]. CAAI Transactions on Intelligent Systems, 2017,12(6): 770-780.
[7]	ZHENG L , YANG Y , HAUPTMANN A G . Person re-identification:past,present and future[J]. arXiv Preprint,arXiv:1610.02984, 2016.
[8]	ZHENG L , SHEN L Y , TIAN L ,et al. Scalable person Re-identification:a benchmark[C]// Proceedings of 2015 IEEE International Conference on Computer Vision (ICCV). Piscataway:IEEE Press, 2015: 1116-1124.
[9]	ZHENG Z D , ZHENG L , YANG Y . Unlabeled samples generated by GAN improve the person Re-identification baseline in vitro[C]// Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Piscataway:IEEE Press, 2017: 3774-3782.
[10]	MIAO J X , WU Y , LIU P ,et al. Pose-guided feature alignment for occluded person re-identification[C]// Proceedings of 2019 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway:IEEE Press, 2019: 542-551.
[11]	ZHUO J X , CHEN Z Y , LAI J H ,et al. Occluded person Re-identification[C]// Proceedings of 2018 IEEE International Conference on Multimedia and Expo (ICME). Piscataway:IEEE Press, 2018: 1-6.
[12]	WU L , SHEN C , HENGEL AV . PersonNet:person re-identification with deep convolutional neural networks[J]. arXiv Preprint,arXiv:1601.0725, 2016.
[13]	QIAN X L , FU Y W , JIANG Y G ,et al. Multi-scale deep learning architectures for person re-identification[C]// Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Piscataway:IEEE Press, 2017: 5409-5418.
[14]	VARIOR R R , SHUAI B , LU J W ,et al. A siamese long short-term memory architecture for human Re-identification[M]. Cham: Springer International Publishing, 2016: 135-153.
[15]	SUN Y F , ZHENG L , YANG Y ,et al. Beyond part models:person retrieval with refined part pooling (and a strong convolutional baseline)[C]// European Conference on Computer Vision. Berlin:Springer, 2018: 501-518.
[16]	ZHANG X , LUO H , FAN X ,et al. AlignedReID:surpassing human-level performance in person re-identification[J]. arXiv Preprint,arXiv:1711.08184, 2017.
[17]	ZHAO H Y , TIAN M Q , SUN S Y ,et al. Spindle net:person re-identification with human body region guided feature decomposition and fusion[C]// Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway:IEEE Press, 2017: 907-915.
[18]	ZHENG L , HUANG Y J , LU H C ,et al. Pose-invariant embedding for deep person re-identification[J]. IEEE Transactions on Image Processing, 2019,28(9): 4500-4509.
[19]	HE L X , LIANG J , LI H Q ,et al. Deep spatial feature reconstruction for partial person re-identification:alignment-free approach[C]// Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE Press, 2018: 7073-7082.
[20]	HE L , SUN Z , ZHU Y , WANG Y . Recognizing partial biometric patterns[J]. arXiv Preprint,arXiv:1810.07399, 2018.
[21]	ZHUO J , LAI J , CHEN P . A novel teacher-student learning framework for occluded person re-Identification[J]. arXiv Preprint,arXiv:1907.03253, 2019.
[22]	ZHENG W S , LI X , XIANG T ,et al. Partial person Re-identification[C]// Proceedings of 2015 IEEE International Conference on Computer Vision (ICCV). Piscataway:IEEE Press, 2015: 4678-4686.
[23]	SUN Y F , XU Q , LI Y L ,et al. Perceive where to focus:learning visibility-aware part-level features for partial person re-identification[C]// Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway:IEEE Press, 2019: 393-402.
[24]	GAO S , WANG J Y , LU H C ,et al. Pose-guided visible part matching for occluded person ReID[C]// Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway:IEEE Press, 2020: 11741-11749.
[25]	WANG G A , YANG S , LIU H Y ,et al. High-order information matters:learning relation and topology for occluded person re-identification[C]// Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway:IEEE Press, 2020: 6448-6457.
[26]	KALAYEH M M , BASARAN E , GÖKMEN M ,et al. Human semantic parsing for person Re-identification[C]// Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE Press, 2018: 1062-1071.
[27]	HE K M , ZHANG X Y , REN S Q ,et al. Deep residual learning for image recognition[C]// Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway:IEEE Press, 2016: 770-778.
[28]	HERMANS A , BEYER L , LEIBE B . In defense of the triplet loss for person re-identification[J]. arXiv Preprint,arXiv:1703.07737, 2017.
[29]	DENG J , DONG W , SOCHER R ,et al. ImageNet:a large-scale hierarchical image database[C]// Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE Press, 2009: 248-255.
[30]	YU Q , CHANG X , SONG Y Z ,et al. The devil is in the middle:exploiting mid-level representations for cross-domain instance matching[J]. arXiv Preprint,arXiv:1711.08106, 2017.
[31]	LIAO S C , HU Y , ZHU X Y ,et al. Person re-identification by local maximal occurrence representation and metric learning[C]// Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway:IEEE Press, 2015: 2197-2206.
[32]	ZHONG Z , ZHENG L , KANG G L ,et al. Random erasing data augmentation[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020,34(7): 13001-13008.
[33]	LI W , ZHU X T , GONG S G . Harmonious attention network for person Re-identification[C]// Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE Press, 2018: 2285-2294.
[34]	ZHAO L M , LI X , ZHUANG Y T ,et al. Deeply-learned part-aligned representations for person Re-identification[C]// Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Piscataway:IEEE Press, 2017: 3239-3248.
[35]	GE Y X , LI Z W , ZHAO H Y ,et al. FD-GAN:pose-guided feature distilling GAN for robust person Re-identification[J]. arXiv Preprint,arXiv:1810.02936, 2018.
[36]	HUANG H J , LI D W , ZHANG Z ,et al. Adversarially occluded samples for person Re-identification[C]// Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE Press, 2018: 5098-5107.
[37]	SUH Y , WANG J D , TANG S Y ,et al. Part-aligned bilinear representations for person Re-identification[C]// European Conference on Computer Vision. Berlin:Springer, 2018: 418-437.
[38]	YANG Q , WANG P Z , FANG Z H ,et al. Focus on the visible regions:semantic-guided alignment model for occluded person Re-identification[J]. Sensors (Basel,Switzerland), 2020,20(16): 4431.
[39]	SUN Y F , ZHENG L , DENG W J ,et al. SVDNet for pedestrian retrieval[C]// Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Piscataway:IEEE Press, 2017: 3820-3828.
[40]	ZHENG Z D , ZHENG L , YANG Y . Pedestrian alignment network for large-scale person re-identification[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2019,29(10): 3037-3045.
[41]	LI W , ZHU X , GONG S . Person re-identification by deep joint learning of multi-loss classification[J]. arXiv Preprint,arXiv:1705.04724, 2017.
[42]	LIN Y T , ZHENG L , ZHENG Z D ,et al. Improving person re-identification by attribute and identity learning[J]. Pattern Recognition, 2019,95: 151-161.
[43]	CHEN Y B , ZHU X T , GONG S G . Person Re-identification by deep learning multi-scale representations[C]// Proceedings of 2017 IEEE International Conference on Computer Vision Workshops (ICCVW). Piscataway:IEEE Press, 2017: 2590-2600.
[44]	CHANG X B , HOSPEDALES T M , XIANG T . Multi-level factorisation net for person Re-identification[C]// Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE Press, 2018: 2109-2118.
[45]	SELVARAJU R R , COGSWELL M , DAS A ,et al. Grad-CAM:visual explanations from deep networks via gradient-based localization[J]. International Journal of Computer Vision, 2020,128(2): 336-359.

Part	Visible in q	Visible in g	sim_l
Global	1	1	1.9633
Head	1	1	0.0259
Upper body	1	1	0.2579
Lower body	1	1	0.1129
Feet	0	1	0

For this query-gallery pair (q, g): sim_l + sim_g = 2.3601, Sim(q, g) = 2.0427
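
As a quick arithmetic check on this table, the combined sim_l + sim_g entry can be compared with the sum of the listed scores. This assumes that the combined entry is a plain sum of the global row and the per-part rows, and that a part hidden in either image (here the feet, which are not visible in q) contributes 0. The variable names are illustrative only.

```python
# Scores copied from the table; the feet are hidden in q, so they contribute 0.
scores = {
    "global": 1.9633,
    "head": 0.0259,
    "upper body": 0.2579,
    "lower body": 0.1129,
    "feet": 0.0,
}
total = sum(scores.values())
reported = 2.3601  # combined value listed in the table
print(f"sum = {total:.4f}, reported = {reported:.4f}")
```

The sum agrees with the listed 2.3601 to within rounding of the four-decimal entries.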

Method	Rank-1	Rank-5	Rank-10	mAP
B	54.1%	70.1%	77.2%	43.1%
G	55.9%	72.4%	79.3%	45.9%
L	56.6%	71.7%	76.2%	42.4%
G+L	58.0%	72.2%	78.7%	45.4%

Method	Rank-1	Rank-5	Rank-10	mAP
C+A	56.6%	73.6%	78.3%	44.7%
w×C+A	56.8%	72.6%	78.1%	43.6%
w×(C+A)	58.0%	72.2%	78.7%	45.4%
