Generalized Grad-CAM attacking method based on adversarial patch

doi: 10.11959/j.issn.1000-436x.2021025

Journal on Communications, 2021, Vol. 42, Issue (3): 23-35.


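Nianwen SI¹, Wenlin ZHANG¹, Dan QU¹, Heyu CHANG², Shengxiang LI¹, Tong NIU¹

¹ Department of Information System Engineering, Information Engineering University, Zhengzhou 450001, China
² Department of Cryptogram Engineering, Information Engineering University, Zhengzhou 450001, China

Revised: 2020-12-22  Publication date: 2021-03-25  Release date: 2021-03-01
About the authors: Nianwen SI (1992- ), male, born in Xiangyang, Hubei, is a doctoral student at Information Engineering University. His main research interests are the security and interpretability of deep learning.
Wenlin ZHANG (1982- ), male, born in Huanggang, Hubei, Ph.D., is an associate professor and master's supervisor at Information Engineering University. His main research interests include speech signal processing, speech recognition and machine learning.
Dan QU (1974- ), female, born in Jiutai, Jilin, Ph.D., is a professor and doctoral supervisor at Information Engineering University. Her main research interests include speech recognition, intelligent information processing and machine learning.
Heyu CHANG (1993- ), female, born in Zhengzhou, Henan, is a doctoral student at Information Engineering University. Her main research interests are deep learning, pedestrian detection and person re-identification.
Shengxiang LI (1991- ), male, born in Shaoyang, Hunan, is a doctoral student at Information Engineering University. His main research interest is multi-agent reinforcement learning.
Tong NIU (1984- ), male, born in Zhengzhou, Henan, is an associate professor at Information Engineering University. His main research interests include speech enhancement, speech recognition and deep learning.
Supported by:
The National Natural Science Foundation of China (61673395)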


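Abstract:

To verify the fragility of the Grad-CAM interpretation method, a Grad-CAM attack method based on an adversarial patch was proposed. By adding a constraint term on the Grad-CAM class activation map to the CNN classification loss function, an adversarial patch could be optimized in a targeted manner and used to synthesize an adversarial image. The adversarial image shifts the Grad-CAM interpretation toward the patch region while leaving the classification result unchanged, thereby attacking the interpretation. Meanwhile, batch training on the dataset and an additional perturbation-norm constraint improved the generalization and multi-scene usability of the adversarial patch. Experimental results on the ILSVRC2012 dataset show that, compared with existing methods, the proposed method attacks Grad-CAM interpretation results more simply and effectively while maintaining the model's classification accuracy.

Key words: convolutional neural network, interpretability, adversarial patch, class activation map, saliency map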
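The attack targets the Grad-CAM class activation map, so it helps to recall how that map is formed. In standard Grad-CAM, each feature map A^k of a chosen convolutional layer is weighted by alpha_k, the spatial average of the class-score gradient dy^c/dA^k, and a ReLU is applied to the weighted sum. The pure-Python sketch below reproduces only that computation on toy numbers (before any upsampling to input resolution). The function name `grad_cam` and the nested-list layout (channels x rows x cols) are illustrative assumptions; in practice the activations and gradients would come from a CNN's last convolutional layer via automatic differentiation.

```python
from typing import List

Map = List[List[float]]  # one H x W feature map


def grad_cam(activations: List[Map], gradients: List[Map]) -> Map:
    """Grad-CAM map: ReLU(sum_k alpha_k * A^k), alpha_k = mean of dy^c/dA^k."""
    h, w = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for a_k, g_k in zip(activations, gradients):
        # alpha_k: global-average-pooled gradient for channel k
        alpha_k = sum(sum(row) for row in g_k) / (h * w)
        for i in range(h):
            for j in range(w):
                cam[i][j] += alpha_k * a_k[i][j]
    # ReLU keeps only locations that push the class score up
    return [[max(v, 0.0) for v in row] for row in cam]


# Toy example: two 2x2 channels
acts = [[[1.0, 0.0], [0.0, 2.0]],
        [[0.0, 3.0], [1.0, 0.0]]]
grads = [[[0.5, 0.5], [0.5, 0.5]],       # alpha_0 = 0.5
         [[-1.0, -1.0], [-1.0, -1.0]]]   # alpha_1 = -1.0
print(grad_cam(acts, grads))  # [[0.5, 0.0], [0.0, 1.0]]
```

The negative weight on the second channel suppresses the locations where that channel fires, which is why only the top-left and bottom-right cells survive the ReLU.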

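The abstract describes pasting an optimized patch into the image and adding a Grad-CAM constraint term to the classification loss. The sketch below shows one plausible form of that objective. The exact constraint term is not given in this text, so the choice here (cross-entropy on the true label, minus `lam` times the share of Grad-CAM energy inside the patch mask) is an assumption, as are the names `apply_patch`, `patch_energy_share`, `attack_loss` and `lam`. Images are single-channel nested lists for brevity.

```python
import math
from typing import List

Grid = List[List[float]]


def apply_patch(image: Grid, patch: Grid, mask: Grid) -> Grid:
    """Compose x' = (1 - M) * x + M * P elementwise (M = 1 on patch pixels)."""
    return [[(1 - m) * x + m * p for x, p, m in zip(xr, pr, mr)]
            for xr, pr, mr in zip(image, patch, mask)]


def cross_entropy(logits: List[float], label: int) -> float:
    """Softmax cross-entropy, computed with the log-sum-exp trick."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_z - logits[label]


def patch_energy_share(cam: Grid, mask: Grid) -> float:
    """Fraction of Grad-CAM energy lying on patch pixels."""
    total = sum(sum(row) for row in cam)
    inside = sum(c * m for cr, mr in zip(cam, mask) for c, m in zip(cr, mr))
    return inside / total if total > 0 else 0.0


def attack_loss(logits: List[float], label: int, cam: Grid, mask: Grid,
                lam: float = 1.0) -> float:
    # Low CE keeps the original class; a larger share pulls Grad-CAM onto the patch
    return cross_entropy(logits, label) - lam * patch_energy_share(cam, mask)


img = [[0.2, 0.4], [0.6, 0.8]]
pat = [[1.0, 1.0], [1.0, 1.0]]
msk = [[1.0, 0.0], [0.0, 0.0]]  # patch occupies the top-left pixel
print(apply_patch(img, pat, msk))  # [[1.0, 0.4], [0.6, 0.8]]
```

Minimizing such a loss with respect to the patch pixels (via autodiff through the CNN and Grad-CAM) trades off the two goals; `lam` controls how hard the explanation is pulled toward the patch.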

CLC number: TP391

Nianwen SI, Wenlin ZHANG, Dan QU, Heyu CHANG, Shengxiang LI, Tong NIU. Generalized Grad-CAM attacking method based on adversarial patch[J]. Journal on Communications, 2021, 42(3): 23-35.

Figures and tables (11): Figures 1-7, Tables 1-4


Method	Top-1 accuracy	ER_p	ER_b
Original image	92.70%	4.85%	67.76%
Adversarial fine-tuning method	87.90%	71.13% ↑	35.42% ↓
Adversarial patch method (proposed)	92.50%	67.19% ↑	38.52% ↓
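The tables report ER_p and ER_b without defining them in this text. A natural reading, which is an assumption here, is an energy ratio: the share of the Grad-CAM map's total activation that falls inside the patch region (ER_p) or inside the object's ground-truth bounding box (ER_b). Under that reading, a successful attack raises ER_p and lowers ER_b, matching the arrows in the tables. The sketch computes such a ratio for an axis-aligned box; `energy_ratio` and the (top, left, bottom, right) box convention with exclusive bottom/right bounds are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def energy_ratio(cam: List[List[float]], box: Box) -> float:
    """Fraction of total (non-negative) Grad-CAM energy inside `box`."""
    top, left, bottom, right = box
    total = sum(sum(row) for row in cam)
    if total == 0:
        return 0.0  # empty map: no energy anywhere
    inside = sum(cam[i][j] for i in range(top, bottom) for j in range(left, right))
    return inside / total


cam = [[4.0, 0.0, 0.0],
       [0.0, 1.0, 1.0],
       [0.0, 1.0, 1.0]]
patch_box = (0, 0, 1, 1)   # top-left pixel holds the patch
object_box = (1, 1, 3, 3)  # object occupies the bottom-right 2x2 block
print(energy_ratio(cam, patch_box), energy_ratio(cam, object_box))  # 0.5 0.5
```

Averaging such per-image ratios over a test set would give dataset-level percentages of the same form as the ER_p and ER_b columns.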

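Model	Method	Top-1 accuracy	ER_p	ER_b
VGGNet-16	Original image	90.60%	5.67%	62.05%
VGGNet-16	Adversarial image (proposed)	89.30%	64.94% ↑	40.19% ↓
VGGNet-19-BN	Original image	92.70%	4.85%	67.76%
VGGNet-19-BN	Adversarial image (proposed)	92.50%	67.19% ↑	38.52% ↓
ResNet-50	Original image	94.50%	2.97%	63.99%
ResNet-50	Adversarial image (proposed)	94.40%	33.73% ↑	47.32% ↓
DenseNet-161	Original image	96.60%	4.00%	65.37%
DenseNet-161	Adversarial image (proposed)	96.20%	38.51% ↑	45.41% ↓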
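To make the trade-off in the table above explicit, the short script below recomputes, from the reported numbers only, each model's top-1 accuracy drop and its ER_p gain and ER_b drop in percentage points. For example, VGGNet-16 gives up 1.3 points of top-1 accuracy for a 59.27-point rise in ER_p, while ResNet-50 loses 0.1 points for a 30.76-point rise. The dictionary layout and the helper name `deltas` are illustrative choices.

```python
# (top-1, ER_p, ER_b) in percent: original image vs. adversarial image (from the table)
results = {
    "VGGNet-16": ((90.60, 5.67, 62.05), (89.30, 64.94, 40.19)),
    "VGGNet-19-BN": ((92.70, 4.85, 67.76), (92.50, 67.19, 38.52)),
    "ResNet-50": ((94.50, 2.97, 63.99), (94.40, 33.73, 47.32)),
    "DenseNet-161": ((96.60, 4.00, 65.37), (96.20, 38.51, 45.41)),
}


def deltas(orig, adv):
    """Return (accuracy drop, ER_p gain, ER_b drop) in percentage points."""
    raw = (orig[0] - adv[0], adv[1] - orig[1], orig[2] - adv[2])
    return tuple(round(v, 2) for v in raw)  # round away float noise


for model, (orig, adv) in results.items():
    acc_drop, erp_gain, erb_drop = deltas(orig, adv)
    print(f"{model}: top-1 -{acc_drop} pp, ER_p +{erp_gain} pp, ER_b -{erb_drop} pp")
```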

Class	Single-image patch: ER_p	Single-image patch: ER_b	Generalizable universal patch: ER_p	Generalizable universal patch: ER_b
airliner	67.32%	32.24%	68.56% ↑	30.26% ↓
sports_car	63.56%	35.61%	65.32% ↑	34.12% ↓
indigo_bunting	68.39%	14.99%	70.45% ↑	13.79% ↓
tabby	69.51%	37.26%	71.23% ↑	36.56% ↓
hartebeest	50.07%	13.28%	51.89% ↑	12.76% ↓
golden_retriever	62.17%	26.51%	63.78% ↑	25.78% ↓
bullfrog	59.34%	19.30%	60.45% ↑	18.32% ↓
sorrel	65.86%	32.19%	67.49% ↑	31.37% ↓
speedboat	63.39%	26.27%	65.02% ↑	24.53% ↓
pickup	67.85%	37.16%	69.30% ↑	36.52% ↓
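The table above compares a patch optimized for a single image with a generalizable universal patch obtained by batch training over the dataset. One way to picture that batch training is a single shared patch updated with the loss gradient averaged over a batch of images, then projected to satisfy a norm bound and the valid pixel range. The sketch assumes plain gradient descent, an L-infinity bound `eps` around the initial patch as the perturbation-norm constraint, and per-image gradients supplied by an autodiff framework. `universal_patch_step` is a hypothetical name, and the paper's actual optimizer and norm may differ.

```python
from typing import List, Optional

Vec = List[float]  # flattened patch pixels in [0, 1]


def universal_patch_step(patch: Vec, batch_grads: List[Vec], lr: float = 0.1,
                         init: Optional[Vec] = None,
                         eps: Optional[float] = None) -> Vec:
    """One descent step on a patch shared by every image in the batch."""
    n = len(batch_grads)
    # Average d(loss)/d(patch) over the batch so the patch must serve all images
    avg = [sum(g[j] for g in batch_grads) / n for j in range(len(patch))]
    updated = [p - lr * g for p, g in zip(patch, avg)]
    if eps is not None and init is not None:
        # Assumed L-infinity constraint: stay within eps of the initial patch
        updated = [min(max(u, p0 - eps), p0 + eps) for u, p0 in zip(updated, init)]
    # Keep pixels in the valid image range [0, 1]
    return [min(max(u, 0.0), 1.0) for u in updated]


p0 = [0.5, 0.5, 0.5]
grads = [[1.0, -1.0, 0.0],  # gradient from image 1
         [3.0, -1.0, 0.0]]  # gradient from image 2
print(universal_patch_step(p0, grads, lr=0.1))
print(universal_patch_step(p0, grads, lr=0.1, init=p0, eps=0.05))
```

With the averaged gradient [2.0, -1.0, 0.0], the first call moves the patch to roughly [0.3, 0.6, 0.5]; with `eps=0.05` the first two pixels are held at about 0.45 and 0.55.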

Model	Method	Top-left: top-1 accuracy	Top-left: ER_p	Top-left: ER_b	Bottom-right: top-1 accuracy	Bottom-right: ER_p	Bottom-right: ER_b	Border: top-1 accuracy	Border: ER_p	Border: ER_b
VGGNet-16	Original image	90.60%	5.67%	62.05%	90.60%	6.23%	62.05%	90.60%	9.38%	62.05%
VGGNet-16	Adversarial image (proposed)	90.60%	95.75% ↑	29.26% ↓	90.40%	95.76% ↑	35.68% ↓	90.60%	96.32% ↑	40.21% ↓
VGGNet-19-BN	Original image	92.70%	4.85%	67.76%	92.70%	5.39%	67.76%	92.70%	9.24%	67.76%
VGGNet-19-BN	Adversarial image (proposed)	92.70%	97.08% ↑	26.62% ↓	92.70%	96.74% ↑	33.44% ↓	92.70%	96.13% ↑	39.25% ↓
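The last table places the patch at the top-left corner, at the bottom-right corner, or around the image border. A binary placement mask for each case could be built as below. Reading the border placement as a frame of fixed width along all four edges is an assumption, as are the function name `placement_mask` and its parameters.

```python
from typing import List


def placement_mask(h: int, w: int, where: str, size: int) -> List[List[int]]:
    """Binary mask (1 = patch pixel) for an h x w image.

    where: "top_left" / "bottom_right" place a size x size square patch;
           "border" places a frame `size` pixels wide along all four edges.
    """
    if where not in ("top_left", "bottom_right", "border"):
        raise ValueError(f"unknown placement: {where}")
    mask = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            if where == "top_left":
                on = i < size and j < size
            elif where == "bottom_right":
                on = i >= h - size and j >= w - size
            else:  # border frame
                on = i < size or j < size or i >= h - size or j >= w - size
            mask[i][j] = int(on)
    return mask


for where in ("top_left", "bottom_right", "border"):
    m = placement_mask(6, 6, where, 2)
    print(where, sum(map(sum, m)))  # number of pixels covered by the patch
```

For a 6 x 6 image with `size=2`, each corner mask covers 4 pixels and the border frame covers 32.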
