Chinese Journal of Network and Information Security ›› 2023, Vol. 9 ›› Issue (6): 154-165. doi: 10.11959/j.issn.2096-109x.2023090

• Academic Paper •

Visual explanation method for reversible neural networks

Xinying MU1, Bingbing SONG2, Fanxiao LI1, Yisen ZHENG1, Wei ZHOU1, Yunyun DONG1,2

  1. National Pilot School of Software, Yunnan University, Kunming 650000, China
    2. School of Information Science and Engineering, Yunnan University, Kunming 650000, China
  • Revised: 2023-09-09  Online: 2023-12-01  Published: 2023-12-01
  • About the authors:
    Xinying MU (1998- ), female, born in Yantai, Shandong, is a master's student at Yunnan University. Her main research interests are model explainability and artificial intelligence security.
    Bingbing SONG (1994- ), male, born in Nanchong, Sichuan, is a Ph.D. student at Yunnan University. His main research interests are artificial intelligence security and image steganography.
    Fanxiao LI (1998- ), male, of the Bai ethnic group, born in Dali, Yunnan, is a master's student at Yunnan University. His main research interests are artificial intelligence security and text steganography.
    Yisen ZHENG (1999- ), male, born in Kunming, Yunnan, is a master's student at Yunnan University. His main research interests are artificial intelligence security and autonomous driving security.
    Wei ZHOU (1974- ), male, born in Taoyuan, Hunan, is a professor and doctoral supervisor at Yunnan University. His main research interests are cyberspace security (artificial intelligence security) and distributed cloud computing.
    Yunyun DONG (1989- ), female, born in Baoshan, Yunnan, is a lecturer at Yunnan University. Her main research interests are big data indexing, distributed computing, and image steganography.
  • Supported by:
    The National Natural Science Foundation of China (62162067, 62101480); The Natural Science Foundation of Yunnan Province (202005AC160007, 202001BB050076); The Yunnan Provincial Department of Education Fund Project (2022j0008); The Key R&D Plan of Yunnan Province; The Yunnan Province Chi Xuebin Expert Workstation Project (202305AF150078)


Abstract:

Model explainability has gained significant attention as a way to understand the security issues of deep neural networks (DNN), such as their unknown decision basis and their vulnerability to adversarial attacks. While the explainability of traditional DNN has been studied extensively, the operating mechanism and explainability of reversible neural networks (RevNN) remain under-explored, and existing explanation methods for traditional DNN are not suitable for RevNN, suffering from issues such as excessive noise and gradient saturation. To address these limitations, a visual explanation method for reversible neural networks (VERN) was proposed. VERN is based on the class-activation mapping mechanism and leverages the reversible property of RevNN to explore the regional correspondence between feature maps and the input image, so that the classification weights of regional feature maps can be mapped onto the corresponding regions of the input image. This yields the importance of each input region to the model's decision and thereby generates a basis for the model's decision-making. In experimental comparisons with other explanation methods on common benchmark datasets, VERN achieves a more focused visual effect, outperforming the next-best method by 7.80% on the average drop (AD) metric and by 6.05% on the average increase (AI) metric in recognition tasks, and localizing the point of maximum heat value with 82.00% accuracy. VERN can also be applied to explain traditional DNN, and its good scalability improves the performance of other methods in explaining RevNN. In addition, adversarial attack analysis experiments show that adversarial attacks alter the decision basis of the model, reflected in a misalignment of the model's attention regions, which helps in exploring the operating mechanism of adversarial attacks.
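To make the class-activation-mapping idea concrete, the sketch below is a minimal, hypothetical PyTorch illustration of a VERN-style heatmap on a toy invertible network. It is not the paper's implementation: the network (RevNetToy, built from invertible pixel-unshuffle downsampling plus additive coupling blocks) and the function vern_style_heatmap are assumed names, intended only to show how an exact feature-to-input region correspondence lets classifier weights be mapped back onto input patches.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch only: RevNetToy and vern_style_heatmap are illustrative
# names, not the paper's code.

class InvBlock(nn.Module):
    """Additive coupling: y1 = x1, y2 = x2 + F(x1); exactly invertible."""
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels // 2, channels // 2, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels // 2, channels // 2, 3, padding=1),
        )

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        return torch.cat([x1, x2 + self.f(x1)], dim=1)

    def inverse(self, y):  # exact inverse, shown for completeness
        y1, y2 = y.chunk(2, dim=1)
        return torch.cat([y1, y2 - self.f(y1)], dim=1)

class RevNetToy(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # PixelUnshuffle(2) is invertible downsampling: each 2x2 input patch
        # maps to exactly one spatial location of the feature map.
        self.down = nn.PixelUnshuffle(2)           # 3x32x32 -> 12x16x16
        self.blocks = nn.Sequential(InvBlock(12), InvBlock(12))
        self.fc = nn.Linear(12, num_classes)       # weights reused for the heatmap

    def features(self, x):
        return self.blocks(self.down(x))

    def forward(self, x):
        return self.fc(self.features(x).mean(dim=(2, 3)))  # global average pooling

def vern_style_heatmap(model, x, cls):
    """Weight the feature maps by the classifier weights of class `cls`,
    then map each feature-map location back onto its input patch."""
    a = model.features(x)                           # (1, C, H/2, W/2)
    w = model.fc.weight[cls].view(1, -1, 1, 1)      # (1, C, 1, 1)
    cam = F.relu((w * a).sum(dim=1, keepdim=True))  # (1, 1, H/2, W/2)
    # Nearest-neighbor upsampling realizes the exact patch correspondence
    # induced by the invertible downsampling.
    cam = F.interpolate(cam, scale_factor=2, mode="nearest")
    return cam / (cam.max() + 1e-8)                 # normalize to [0, 1]

x = torch.randn(1, 3, 32, 32)
model = RevNetToy()
heat = vern_style_heatmap(model, x, cls=model(x).argmax().item())
print(heat.shape)  # torch.Size([1, 1, 32, 32])

In a full RevNN every layer is invertible, so this region correspondence holds through the whole network; the toy model makes it explicit at a single invertible downsampling step.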

Key words: model explainability, reversible neural network, visualization, class activation mapping, artificial intelligence security

