Chinese Journal of Network and Information Security ›› 2023, Vol. 9 ›› Issue (6): 154-165. doi: 10.11959/j.issn.2096-109x.2023090
• Research Papers •
About the author:
MU Xinying (1998- ), born in Yantai, Shandong, is a master's student at Yunnan University. Her research interests include model interpretability and artificial intelligence security.
Xinying MU1, Bingbing SONG2, Fanxiao LI1, Yisen ZHENG1, Wei ZHOU1, Yunyun DONG1,2
Revised: 2023-09-09
Online: 2023-12-01
Published: 2023-12-01
Supported by:
Abstract:
Model interpretability has attracted wide attention as a way to understand the security problems that deep neural networks (DNNs) exhibit in deployment, such as opaque decision bases and vulnerability to adversarial attacks. While a growing number of studies address the interpretability of conventional DNNs, the operating mechanism and interpretability of reversible neural networks remain under-explored, and existing explanation methods designed for conventional DNNs do not transfer to reversible ones, suffering from heavy noise and gradient saturation. A visual explanation method for reversible neural networks was therefore proposed. Built on the class activation mapping (CAM) mechanism, it exploits the invertibility of the network to establish a region-level correspondence between feature maps and the input image, so that the classification weights of regional feature maps can be mapped onto the corresponding regions of the input image. This yields the importance of each input region to the model's decision and thus produces the model's decision basis. In experiments on common datasets, the proposed method produced more concentrated visual explanations than other methods; on recognition tasks it improved the average drop (AD) metric by 7.80% and the average increase (AI) metric by 6.05% over the second-best method, and located the point of maximum heat correctly in 82.00% of cases. The method can also explain conventional DNNs, and its good extensibility improves the performance of other methods when explaining reversible neural networks. Moreover, adversarial-attack analysis experiments show that adversarial attacks alter the model's decision basis, visible as a dislocation of the model's attention regions, which helps in exploring how adversarial attacks operate.
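The mechanism the abstract describes rests on two ingredients: an exactly invertible network block, and a CAM-style weighted sum of feature-map channels by classification weights. The NumPy sketch below is illustrative only, not the authors' VERN implementation; the coupling function `f` and all shapes are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Additive coupling block (RevNet/i-RevNet style). The input is split into
# two halves; y1 = x1, y2 = x2 + f(x1). Inversion is exact by construction,
# which is the property a reversible network offers for mapping feature-map
# regions back to input regions.
def f(x):                          # illustrative sub-network
    return np.tanh(x)

def coupling_forward(x1, x2):
    return x1, x2 + f(x1)

def coupling_inverse(y1, y2):
    return y1, y2 - f(y1)

x1, x2 = rng.standard_normal((2, 4, 4))
y1, y2 = coupling_forward(x1, x2)
r1, r2 = coupling_inverse(y1, y2)
assert np.allclose(r1, x1) and np.allclose(r2, x2)   # exact reconstruction

# CAM-style saliency: weight each feature-map channel A_k by the class
# weight w_k, sum over channels, rectify, and normalize to [0, 1].
A = rng.standard_normal((8, 7, 7))    # 8 channels of a 7x7 feature map
w = rng.standard_normal(8)            # classification weights for one class
cam = np.maximum((w[:, None, None] * A).sum(axis=0), 0.0)
cam = cam / (cam.max() + 1e-8)
print(cam.shape)                      # (7, 7)
```

The exact inverse is what distinguishes this setting from conventional CAM: the region correspondence can be traced back through the network analytically rather than approximated by upsampling.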
Xinying MU, Bingbing SONG, Fanxiao LI, Yisen ZHENG, Wei ZHOU, Yunyun DONG. Visual explanation method for reversible neural networks[J]. Chinese Journal of Network and Information Security, 2023, 9(6): 154-165.
Table 4  Average drop (AD) and average increase (AI) evaluation of conventional DNNs on ImageNet (lower AD and higher AI are better)

| Explanation method | VGG16 AD | VGG16 AI | AlexNet AD | AlexNet AI | ResNet18 AD | ResNet18 AI |
| --- | --- | --- | --- | --- | --- | --- |
| Gradient | 76.69% | 4.65% | 79.62% | 4.10% | 75.98% | 4.10% |
| Occlusion | 94.74% | 0.60% | 89.16% | 2.60% | 94.81% | 0.90% |
| Mask | 52.75% | 8.95% | 67.35% | 5.30% | 57.75% | 8.30% |
| RISE | 34.60% | 17.60% | 52.94% | 14.60% | 34.55% | 20.10% |
| Grad-CAM | 34.04% | 19.70% | 60.69% | 10.35% | 26.30% | 21.40% |
| Grad-CAM++ | 33.44% | 16.70% | 66.88% | 6.90% | 27.57% | 19.70% |
| Score-CAM | 25.59% | 19.70% | 54.57% | 10.40% | 28.44% | 18.85% |
| VERN | 18.16% | 25.75% | 45.14% | 15.50% | 26.11% | 21.70% |
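The AD and AI figures above are computed from model confidence before and after keeping only the explained region. Assuming the standard Grad-CAM++-style definitions (not copied from the paper itself), a minimal sketch is:

```python
import numpy as np

def average_drop_increase(y_full, y_masked):
    """AD/AI under the standard definitions (an assumption here):
    y_full   - class confidences on the original images,
    y_masked - confidences when only the explanation region is kept.
    AD is the mean relative confidence drop (lower is better);
    AI is the percentage of images whose confidence rises (higher is better)."""
    y_full = np.asarray(y_full, dtype=float)
    y_masked = np.asarray(y_masked, dtype=float)
    ad = 100.0 * np.mean(np.maximum(0.0, y_full - y_masked) / y_full)
    ai = 100.0 * np.mean(y_masked > y_full)
    return ad, ai

# Three images: confidence halves, rises, and stays equal, respectively.
ad, ai = average_drop_increase([0.9, 0.8, 0.5], [0.45, 0.9, 0.5])
print(round(ad, 2), round(ai, 2))   # 16.67 33.33
```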
Table 6  Deletion and Insertion evaluation of conventional DNNs (lower Deletion and higher Insertion are better)

| Model | Deletion: Grad-CAM | Deletion: Grad-CAM++ | Deletion: Score-CAM | Deletion: VERN | Insertion: Grad-CAM | Insertion: Grad-CAM++ | Insertion: Score-CAM | Insertion: VERN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| VGG16 | 0.120 | 0.121 | 0.118 | 0.157 | 0.540 | 0.540 | 0.581 | 0.622 |
| AlexNet | 0.101 | 0.118 | 0.101 | 0.114 | 0.345 | 0.311 | 0.360 | 0.408 |
| ResNet18 | 0.134 | 0.137 | 0.139 | 0.134 | 0.549 | 0.540 | 0.539 | 0.549 |
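The Deletion and Insertion scores follow the RISE-style protocol: pixels are removed (or inserted) in order of decreasing saliency while the class score is tracked, and the area under the resulting curve is reported. A minimal sketch of the deletion side, with a stand-in `model` (the real metric uses the network's class probability):

```python
import numpy as np

def deletion_curve(img, saliency, model, steps=10):
    """Zero out pixels in order of decreasing saliency, tracking the score."""
    order = np.argsort(saliency.ravel())[::-1]      # most salient first
    x = img.ravel().astype(float).copy()
    scores = [model(x)]
    for idx in np.array_split(order, steps):
        x[idx] = 0.0                                # "delete" these pixels
        scores.append(model(x))
    return scores

def auc(scores):
    """Trapezoidal area under the curve, with x normalized to [0, 1]."""
    s = np.asarray(scores, dtype=float)
    return float(np.mean((s[:-1] + s[1:]) / 2.0))

# Stand-in model: the score is the mean remaining intensity, so deleting
# 10% of an all-ones image per step drops the score linearly to 0.
img = np.ones((10, 10))
sal = np.arange(100).reshape(10, 10)                # arbitrary saliency
scores = deletion_curve(img, sal, model=lambda x: x.mean())
print(round(auc(scores), 2))                        # 0.5
```

A faithful explanation drives the score down quickly under deletion (small AUC) and up quickly under insertion (large AUC), which is why Table 6 reads lower-is-better for Deletion and higher-is-better for Insertion.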
Table 7  Improving the faithfulness of other methods with VERN on ImageNet

| Explanation method | AD | Improved AD | AI | Improved AI |
| --- | --- | --- | --- | --- |
| Gradient | 72.59% | 53.15% | 5.40% | 13.20% |
| Occlusion | 85.48% | 60.33% | 4.83% | 25.83% |
| Mask | 67.11% | 43.87% | 6.03% | 10.13% |
| RISE | 31.74% | 24.39% | 24.00% | 28.27% |
| Grad-CAM | 26.79% | 26.11% | 23.77% | 24.43% |
| Grad-CAM++ | 27.52% | 26.56% | 21.93% | 23.23% |
| Score-CAM | 89.61% | 88.72% | 2.70% | 3.97% |
Table 8  Improving the faithfulness of other methods with VERN on CIFAR-10

| Explanation method | AD | Improved AD | AI | Improved AI |
| --- | --- | --- | --- | --- |
| Gradient | 74.56% | 41.91% | 3.37% | 7.50% |
| Occlusion | 50.80% | 35.60% | 10.13% | 16.63% |
| Mask | 57.05% | 54.79% | 8.17% | 9.60% |
| RISE | 46.15% | 44.91% | 10.87% | 11.00% |
| Grad-CAM | 30.85% | 30.87% | 11.40% | 11.33% |
| Grad-CAM++ | 34.27% | 33.63% | 10.10% | 10.50% |
| Score-CAM | 44.14% | 44.67% | 0.80% | 0.80% |
Table 11  Generalization evaluation: AD and AI

| Model | Explanation method | AD | AI |
| --- | --- | --- | --- |
| VGG16 | Grad-CAM | 27.22% | 12.20% |
| VGG16 | Grad-CAM++ | 27.23% | 11.40% |
| VGG16 | VERN | 26.39% | 12.20% |
| AlexNet | Grad-CAM | 52.69% | 10.40% |
| AlexNet | Grad-CAM++ | 52.50% | 10.20% |
| AlexNet | VERN | 52.00% | 11.20% |
| ResNet18 | Grad-CAM | 25.98% | 15.40% |
| ResNet18 | Grad-CAM++ | 26.04% | 16.00% |
| ResNet18 | VERN | 25.00% | 17.40% |
[1] TAN Q Y, ZENG Y M, HAN Y, et al. Survey on backdoor attacks targeted on neural network[J]. Chinese Journal of Network and Information Security, 2021, 7(3): 46-58.
[2] YANG P B, SANG J T, ZHANG B, et al. Survey on interpretability of deep models for image classification[J]. Journal of Software, 2023, 34(1): 230-254.
[3] FANG Z, KUANG K, LIN Y, et al. Concept-based explanation for fine-grained images and its application in infectious keratitis classification[C]// Proceedings of the 28th ACM International Conference on Multimedia. 2020: 700-708.
[4] HUA Y Y, ZHANG D X, GE S M. Research progress in the interpretability of deep learning models[J]. Journal of Cyber Security, 2020, 5(3): 1-12.
[5] JI S L, LI J F, DU T Y, et al. Survey on the techniques, applications and security of machine learning interpretability[J]. Journal of Computer Research and Development, 2019, 56(10): 2071-2096.
[6] ZEILER M D, FERGUS R. Visualizing and understanding convolutional networks[C]// Proceedings of European Conference on Computer Vision (ECCV 2014). 2014: 818-833.
[7] AGARWAL C, NGUYEN A. Explaining image classifiers by removing input features using generative models[C]// Proceedings of the Asian Conference on Computer Vision. 2020.
[8] FONG R C, VEDALDI A. Interpretable explanations of black boxes by meaningful perturbation[C]// Proceedings of the IEEE International Conference on Computer Vision. 2017: 3429-3437.
[9] SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization[C]// Proceedings of the IEEE International Conference on Computer Vision. 2017: 618-626.
[10] CHATTOPADHAY A, SARKAR A, HOWLADER P, et al. Grad-CAM++: generalized gradient-based visual explanations for deep convolutional networks[C]// 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). 2018: 839-847.
[11] WANG H, WANG Z, DU M, et al. Score-CAM: score-weighted visual explanations for convolutional neural networks[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 2020: 24-25.
[12] JACOBSEN J H, SMEULDERS A W M, OYALLON E. i-RevNet: deep invertible networks[C]// Proceedings of International Conference on Learning Representations. 2018.
[13] JING J, DENG X, XU M, et al. HiNet: deep image hiding by invertible network[C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 4733-4742.
[14] LUGMAYR A, DANELLJAN M, VAN GOOL L, et al. SRFlow: learning the super-resolution space with normalizing flow[C]// Proceedings of Computer Vision - ECCV 2020. 2020: 715-732.
[15] ZHOU B, KHOSLA A, LAPEDRIZA A, et al. Learning deep features for discriminative localization[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2921-2929.
[16] SIMONYAN K, VEDALDI A, ZISSERMAN A. Deep inside convolutional networks: visualising image classification models and saliency maps[J]. arXiv preprint arXiv:1312.6034, 2013.
[17] SMILKOV D, THORAT N, KIM B, et al. SmoothGrad: removing noise by adding noise[J]. arXiv preprint arXiv:1706.03825, 2017.
[18] SPRINGENBERG J T, DOSOVITSKIY A, BROX T, et al. Striving for simplicity: the all convolutional net[J]. arXiv preprint arXiv:1412.6806, 2014.
[19] SUNDARARAJAN M, TALY A, YAN Q. Axiomatic attribution for deep networks[C]// International Conference on Machine Learning. 2017: 3319-3328.
[20] PETSIUK V, DAS A, SAENKO K. RISE: randomized input sampling for explanation of black-box models[J]. arXiv preprint arXiv:1806.07421, 2018.
[21] GOMEZ A N, REN M, URTASUN R, et al. The reversible residual network: backpropagation without storing activations[C]// Advances in Neural Information Processing Systems. 2017, 30.
[22] RAMASWAMY H G. Ablation-CAM: visual explanations for deep convolutional network via gradient-free localization[C]// Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2020: 983-991.
[23] KRIZHEVSKY A, HINTON G. Learning multiple layers of features from tiny images[R]. 2009.
[24] DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]// Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009: 248-255.
[25] MADRY A, MAKELOV A, SCHMIDT L, et al. Towards deep learning models resistant to adversarial attacks[J]. arXiv preprint arXiv:1706.06083, 2017.
[26] DONG Y, LIAO F, PANG T, et al. Boosting adversarial attacks with momentum[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 9185-9193.
[27] TRAMÈR F, KURAKIN A, PAPERNOT N, et al. Ensemble adversarial training: attacks and defenses[J]. arXiv preprint arXiv:1705.07204, 2017.
[28] KHAKZAR A, KHORSANDI P, NOBAHARI R, et al. Do explanations explain? Model knows best[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 10244-10253.
[29] DABKOWSKI P, GAL Y. Real time image saliency for black box classifiers[C]// Advances in Neural Information Processing Systems. 2017, 30.