Chinese Journal of Network and Information Security ›› 2023, Vol. 9 ›› Issue (6): 154-165. doi: 10.11959/j.issn.2096-109x.2023090

• Academic Paper •

Visual explanation method for reversible neural networks

Xinying MU1, Bingbing SONG2, Fanxiao LI1, Yisen ZHENG1, Wei ZHOU1, Yunyun DONG1,2

  1. National Pilot School of Software, Yunnan University, Kunming 650000, China
    2. School of Information Science and Engineering, Yunnan University, Kunming 650000, China
  • Revised: 2023-09-09  Online: 2023-12-01  Published: 2023-12-01
  • About the authors:
    Xinying MU (1998- ), female, born in Yantai, Shandong, is a master's student at Yunnan University. Her main research interests are model explainability and artificial intelligence security.
    Bingbing SONG (1994- ), male, born in Nanchong, Sichuan, is a Ph.D. student at Yunnan University. His main research interests are artificial intelligence security and image steganography.
    Fanxiao LI (1998- ), male, of the Bai ethnic group, born in Dali, Yunnan, is a master's student at Yunnan University. His main research interests are artificial intelligence security and text steganography.
    Yisen ZHENG (1999- ), male, born in Kunming, Yunnan, is a master's student at Yunnan University. His main research interests are artificial intelligence security and autonomous driving security.
    Wei ZHOU (1974- ), male, born in Taoyuan, Hunan, is a professor and doctoral supervisor at Yunnan University. His main research interests are cyberspace security (artificial intelligence security) and distributed cloud computing.
    Yunyun DONG (1989- ), female, born in Baoshan, Yunnan, is a lecturer at Yunnan University. Her main research interests are big data indexing, distributed computing, and image steganography.
  • Supported by:
    The National Natural Science Foundation of China (62162067, 62101480); The Natural Science Foundation of Yunnan Province (202005AC160007, 202001BB050076); The Yunnan Provincial Department of Education Fund Project (2022j0008); The Key R&D Plan of Yunnan Province; The Yunnan Province Chi Xuebin Expert Workstation Project (202305AF150078)


Abstract:

Model explainability has gained significant attention as a way to understand the security issues of deep neural networks (DNN), such as their unknown decision basis and their vulnerability to adversarial attacks. While the explainability of traditional DNN has been studied extensively, the operating mechanism and explainability of reversible neural networks (RevNN) remain under-explored, and existing explanation methods for traditional DNN are not suitable for RevNN, suffering from issues such as excessive noise and gradient saturation. To address these limitations, a visual explanation method for reversible neural networks (VERN) was proposed. VERN is based on the class-activation mapping mechanism and leverages the reversible property of RevNN to explore the regional correspondence between feature maps and the input image, so that the classification weights of regional feature maps can be mapped onto the corresponding regions of the input image. This yields the importance of each input region to the model's decision and thereby generates a basis for the model's decision-making. In experimental comparisons with other explanation methods on common benchmark datasets, VERN achieves a more focused visual effect, outperforming the next-best method by 7.80% on the average drop (AD) metric and by 6.05% on the average increase (AI) metric in recognition tasks, and localizing the point of maximum heat value with 82.00% accuracy. VERN can also be applied to explain traditional DNN, and its good scalability improves the performance of other methods in explaining RevNN. In addition, adversarial attack analysis experiments show that adversarial attacks alter the decision basis of the model, reflected in a misalignment of the model's attention regions, which helps in exploring the operating mechanism of adversarial attacks.
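To make the class-activation-mapping idea concrete, the sketch below is a minimal, hypothetical PyTorch illustration of a VERN-style heatmap on a toy invertible network. It is not the paper's implementation: the network (RevNetToy, built from invertible pixel-unshuffle downsampling plus additive coupling blocks) and the function vern_style_heatmap are assumed names, intended only to show how an exact feature-to-input region correspondence lets classifier weights be mapped back onto input patches.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch only: RevNetToy and vern_style_heatmap are illustrative
# names, not the paper's code.

class InvBlock(nn.Module):
    """Additive coupling: y1 = x1, y2 = x2 + F(x1); exactly invertible."""
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels // 2, channels // 2, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels // 2, channels // 2, 3, padding=1),
        )

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        return torch.cat([x1, x2 + self.f(x1)], dim=1)

    def inverse(self, y):  # exact inverse, shown for completeness
        y1, y2 = y.chunk(2, dim=1)
        return torch.cat([y1, y2 - self.f(y1)], dim=1)

class RevNetToy(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # PixelUnshuffle(2) is invertible downsampling: each 2x2 input patch
        # maps to exactly one spatial location of the feature map.
        self.down = nn.PixelUnshuffle(2)           # 3x32x32 -> 12x16x16
        self.blocks = nn.Sequential(InvBlock(12), InvBlock(12))
        self.fc = nn.Linear(12, num_classes)       # weights reused for the heatmap

    def features(self, x):
        return self.blocks(self.down(x))

    def forward(self, x):
        return self.fc(self.features(x).mean(dim=(2, 3)))  # global average pooling

def vern_style_heatmap(model, x, cls):
    """Weight the feature maps by the classifier weights of class `cls`,
    then map each feature-map location back onto its input patch."""
    a = model.features(x)                           # (1, C, H/2, W/2)
    w = model.fc.weight[cls].view(1, -1, 1, 1)      # (1, C, 1, 1)
    cam = F.relu((w * a).sum(dim=1, keepdim=True))  # (1, 1, H/2, W/2)
    # Nearest-neighbor upsampling realizes the exact patch correspondence
    # induced by the invertible downsampling.
    cam = F.interpolate(cam, scale_factor=2, mode="nearest")
    return cam / (cam.max() + 1e-8)                 # normalize to [0, 1]

x = torch.randn(1, 3, 32, 32)
model = RevNetToy()
heat = vern_style_heatmap(model, x, cls=model(x).argmax().item())
print(heat.shape)  # torch.Size([1, 1, 32, 32])

In a full RevNN every layer is invertible, so this region correspondence holds through the whole network; the toy model makes it explicit at a single invertible downsampling step.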

Key words: model explainability, reversible neural network, visualization, class activation mapping, artificial intelligence security

