Chinese Journal of Network and Information Security ›› 2022, Vol. 8 ›› Issue (2): 88-99. doi: 10.11959/j.issn.2096-109x.2022012

• Special Column: Network Attack and Defense Technology •


Adversarial examples defense method based on multi-dimensional feature maps knowledge distillation

Baolin QIU, Ping YI   

  1. School of Cyber Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
  • Revised: 2022-01-28  Online: 2022-04-15  Published: 2022-04-01
  • About the authors: QIU Baolin (1995− ), born in Weifang, Shandong, China, is a master's student at Shanghai Jiao Tong University. His research interests include artificial intelligence security and adversarial examples.
    YI Ping (1969− ), born in Luoyang, Henan, China, Ph.D., is an associate professor at Shanghai Jiao Tong University. His research interests include artificial intelligence security and network confrontation.
  • Supported by:
    The National Key R&D Program of China (2019YFB1405000)


Abstract:

Deep neural networks are widely used in computer vision tasks, but adversarial examples can cause a network to make false predictions. Adversarial training is an effective defense against adversarial examples; however, its high computing power demand and long training time limit its application scenarios. An adversarial example defense method based on knowledge distillation was proposed, reusing the defense experience learned on large datasets for new classification tasks. During distillation, the teacher model has the same structure as the student model, feature-map vectors serve as the medium for transferring experience, and only clean samples are used for training. Multi-dimensional feature maps are used to strengthen the expression of semantic information. Furthermore, an attention mechanism based on feature maps was proposed, which boosts the effect of distillation by assigning weights to features according to their importance. Experiments were conducted on the open-source CIFAR-100 and CIFAR-10 datasets, and white-box attack algorithms such as FGSM (fast gradient sign method), PGD (projected gradient descent), and C&W (Carlini-Wagner attack) were applied to evaluate the method. On clean CIFAR-10 samples, the accuracy of the proposed method exceeds that of adversarial training and approaches that of a model normally trained on clean samples. Under the L2-distance PGD attack, its performance is close to that of adversarial training and significantly higher than that of normal training. Moreover, the learning cost is low: even with optimizations such as the attention mechanism and multi-dimensional feature maps, the computing power required is far less than that of adversarial training, making it a lightweight adversarial defense method. As a neural network learning scheme, knowledge distillation can not only learn the decision-making experience from normal samples but also extract robust features. It generates accurate and robust models from a small amount of data, improves generalization, and reduces the cost incurred by adversarial training.
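
The following minimal PyTorch sketch illustrates the kind of training objective the abstract describes: structurally identical teacher and student networks, feature maps taken at several depths as the medium for transferring experience, and an attention-style weighting that emphasizes features by importance. The weighting formula, the layers tapped, and the loss coefficient beta are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F

def attention_weights(fmap):
    # fmap: (B, C, H, W). Score each channel by its mean absolute activation
    # and normalize with a softmax, so more salient channels receive larger
    # weights (an assumed formula; the paper's exact mechanism may differ).
    scores = fmap.abs().mean(dim=(2, 3))                         # (B, C)
    return F.softmax(scores, dim=1).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)

def feature_distill_loss(student_maps, teacher_maps):
    # student_maps / teacher_maps: lists of same-shaped feature maps taken
    # from several depths of the structurally identical student and teacher.
    loss = 0.0
    for s, t in zip(student_maps, teacher_maps):
        w = attention_weights(t).detach()  # weights come from the frozen teacher
        loss = loss + (w * (s - t.detach()) ** 2).mean()
    return loss

def training_loss(logits, labels, student_maps, teacher_maps, beta=1.0):
    # Cross-entropy on clean samples plus the weighted feature-map term;
    # only clean samples are used, matching the abstract. beta balances the
    # two terms and is an assumed hyperparameter.
    return F.cross_entropy(logits, labels) + beta * feature_distill_loss(
        student_maps, teacher_maps)

For the white-box evaluation, FGSM is the simplest of the attacks listed; a standard one-step implementation (with an assumed perturbation budget eps) looks like this:

def fgsm_attack(model, x, y, eps=8 / 255):
    # One signed-gradient ascent step on the cross-entropy loss.
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()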

Key words: deep learning, adversarial examples defense, knowledge distillation, multi-dimensional feature maps
