基于多维特征图知识蒸馏的对抗样本防御方法

doi:10.11959/j.issn.2096-109x.2022012

Abstract

Abstract:

The neural network approach has been commonly used in computer vision tasks.However, adversarial examples are able to make a neural network generate a false prediction.Adversarial training has been shown to be an effective approach to defend against the impact of adversarial examples.Nevertheless, it requires high computing power and long training time thus limiting its application scenarios.An adversarial examples defense method based on knowledge distillation was proposed, reusing the defense experience from the large datasets to new classification tasks.During distillation, teacher model has the same structure as student model and the feature map vector was used to transfer experience, and clean samples were used for training.Multi-dimensional feature maps were utilized to enhance the semantic information.Furthermore, an attention mechanism based on feature map was proposed, which boosted the effect of distillation by assigning weights to features according to their importance.Experiments were conducted over cifar100 and cifar10 open-source dataset.And various white-box attack algorithms such as FGSM (fast gradient sign method), PGD (project gradient descent) and C＆amp;W (Carlini-Wagner attack) were applied to test the experimental results.The accuracy of the proposed method on Cifar10 clean samples exceeds that of adversarial training and is close to the accuracy of the model trained on clean samples.Under the PGD attack of L2 distance, the efficiency of the proposed method is close to that of adversarial training, which is significantly higher than that of normal training.Moreover, the proposed method is a light-weight adversarial defense method with low learning cost.The computing power requirement is far less than that of adversarial training even if optimization schemes such as attention mechanism and multi-dimensional feature map are added.Knowledge distillation can learn the decision-making experience of normal samples and extract robust features as a neural network learning scheme.It uses a small amount of data to generate accurate and robust models, improves generalization, and reduces the cost of adversarial training.

Key words: deep learning, adversarial examples defense, knowledge distillation, multi-dimensional feature maps

CLC Number:

TP309

Baolin QIU, Ping YI. Adversarial examples defense method based on multi-dimensional feature maps knowledge distillation[J]. Chinese Journal of Network and Information Security, 2022, 8(2): 88-99.

Figures/Tables 15

训练方式	数据集	干净样本	FGSM
正常训练	$C i f a r 10$	$94 . 77 %$	$7 . 1 %$
对抗训练	$C i f a r 10$	$87 . 42 %$	$58 . 04 %$
Shafahi蒸馏	Cifar100→Cifar10	92.93%	46.13%
特征图蒸馏	Cifar100→Cifar10	85.67%	40.02%
频域蒸馏	Cifar100→Cifar10	87.81%	40.56%
Shafahi+mask	Cifar100→Cifar10	94.79%	61.30%
特征图+mask	$C i f a r 100 \to C i f a r 10$	$94 . 17 %$	$62 . 44 %$
频域+mask	$C i f a r 100 \to C i f a r 10$	$94 . 23 %$	$63 . 57 %$

训练方式	数据集	PGD-5	PGD-10	PGD-20
正常训练	$C i f a r 10$	$0 . 01 %$	$0$	$0$
对抗训练	$C i f a r 10$	$45 . 57 %$	$31 . 06 %$	$24 . 95 %$
Shafahi蒸馏	Cifar100→Cifar10	13.15%	4.7%	2.35%
特征图蒸馏	Cifar100→Cifar10	33.27%	18.26%	12.91%
频域蒸馏	Cifar100→Cifar10	27.27%	13.07%	8.39%
Shafahi+mask	Cifar100→Cifar10	29.44%	17.71%	13.32%
特征图+mask	$C i f a r 100 \to C i f a r 10$	$37 . 41 %$	$28 . 63 %$	$23 . 49 %$
频域+mask	$C i f a r 100 \to C i f a r 10$	$39 . 94 %$	$30 . 44 %$	$25 . 14 %$

训练方式	数据集	PGD-5	PGD-10	PGD-20
正常训练	$C i f a r 10$	$0 . 01 %$	$0$	$0$
对抗训练	$C i f a r 10$	$57 . 76 %$	$49 . 31 %$	$47 . 38 %$
Shafahi蒸馏	Cifar100→Cifar10	11.63%	4.78%	2.61%
特征图蒸馏	Cifar100→Cifar10	35.95%	26.15%	23.93%
频域蒸馏	Cifar100→Cifar10	30.49%	20.26%	17.46%
Shafahi+mask	Cifar100→Cifar10	25.95%	15.89%	12.25%
特征图+mask	$C i f a r 100 \to C i f a r 10$	$34 . 56 %$	$26 . 17 %$	$21 . 87 %$
频域+mask	$C i f a r 100 \to C i f a r 10$	$37 . 09 %$	$27 . 82 %$	$23 . 33 %$

训练方式	数据集	CW-5	CW-10	CW-20
正常训练	$C i f a r 10$	$0 . 015 %$	$0$	$0$
对抗训练	$C i f a r 10$	$53 . 84 %$	$46 . 54 %$	$45 . 27 %$
Shafahi蒸馏	Cifar100→Cifar10	8.71%	3.27%	1.77%
特征图蒸馏	Cifar100→Cifar10	32.85%	24%	22.46%
频域蒸馏	Cifar100→Cifar10	27.35%	17.98%	16.15%
Shafahi+mask	Cifar100→Cifar10	20.93%	13.19%	9.83%
特征图+mask	$C i f a r 100 \to C i f a r 10$	$28 . 44 %$	$20 . 39 %$	$16 . 64 %$
频域+mask	$C i f a r 100 \to C i f a r 10$	$32 . 17 %$	$23 . 44 %$	$19 . 66 %$

训练方式	数据集	耗时
正常训练	$C i f a r 10$	$19 m i n$
对抗训练	$C i f a r 10$	$5 . 5 h$
Shafahi蒸馏	Cifar100→Cifar10	24min
特征图蒸馏	Cifar100→Cifar10	29min
频域蒸馏	Cifar100→Cifar10	27min
Shafahi+mask	Cifar100→Cifar10	29min
特征图+mask	$C i f a r 100 \to C i f a r 10$	$35 m i n$
频域+mask	$C i f a r 100 \to C i f a r 10$	$33 m i n$

权重α	Clean	FGSM	PGD-5	PGD-10	PGD-20
5×10^-1	94.65%	28.31%	0.55%	0.02%	0
5	94.80%	38.39%	2.15%	0.17%	0.05%
50×10¹	94.95%	52.78%	9.21%	2.61%	1.00%
5×10²	$94 . 79 %$	$61 . 3 %$	$25 . 95 %$	$15 . 89 %$	$12 . 25 %$
5×10³	93.99%	58.28%	27.41%	17.54%	13.22%

权重α	Clean	FGSM	PGD-5	PGD-10	PGD-20
5×10^-1	94.63%	33.29%	0.89%	0.05%	0.01%
5	95.12%	45.31%	3.38%	0.44%	0.09%
5×10¹	94.79%	58.92%	15.71%	6.21%	3.37%
$5 \times 1 0^{2}$	$94 . 17 %$	$62 . 44 %$	$34 . 56 %$	$26 . 17 %$	$21 . 87 %$
5×10³	91.67%	54.38%	29.78%	19.03%	13.32%

权重α	Clean	FGSM	PGD-5	PGD-10	PGD-20
0.01	94.75%	43.68%	1.73%	0.12%	0.02%
0.1	95.03%	51.42%	7.87%	1.78%	0.43%
1	95.05%	62.78%	27.53%	17.40%	13.36%
$10$	$94 . 23 %$	$63 . 57 %$	$37 . 09 %$	$27 . 82 %$	$23 . 33 %$
20	93.20%	57.47%	33.34%	28.40%	25.92%
30	87.89%	42.49%	19.55%	15.11%	11.96%

References 29

[1]	SZEGEDY C , ZAREMBA W , SUTSKEVER I ,et al. Intriguing properties of neural networks[J]. arXiv preprint arXiv:1312.6199, 2013.
[2]	XU K D , ZHANG G Y , LIU S J ,et al. Adversarial T-shirt evading person detectors in a physical world[C]// Proceedings of Computer Vision - ECCV 2020. 2020: 665-681.
[3]	HE K M , ZHANG X Y , REN S Q ,et al. Deep residual learning for image recognition[C]// Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016: 770-778.
[4]	SZEGEDY C , VANHOUCKE V , IOFFE S ,et al. Rethinking the inception architecture for computer vision[C]// Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2818-2826.
[5]	HINTON G E , VINYALS O , DEAN J . Distilling the knowledge in a neural network[J]. arXiv preprint arXiv:1503.02531, 2015.
[6]	ZHOU B L , KHOSLA A , LAPEDRIZA A ,et al. Learning deep features for discriminative localization[C]// Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2921-2929.
[7]	易平, 王科迪, 黄程 ,等. 人工智能对抗攻击研究综述[J]. 上海交通大学学报, 2018,52(10): 1298-1306.
	YI P , WANG K D , HUANG C ,et al. Adversarial attacks in artificial intelligence:a survey[J]. Journal of Shanghai Jiao Tong University, 2018,52(10): 1298-1306.
[8]	王科迪, 易平 . 人工智能对抗环境下的模型鲁棒性研究综述[J]. 信息安全学报, 2020,5(3): 13-22.
	WANG K D , YI P . A survey on model robustness under adversarial example[J]. Journal of Cyber Security, 2020,5(3): 13-22.
[9]	GOODFELLOW I J , SHLENS J , SZEGEDY C . Explaining and harnessing adversarial examples[J]. arXiv preprint arXiv:1412.6572, 2014.
[10]	KURAKIN A , GOODFELLOW I , BENGIO S . Adversarial machine learning at scale[J]. arXiv preprint arXiv:1611.01236, 2016.
[11]	DONG Y P , LIAO F Z , PANG T Y ,et al. Boosting adversarial attacks with momentum[C]// Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018: 9185-9193.
[12]	MADRY A , MAKELOV A , SCHMIDT L ,et al. Towards deep learning models resistant to adversarial attacks[J]. arXiv preprint arXiv:1706.06083, 2017.
[13]	MOOSAVI-DEZFOOLI S M , FAWZI A , FROSSARD P . DeepFool:a simple and accurate method to fool deep neural networks[C]// Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2574-2582.
[14]	SU J W , VARGAS D V , SAKURAI K . One pixel attack for fooling deep neural networks[J]. IEEE Transactions on Evolutionary Computation, 2019,23(5): 828-841.
[15]	CARLINI N , WAGNER D . Towards evaluating the robustness of neural networks[C]// Proceedings of 2017 IEEE Symposium on Security and Privacy. 2017: 39-57.
[16]	KANNAN H , KURAKIN A , GOODFELLOW I . Adversarial logit pairing[J]. arXiv preprint arXiv:1803.06373, 2018.
[17]	WANG Y , ZOU D , YI J ,et al. Improving adversarial robustness requires revisiting misclassified examples[C]// Proceedings of International Conference on Learning Representations. 2019.
[18]	XIE C , WANG J , ZHANG Z ,et al. Mitigating adversarial effects through randomization[J]. arXiv preprint arXiv:1711.01991, 2017.
[19]	XIE C H , WU Y X , VAN DER MAATEN L ,et al. Feature denoising for improving adversarial robustness[C]// Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2019: 501-509.
[20]	XU W L , EVANS D , QI Y J . Feature squeezing:detecting adversarial examples in deep neural networks[J]. arXiv preprint arXiv:1704.01155, 2017.
[21]	MA S Q , LIU Y Q , TAO G H ,et al. Nic:detecting adversarial samples with neural network invariant checking[C]// Proceedings of the 26th Network and Distributed System Security Symposium (NDSS 2019). 2019.
[22]	GOLDBLUM M , FOWL L , FEIZI S ,et al. Adversarially robust distillation[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020,34(4): 3996-4003.
[23]	SHAFAHI A , SAADATPANAH P , ZHU C ,et al. Adversarially robust transfer learning[J]. arXiv preprint arXiv:1905.08232, 2019.
[24]	SHAFAHI A , NAJIBI M , GHIASI A ,et al. Adversarial training for free![J]. arXiv preprint arXiv:1904.12843, 2019.
[25]	SELVARAJU R R , COGSWELL M , DAS A ,et al. Grad-CAM:visual explanations from deep networks via gradient-based localization[J]. International Journal of Computer Vision, 2020,128(2): 336-359.
[26]	CHATTOPADHAY A , SARKAR A , HOWLADER P ,et al. Grad-CAM++:generalized gradient-based visual explanations for deep convolutional networks[C]// Proceedings of 2018 IEEE Winter Conference on Applications of Computer Vision. 2018: 839-847.
[27]	WANG Y D , ZHANG J , KAN M N ,et al. Self-supervised equivariant attention mechanism for weakly supervised semantic segmentation[C]// Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2020: 12272-12281.
[28]	LIN M , CHEN Q , YAN S . Network in network[J]. arXiv preprint arXiv:1312.4400, 2013.
[29]	BOTTOU L , . Large-scale machine learning with stochastic gradient descent[C]// Proceedings of COMPSTAT'2010. 2010.

Metrics

Recommended 0

No Suggested Reading articles found!

Adversarial examples defense method based on multi-dimensional feature maps knowledge distillation

RichHTML

PDF下载

Knowledge

Abstract

Cite this article

share this article

Figures/Tables 15

References 29

Related Articles 15

Metrics

Recommended 0

[1]	Xiaomeng LI, Daidou GUO, Xunfang ZHUO, Heng YAO, Chuan QIN. Carrier-independent screen-shooting resistant watermarking based on information overlay superimposition [J]. Chinese Journal of Network and Information Security, 2023, 9(3): 135-149.
[2]	Rongna XIE, Zhuhong MA, Zongyu LI, Ye TIAN. Encrypted traffic classification method based on convolutional neural network [J]. Chinese Journal of Network and Information Security, 2022, 8(6): 84-91.
[3]	Dengyong ZHANG, Huang WEN, Feng LI, Peng CAO, Lingyun XIANG, Gaobo YANG, Xiangling DING. Image inpainting forensics method based on dual branch network [J]. Chinese Journal of Network and Information Security, 2022, 8(6): 110-122.
[4]	Jiaying LIN, Wenbo ZHOU, Weiming ZHANG, Nenghai YU. Lip forgery detection via spatial-frequency domain combination [J]. Chinese Journal of Network and Information Security, 2022, 8(6): 146-155.
[5]	Jinyin CHEN, Changan WU, Haibin ZHENG. Novel defense based on softmax activation transformation [J]. Chinese Journal of Network and Information Security, 2022, 8(2): 48-63.
[6]	Lijuan LI, Man LI, Hongjun BI, Huachun ZHOU. Multi-type low-rate DDoS attack detection method based on hybrid deep learning [J]. Chinese Journal of Network and Information Security, 2022, 8(1): 73-85.
[7]	Zhongyuan QIN, Zhaoxiang HE, Tao LI, Liquan CHEN. Adversarial example defense algorithm for MNIST based on image reconstruction [J]. Chinese Journal of Network and Information Security, 2022, 8(1): 86-94.
[8]	Deqing ZOU, Xiang LI, Minhuan HUANG, Xiang SONG, Hao LI, Weiming LI. Intelligent vulnerability detection system based on graph structured source code slice [J]. Chinese Journal of Network and Information Security, 2021, 7(5): 113-122.
[9]	Zhenglong WANG, Baowen ZHANG. Survey of generative adversarial network [J]. Chinese Journal of Network and Information Security, 2021, 7(4): 68-85.
[10]	Binglong LI, Jinlong TONG, Yu ZHANG, Yifeng SUN, Qingxian WANG, Chaowen CHANG. Auto forensic detecting algorithms of malicious code fragment based on TensorFlow [J]. Chinese Journal of Network and Information Security, 2021, 7(4): 154-163.
[11]	Qingyin TAN, Yingming ZENG, Ye HAN, Yijing LIU, Zheli LIU. Survey on backdoor attacks targeted on neural network [J]. Chinese Journal of Network and Information Security, 2021, 7(3): 46-58.
[12]	Luhui YANG,Huiwen BAI,Guangjie LIU,Yuewei DAI. Lightweight malicious domain name detection model based on separable convolution [J]. Chinese Journal of Network and Information Security, 2020, 6(6): 112-120.
[13]	Ximeng LIU,Lehui XIE,Yaopeng WANG,Xuru LI. Adversarial attacks and defenses in deep learning [J]. Chinese Journal of Network and Information Security, 2020, 6(5): 36-53.
[14]	Sijia DU,Haining YU,Hongli ZHANG. Survey of text classification methods based on deep learning [J]. Chinese Journal of Network and Information Security, 2020, 6(4): 1-13.
[15]	Mingfang ZHAI,Xingming ZHANG,Bo ZHAO. Survey of encrypted malicious traffic detection based on deep learning [J]. Chinese Journal of Network and Information Security, 2020, 6(3): 66-77.