Adversarial defense method based on softmax activation transformation

doi:10.11959/j.issn.2096-109x.2022016

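Abstract

Abstract (Chinese version):

Deep learning is widely applied in image processing, natural language processing, network mining, and many other fields with good results, but its vulnerability to adversarial attacks and the resulting security flaws have drawn wide attention. Several effective defenses already exist, including adversarial training, data transformation, and model enhancement. However, problems remain: effective defense requires prior knowledge of the attack method and of adversarial examples, defense against black-box attacks is weak, performance on normal examples is partly sacrificed, and defense performance cannot be verified. A verifiable defense that does not depend on adversarial examples is therefore key. This paper proposes softmax activation transformation (SAT), a lightweight and fast defense against black-box attacks. SAT does not take part in model training; in the inference phase it hardens the target model's output probabilities for privacy protection and re-activates them. By defining the connection between softmax activation transformation and deep model defense, it is proved that transforming the output through the softmax function protects the privacy of the probability information and thereby defends against black-box attacks. SAT does not depend on adversarial attack methods or adversarial examples, which not only avoids the burden of generating large numbers of adversarial examples but also enables defense before an attack occurs. The activation of SAT is proved theoretically to be monotonic, which guarantees the recognition accuracy of normal examples during defense. For the activation process, a protection strategy with a variable softmax activation transformation coefficient is proposed: the privacy-protection transformation coefficient is chosen at random within a given range to achieve dynamic defense. Most importantly, SAT is a verifiable defense whose effectiveness and reliability can be derived from probability-information privacy protection and softmax activation transformation. To evaluate SAT, defense experiments against 9 black-box attacks were carried out on the MNIST, CIFAR10, and ImageNet datasets, reducing the average attack success rate over all attack methods from 87.06% to 5.94%. Comparison with several state-of-the-art black-box defense methods verifies that the proposed method achieves the best defense performance.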

Keywords (Chinese version): deep learning, adversarial defense, verifiable, attack-independent

Abstract:

Deep learning is widely used in fields such as image processing, natural language processing, and network mining. However, it is vulnerable to malicious adversarial attacks, and many defensive methods have been proposed in response. Most of these defenses are attack-dependent and require defenders to generate massive numbers of adversarial examples in advance, so the defense cost is high and black-box attacks remain difficult to resist. Some defenses even degrade the recognition of normal examples. In addition, current defense methods are mostly empirical and lack certifiable theoretical support. This paper proposes softmax activation transformation (SAT), a lightweight and fast defense scheme against black-box attacks. SAT reactivates the output probability of the target model in the testing phase and thereby guarantees the privacy of the probability information. As an attack-free defense, SAT not only avoids the burden of generating massive adversarial examples but also defends in advance of attacks. The activation of SAT is monotonic, so it does not affect the recognition of normal examples. For the activation process, a variable privacy-protection transformation coefficient is designed to achieve dynamic defense. Above all, SAT is a certifiable defense whose effectiveness and reliability can be derived from the softmax activation transformation. To evaluate SAT, defense experiments against 9 attacks were conducted on the MNIST, CIFAR10, and ImageNet datasets, and the average attack success rate was reduced from 87.06% to 5.94%.

Key words: deep learning, adversarial defense, certifiable, attack-free
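
The abstract describes SAT as re-activating the model's output probabilities with a softmax whose coefficient is drawn at random from a given range, while staying monotonic so that the predicted label does not change. The following is a minimal sketch of that idea only. The function names, the range [k_min, k_max], and the specific form softmax(k * p) are assumptions made for illustration; the abstract does not give the paper's actual formula or coefficient range.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sat_reactivate(probs, k_min=1.0, k_max=10.0, rng=random):
    """Re-activate output probabilities with a randomly drawn positive coefficient.

    Hypothetical sketch: k_min, k_max and the form softmax(k * p) are
    assumptions for illustration, not values or formulas from the paper.
    """
    # A fresh coefficient per query makes the returned values vary between calls.
    k = rng.uniform(k_min, k_max)
    # Scaling by k > 0 followed by softmax is strictly increasing in each
    # input probability, so the class ranking (and the argmax) is unchanged.
    return softmax([k * p for p in probs])


if __name__ == "__main__":
    original = [0.70, 0.20, 0.10]
    protected = sat_reactivate(original)
    print(protected)
    # The predicted label is preserved even though the reported values differ.
    assert protected.index(max(protected)) == original.index(max(original))
```

Under these assumptions, a client querying the model still receives the same top-1 label, but the probability values it could use to estimate gradients are distorted differently on every query.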

CLC number: TP181

Jinyin CHEN, Changan WU, Haibin ZHENG. Novel defense based on softmax activation transformation[J]. Chinese Journal of Network and Information Security, 2022, 8(2): 48-63.

Figures and tables (11): Table 1, Figures 1-4, Tables 2-5, Figures 5-6


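Description	White-box attack	Black-box attack
Information available to the adversary	Full information: model architecture, parameters, training dataset, etc.	Limited information: only the model's outputs for probing inputs, obtained through queries
Attack strategy	Generate adversarial examples from gradient information obtained by differentiating the loss function with respect to the input	Obtain adversarial examples through strategies such as substitute (equivalent) models, gradient estimation, probability optimization, and decision-based selection
Use	Research in laboratory settings, or model vulnerability testing by insiders	Stealing information from, or degrading the performance of, popular pretrained models or commercial models exposed through API interfaces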

Model	MNIST (ASR)			CIFAR10 (ASR)			ImageNet (ASR)
Model	ZOO (10 000)	Auto-ZOO (1 000)	Boundary++ (5 000)	ZOO (5 000)	Auto-ZOO (1 000)	Boundary++ (5 000)	ZOO (250 000)	Auto-ZOO (5 000)	Boundary++ (80 000)
Original model	99.57%	100.00%	100.00%	97.42%	100.00%	100.00%	89.31%	100.00%	84.54%
AdvTrain	28.56%	30.18%	35.73%	27.76%	28.10%	30.12%	21.36%	25.54%	24.79%
Transform	23.52%	24.56%	29.25%	23.93%	24.43%	22.65%	14.31%	16.96%	15.62%
Ensemble	20.49%	25.63%	19.54%	23.15%	20.53%	17.53%	14.49%	13.84%	10.89%
RSE	13.43%	14.15%	9.43%	8.59%	12.61%	10.75%	8.31%	9.62%	3.63%
SAT	6.42%	9.74%	8.83%	7.93%	8.94%	7.53%	7.65%	8.32%	6.38%

Model	MNIST (ASR)			CIFAR10 (ASR)			ImageNet (ASR)
Model	FGSM	MI-FGSM	C&W	FGSM	MI-FGSM	C&W	FGSM	MI-FGSM	C&W
Original model	75.83%	84.38%	72.75%	73.27%	86.84%	70.53%	69.94%	80.72%	67.52%
AdvTrain	23.57%	29.47%	35.38%	21.52%	29.99%	35.41%	16.10%	34.27%	30.63%
Transform	37.84%	44.42%	35.45%	33.40%	46.13%	31.14%	30.92%	43.71%	22.64%
Ensemble	30.45%	39.72%	28.42%	31.48%	40.25%	32.29%	25.55%	38.12%	23.41%
RSE	18.83%	28.31%	16.37%	17.37%	25.58%	16.91%	15.82%	27.96%	12.92%
SAT	3.97%	6.14%	4.32%	3.32%	6.54%	2.57%	3.37%	5.41%	2.25%

Model	MNIST (ASR)			CIFAR10 (ASR)			ImageNet (ASR)
Model	One-pixel (80 000)	NES(PI) (2 000)	POBA-GA (1 000)	One-pixel (80 000)	NES(PI) (2 000)	POBA-GA (1 000)	One-pixel (800 000)	NES(PI) (50 000)	POBA-GA (5 000)
Original model	80.32%	99.26%	100.00%	86.43%	100.00%	100.00%	40.61%	93.60%	98.00%
AdvTrain	30.12%	33.81%	41.86%	35.29%	38.30%	38.52%	9.43%	32.96%	38.10%
Transform	4.54%	25.48%	20.55%	8.19%	24.88%	19.96%	2.48%	28.90%	18.54%
Ensemble	13.65%	26.73%	27.43%	15.30%	23.65%	26.92%	8.54%	22.49%	23.95%
RSE	1.43%	15.54%	8.93%	7.69%	20.52%	16.49%	2.43%	18.97%	10.80%
SAT	3.69%	7.31%	6.32%	3.91%	6.11%	7.32%	4.56%	5.23%	6.43%
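
The abstract's headline figures (average attack success rate reduced from 87.06% to 5.94%) can be checked against the three ASR tables above. The short script below is only an arithmetic check: it transcribes the "original model" and "SAT" rows and takes an unweighted mean over the 27 attack/dataset cells. The unweighted mean is an assumption about how the paper aggregated its results, and the names `ASR` and `mean_asr` are introduced here for illustration.

```python
# Attack success rates (%) transcribed from the three ASR tables above.
# Each list holds 27 values: 9 attacks x 3 datasets (MNIST, CIFAR10, ImageNet).
ASR = {
    "original": [
        # ZOO, Auto-ZOO, Boundary++
        99.57, 100.00, 100.00, 97.42, 100.00, 100.00, 89.31, 100.00, 84.54,
        # FGSM, MI-FGSM, C&W
        75.83, 84.38, 72.75, 73.27, 86.84, 70.53, 69.94, 80.72, 67.52,
        # One-pixel, NES(PI), POBA-GA
        80.32, 99.26, 100.00, 86.43, 100.00, 100.00, 40.61, 93.60, 98.00,
    ],
    "SAT": [
        # ZOO, Auto-ZOO, Boundary++
        6.42, 9.74, 8.83, 7.93, 8.94, 7.53, 7.65, 8.32, 6.38,
        # FGSM, MI-FGSM, C&W
        3.97, 6.14, 4.32, 3.32, 6.54, 2.57, 3.37, 5.41, 2.25,
        # One-pixel, NES(PI), POBA-GA
        3.69, 7.31, 6.32, 3.91, 6.11, 7.32, 4.56, 5.23, 6.43,
    ],
}


def mean_asr(row):
    """Unweighted mean ASR (%) over all attack/dataset cells of one table row."""
    values = ASR[row]
    return sum(values) / len(values)


if __name__ == "__main__":
    for row in ASR:
        print(f"{row}: {mean_asr(row):.2f}%")
```

Under this assumption the two means come out at roughly 87.07% and 5.94%, in line with the 87.06% and 5.94% reported in the abstract; the difference in the last digit is consistent with truncation rather than rounding.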

Model	MNIST			CIFAR10			ImageNet
Model	ACC	T_Train/min	T_Test/s	ACC	T_Train/min	T_Test/s	ACC	T_Train/h	T_Test/s
Original model	100.00%	10.34	24.54	90.49%	24.13	24.18	76.53%	26.45	42.46
AdvTrain	98.88%	24.35	25.36	82.51%	49.87	24.66	70.59%	50.53	43.52
Transform	99.96%	12.39	26.16	88.64%	25.21	25.31	72.18%	28.47	48.38
Ensemble	99.98%	30.24	41.54	92.87%	62.59	42.10	76.94%	61.87	130.21
RSE	98.83%	26.85	37.63	87.58%	57.37	39.51	66.48%	56.40	75.19
SAT	100.00%	10.34	25.18	90.49%	24.13	24.01	76.53%	26.45	42.79