Chinese Journal of Network and Information Security ›› 2022, Vol. 8 ›› Issue (2): 48-63. doi: 10.11959/j.issn.2096-109x.2022016
• Special Column: Network Attack and Defense Technology •
Jinyin CHEN1,2, Changan WU2, Haibin ZHENG2
Revised: 2021-11-23
Online: 2022-04-01
Published: 2022-04-01
About the author: Jinyin CHEN (1982−), female, from Xiangshan, Zhejiang, is a professor at Zhejiang University of Technology. Her research interests include artificial intelligence security, graph data mining, and evolutionary computation.
Abstract:
Deep learning is widely applied in image processing, natural language processing, network mining and other fields with good results, but its vulnerability to adversarial attacks and its security flaws have drawn wide attention. Several effective defenses already exist, including adversarial training, data transformation, and model enhancement. Problems remain, however: effective defense often requires knowing the attack method and its adversarial examples in advance; defense against black-box attacks is weak; performance on some clean samples is sacrificed; and defense performance cannot be verified. A verifiable defense that does not depend on adversarial examples is therefore the key. Softmax activation transformation (SAT), a lightweight and fast defense against black-box attacks, was proposed. SAT takes no part in model training; at the inference stage it applies privacy-preserving reinforcement to the target model's output probabilities and re-activates them. By defining the connection between softmax activation transformation and deep-model defense, it is proved that transforming the softmax function achieves privacy protection of the probability information and thereby defends against black-box attacks. SAT's implementation depends on neither attack methods nor adversarial examples, which avoids the burden of crafting large numbers of adversarial examples and realizes proactive defense. The monotonicity of SAT's activation is proved theoretically, which guarantees the recognition accuracy of clean samples during defense. For the activation step, a variable transformation-coefficient protection strategy is proposed: the privacy-preserving transformation coefficient is randomly selected within a given range to achieve dynamic defense. Most importantly, SAT is a verifiable defense whose effectiveness and reliability can be derived from probability-information privacy protection and the softmax activation transformation. To evaluate SAT, defense experiments against 9 black-box attacks were conducted on MNIST, CIFAR10 and ImageNet; the average attack success rate across all attack methods dropped from 87.06% to 5.94%. Comparison with several state-of-the-art black-box defenses verifies that the proposed method achieves the best defense performance.
Jinyin CHEN, Changan WU, Haibin ZHENG. Novel defense based on softmax activation transformation[J]. Chinese Journal of Network and Information Security, 2022, 8(2): 48-63.
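The re-activation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact formulation: the function name `sat_reactivate`, the power-law transform (q_i ∝ p_i^t), and the coefficient range [t_min, t_max] are hypothetical. The sketch only shows the two properties the abstract claims: a coefficient drawn randomly per query changes the returned probabilities (dynamic privacy protection), while the strictly monotonic transform leaves the class ranking, and hence clean-sample accuracy, intact.

```python
import numpy as np

def sat_reactivate(probs, t_min=2.0, t_max=10.0, rng=None):
    """Re-activate output probabilities with a randomly drawn
    transformation coefficient t (hypothetical form: q_i ∝ p_i**t).
    Because t > 0 and x -> x**t is strictly increasing on (0, 1],
    the ranking of classes -- and hence the top-1 prediction -- is
    preserved, while the exact probability values returned to the
    querier vary from call to call (dynamic defense)."""
    rng = np.random.default_rng() if rng is None else rng
    t = rng.uniform(t_min, t_max)               # coefficient drawn anew per query
    z = t * np.log(np.clip(probs, 1e-12, 1.0))  # monotonic rescaling in log space
    e = np.exp(z - z.max())                     # numerically stable softmax
    return e / e.sum()

p = np.array([0.7, 0.2, 0.1])   # clean model output
q = sat_reactivate(p)
assert q.argmax() == p.argmax()  # prediction unchanged; returned scores obscured
```

A gradient-estimation attacker such as ZOO, which differentiates through repeated probability queries, now sees randomly rescaled scores on every query, which is consistent with the drop in attack success rates reported in the tables below.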
Table 2  Comparison of defense effects (ASR) against black-box attacks based on gradient estimation; numbers in parentheses are query budgets

| Defense | MNIST: ZOO (10 000) | MNIST: AutoZOO (1 000) | MNIST: Boundary++ (5 000) | CIFAR10: ZOO (5 000) | CIFAR10: AutoZOO (1 000) | CIFAR10: Boundary++ (5 000) | ImageNet: ZOO (250 000) | ImageNet: AutoZOO (5 000) | ImageNet: Boundary++ (80 000) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Original model | 99.57% | 100.00% | 100.00% | 97.42% | 100.00% | 100.00% | 89.31% | 100.00% | 84.54% |
| AdvTrain | 28.56% | 30.18% | 35.73% | 27.76% | 28.10% | 30.12% | 21.36% | 25.54% | 24.79% |
| Transform | 23.52% | 24.56% | 29.25% | 23.93% | 24.43% | 22.65% | 14.31% | 16.96% | 15.62% |
| Ensemble | 20.49% | 25.63% | 19.54% | 23.15% | 20.53% | 17.53% | 14.49% | 13.84% | 10.89% |
| RSE | 13.43% | 14.15% | 9.43% | 8.59% | 12.61% | 10.75% | 8.31% | 9.62% | 3.63% |
| SAT | 6.42% | 9.74% | 8.83% | 7.93% | 8.94% | 7.53% | 7.65% | 8.32% | 6.38% |
Table 3  Comparison of defense effects (ASR) against black-box attacks based on model equivalence

| Defense | MNIST: FGSM | MNIST: MI-FGSM | MNIST: C&W | CIFAR10: FGSM | CIFAR10: MI-FGSM | CIFAR10: C&W | ImageNet: FGSM | ImageNet: MI-FGSM | ImageNet: C&W |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Original model | 75.83% | 84.38% | 72.75% | 73.27% | 86.84% | 70.53% | 69.94% | 80.72% | 67.52% |
| AdvTrain | 23.57% | 29.47% | 35.38% | 21.52% | 29.99% | 35.41% | 16.10% | 34.27% | 30.63% |
| Transform | 37.84% | 44.42% | 35.45% | 33.40% | 46.13% | 31.14% | 30.92% | 43.71% | 22.64% |
| Ensemble | 30.45% | 39.72% | 28.42% | 31.48% | 40.25% | 32.29% | 25.55% | 38.12% | 23.41% |
| RSE | 18.83% | 28.31% | 16.37% | 17.37% | 25.58% | 16.91% | 15.82% | 27.96% | 12.92% |
| SAT | 3.97% | 6.14% | 4.32% | 3.32% | 6.54% | 2.57% | 3.37% | 5.41% | 2.25% |
Table 4  Comparison of defense effects (ASR) against black-box attacks based on probability optimization; numbers in parentheses are query budgets

| Defense | MNIST: One-pixel (80 000) | MNIST: NES(PI) (2 000) | MNIST: POBA-GA (1 000) | CIFAR10: One-pixel (80 000) | CIFAR10: NES(PI) (2 000) | CIFAR10: POBA-GA (1 000) | ImageNet: One-pixel (800 000) | ImageNet: NES(PI) (50 000) | ImageNet: POBA-GA (5 000) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Original model | 80.32% | 99.26% | 100.00% | 86.43% | 100.00% | 100.00% | 40.61% | 93.60% | 98.00% |
| AdvTrain | 30.12% | 33.81% | 41.86% | 35.29% | 38.30% | 38.52% | 9.43% | 32.96% | 38.10% |
| Transform | 4.54% | 25.48% | 20.55% | 8.19% | 24.88% | 19.96% | 2.48% | 28.90% | 18.54% |
| Ensemble | 13.65% | 26.73% | 27.43% | 15.30% | 23.65% | 26.92% | 8.54% | 22.49% | 23.95% |
| RSE | 1.43% | 15.54% | 8.93% | 7.69% | 20.52% | 16.49% | 2.43% | 18.97% | 10.80% |
| SAT | 3.69% | 7.31% | 6.32% | 3.91% | 6.11% | 7.32% | 4.56% | 5.23% | 6.43% |
Table 5  Comparison of defense costs between SAT and other defense methods

| Defense | MNIST: ACC | MNIST: train time/min | MNIST: test time/s | CIFAR10: ACC | CIFAR10: train time/min | CIFAR10: test time/s | ImageNet: ACC | ImageNet: train time/h | ImageNet: test time/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Original model | 100.00% | 10.34 | 24.54 | 90.49% | 24.13 | 24.18 | 76.53% | 26.45 | 42.46 |
| AdvTrain | 98.88% | 24.35 | 25.36 | 82.51% | 49.87 | 24.66 | 70.59% | 50.53 | 43.52 |
| Transform | 99.96% | 12.39 | 26.16 | 88.64% | 25.21 | 25.31 | 72.18% | 28.47 | 48.38 |
| Ensemble | 99.98% | 30.24 | 41.54 | 92.87% | 62.59 | 42.10 | 76.94% | 61.87 | 130.21 |
| RSE | 98.83% | 26.85 | 37.63 | 87.58% | 57.37 | 39.51 | 66.48% | 56.40 | 75.19 |
| SAT | 100.00% | 10.34 | 25.18 | 90.49% | 24.13 | 24.01 | 76.53% | 26.45 | 42.79 |