Chinese Journal of Network and Information Security ›› 2020, Vol. 6 ›› Issue (5): 36-53. doi: 10.11959/j.issn.2096-109x.2020071
Adversarial attacks and defenses in deep learning
Ximeng LIU 1,2, Lehui XIE 1, Yaopeng WANG 1, Xuru LI 3
Revised:
2020-05-12
Online:
2020-10-15
Published:
2020-10-19
About the authors:
LIU Ximeng (1988- ), born in Xi'an, Shaanxi, Ph.D., is a research professor and doctoral supervisor at Fuzhou University; his main research interests include privacy-preserving computation, data mining over encrypted data, big-data privacy protection, and searchable encryption. | XIE Lehui (1997- ), born in Jian'ou, Fujian, is a master's student at Fuzhou University; his main research interest is AI security. | WANG Yaopeng (1995- ), born in Quanzhou, Fujian, is a master's student at Fuzhou University; his main research interest is AI security. | LI Xuru (1995- ), born in Xuancheng, Anhui, is a Ph.D. student at East China Normal University; her main research interests include wireless communication networks and cyberspace security.
Abstract:
An adversarial example is an original sample with a small, deliberately crafted perturbation added to it, designed to mislead the output decision of a deep learning model; such examples seriously threaten the availability of a system and pose grave security risks. This survey therefore analyzes the classic adversarial attack techniques in detail, covering both white-box and black-box attacks. Based on the current state of adversarial attacks and defenses, it then reviews recent defense strategies proposed in China and abroad, including input preprocessing, improving model robustness, and adversarial detection. Finally, it outlines future research directions in the field of adversarial attacks and defenses.
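As a one-line formalization of the notion above (the standard formulation going back to Szegedy et al. [10], restated here rather than quoted from this paper), an adversarial example is the solution of a constrained optimization problem:

```latex
% Find the smallest l_p-norm perturbation \delta that changes
% the decision of classifier f on input x with true label y,
% while keeping the perturbed input a valid image.
\min_{\delta}\ \|\delta\|_{p}
\quad \text{s.t.} \quad f(x + \delta) \neq y,
\qquad x + \delta \in [0, 1]^{n}
```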
Cite this article:
Ximeng LIU, Lehui XIE, Yaopeng WANG, Xuru LI. Adversarial attacks and defenses in deep learning[J]. Chinese Journal of Network and Information Security, 2020, 6(5): 36-53.
Table 1 Summary of adversarial attacks
Threat model | Attack algorithm | Targeted? | Perturbation norm | Attack type
White-box | L-BFGS attack [10] | Targeted | ℓ2 | Gradient-based
White-box | FGSM attack [11] | Untargeted | ℓ∞ | Gradient-based
White-box | BIM attack [30] | Untargeted | ℓ∞ | Gradient-based
White-box | JSMA attack [19] | Targeted | ℓ0 | Gradient-based
White-box | C&W attack [20] | Targeted | ℓ0, ℓ2, ℓ∞ | Gradient-based
White-box | Universal adversarial perturbation attack [12] | Untargeted | ℓ2, ℓ∞ | Transfer-based
White-box | DeepFool attack [21] | Untargeted | ℓ2 | Gradient-based
Black-box | One-pixel attack [13] | Untargeted | ℓ0 | Score-based
Black-box | EOT attack [22] | Targeted | ℓ2 | Transfer-based
Black-box | Boundary attack [26] | Targeted/untargeted | ℓ2 | Decision-based
Black-box | Biased boundary attack [27] | Targeted/untargeted | ℓ2 | Decision-based
Black-box | Zeroth-order optimization (ZOO) attack [24] | Targeted/untargeted | ℓ2 | Score-based
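To make the gradient-based white-box entries in Table 1 concrete, the following is a minimal sketch of FGSM [11]. It assumes a differentiable PyTorch classifier `model` that returns logits and a normalized input batch `x` in [0, 1]; both names are placeholders, not artifacts of this paper.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    """One-step FGSM [11]: perturb x along the sign of the loss gradient.

    model: differentiable classifier returning logits (assumed).
    x:     input batch in [0, 1], shape (N, C, H, W).
    y:     ground-truth labels, shape (N,).
    eps:   l-infinity perturbation budget.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # J(theta, x, y)
    loss.backward()
    # x' = x + eps * sign(grad_x J), clamped back to the valid pixel range
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

The BIM attack in Table 1 iterates this same step with a smaller step size, clipping the result back into the ε-ball around the original input after every iteration.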
Table 2 Summary of adversarial defenses
Category | Method | Advantages | Disadvantages
Preprocessing | Ref. [ ] | Defends against unknown attacks; prevents model overfitting | Sensitive to the perturbation magnitude
Preprocessing | Ref. [ ] | Does not rely on obfuscated gradients [16] | Requires a large number of simple defenses as building blocks
Preprocessing | Ref. [ ] | Strong generalization ability | Cannot defend against white-box attacks; denoising ability depends on the model's expressive power
Preprocessing | Ref. [ ] | No need to retrain the model or change its architecture | The reconstruction ability of the generative adversarial network is limited, e.g. on the CIFAR-10 dataset [72]
Preprocessing | Ref. [ ] | Simple, easy to add to existing defense systems, and can defend against unknown attacks | Depends on the expressive power of the super-resolution network
Improving model robustness | Ref. [ ] | Can defend on the large-scale ImageNet dataset [8] | Requires modifying the model architecture and adversarial training; high computational cost
Improving model robustness | Ref. [ ] | Simple; does not rely on adversarial examples or data augmentation | Does not adapt across datasets; parameters must be retuned
Improving model robustness | Ref. [ ] | Introduces an attention mechanism to encourage the model to learn robust features | Requires modifying the model architecture and adversarial training; high computational cost
Improving model robustness | Ref. [ ] | Modular feature denoising; denoising blocks are simple to add | Modifies the existing model and requires adversarial training; high cost
Improving model robustness | Ref. [ ] | Defends against unknown attacks, is unaffected by dataset scale, and needs no change to the original architecture | Requires mapping the original dataset and performing adversarial training
Adversarial detection | Ref. [ ] | Simple to apply, low cost, model-agnostic | Image rotation angles cannot be chosen adaptively
Adversarial detection | Ref. [ ] | Prevents the adversary from generating adversarial examples | Cannot defend against transfer attacks; must store user query history
Adversarial detection | Ref. [ ] | Simple; can be integrated into existing defense systems | Depends on the model's internal information
Adversarial detection | Ref. [ ] | Adapts denoising to the perturbation magnitude; model-agnostic | Not suitable for attacks that change only a small number of pixel values
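As a concrete illustration of the preprocessing and detection rows of Table 2, the sketch below combines JPEG re-compression [55,56] with median filtering and a feature-squeezing-style consistency check [60]. It is a minimal sketch under stated assumptions: `predict_fn` is a hypothetical callable mapping a PIL image to a softmax probability vector, and the detection threshold is dataset-dependent and would be tuned on held-out data.

```python
import io

import numpy as np
from PIL import Image, ImageFilter

def jpeg_compress(img: Image.Image, quality: int = 75) -> Image.Image:
    """Re-encode as JPEG to destroy high-frequency perturbations [55,56]."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

def median_smooth(img: Image.Image, size: int = 3) -> Image.Image:
    """Local median filtering, one of the squeezers in feature squeezing [60]."""
    return img.filter(ImageFilter.MedianFilter(size=size))

def is_adversarial(predict_fn, img: Image.Image, threshold: float = 1.0) -> bool:
    """Feature-squeezing-style detection [60]: flag the input if the model's
    softmax output moves too much (L1 distance) after squeezing.
    predict_fn (hypothetical) maps a PIL image to an np.ndarray of probabilities."""
    p_orig = predict_fn(img)
    p_squeezed = predict_fn(median_smooth(jpeg_compress(img)))
    return float(np.abs(p_orig - p_squeezed).sum()) > threshold
```

He et al. [65] caution that such ensembles of weak preprocessing defenses can still be bypassed by adaptive white-box attacks, which is consistent with the white-box limitation noted in Table 2.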
References
[1] | KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]// Advances in Neural Information Processing Systems. 2012: 1097-1105.
[2] | DAHL G E, YU D, DENG L, et al. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2011, 20: 30-42.
[3] | HINTON G, DENG L, YU D, et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups[J]. IEEE Signal Processing Magazine, 2012, 29: 82-97.
[4] | COLLOBERT R, WESTON J. A unified architecture for natural language processing: deep neural networks with multitask learning[C]// Proceedings of the 25th International Conference on Machine Learning. 2008: 160-167.
[5] | DAHL G E, STOKES J W, DENG L, et al. Large-scale malware classification using random projections and neural networks[C]// 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. 2013: 3422-3426.
[6] | SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529: 484.
[7] | VINYALS O, BABUSCHKIN I, CZARNECKI W M, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning[J]. Nature, 2019, 575: 350-354.
[8] | DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]// 2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009: 248-255.
[9] | LECUN Y, BOSER B, DENKER J S, et al. Backpropagation applied to handwritten zip code recognition[J]. Neural Computation, 1989, 1: 541-551.
[10] | SZEGEDY C, ZAREMBA W, SUTSKEVER I, et al. Intriguing properties of neural networks[J]. arXiv preprint arXiv:1312.6199, 2013.
[11] | GOODFELLOW I J, SHLENS J, SZEGEDY C. Explaining and harnessing adversarial examples[J]. arXiv preprint arXiv:1412.6572, 2014.
[12] | MOOSAVI-DEZFOOLI S M, FAWZI A, FAWZI O, et al. Universal adversarial perturbations[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 1765-1773.
[13] | SU J, VARGAS D V, SAKURAI K. One pixel attack for fooling deep neural networks[J]. IEEE Transactions on Evolutionary Computation, 2019, 23: 828-841.
[14] | CARLINI N, WAGNER D. Adversarial examples are not easily detected: bypassing ten detection methods[C]// Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security. 2017: 3-14.
[15] | ATHALYE A, CARLINI N. On the robustness of the CVPR 2018 white-box adversarial example defenses[J]. arXiv preprint arXiv:1804.03286, 2018.
[16] | ATHALYE A, CARLINI N, WAGNER D. Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples[J]. arXiv preprint arXiv:1802.00420, 2018.
[17] | FLETCHER R. Practical methods of optimization[M]. John Wiley & Sons, 2013.
[18] | KURAKIN A, GOODFELLOW I, BENGIO S. Adversarial machine learning at scale[J]. arXiv preprint arXiv:1611.01236, 2016.
[19] | PAPERNOT N, MCDANIEL P, JHA S, et al. The limitations of deep learning in adversarial settings[C]// 2016 IEEE European Symposium on Security and Privacy (EuroS&P). 2016: 372-387.
[20] | CARLINI N, WAGNER D. Towards evaluating the robustness of neural networks[C]// 2017 IEEE Symposium on Security and Privacy (SP). 2017: 39-57.
[21] | MOOSAVI-DEZFOOLI S M, FAWZI A, FROSSARD P. DeepFool: a simple and accurate method to fool deep neural networks[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2574-2582.
[22] | ATHALYE A, ENGSTROM L, ILYAS A, et al. Synthesizing robust adversarial examples[J]. arXiv preprint arXiv:1707.07397, 2017.
[23] | MCLAREN K. XIII—The development of the CIE 1976 (L*a*b*) uniform colour space and colour-difference formula[J]. Journal of the Society of Dyers and Colourists, 1976, 92: 338-341.
[24] | CHEN P Y, ZHANG H, SHARMA Y, et al. ZOO: zeroth order optimization based black-box attacks to deep neural networks without training substitute models[C]// Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security. 2017: 15-26.
[25] | KINGMA D P, BA J. Adam: a method for stochastic optimization[J]. arXiv preprint arXiv:1412.6980, 2014.
[26] | BRENDEL W, RAUBER J, BETHGE M. Decision-based adversarial attacks: reliable attacks against black-box machine learning models[J]. arXiv preprint arXiv:1712.04248, 2017.
[27] | BRUNNER T, DIEHL F, LE M T, et al. Guessing smart: biased sampling for efficient black-box adversarial attacks[C]// Proceedings of the IEEE International Conference on Computer Vision. 2019: 4958-4966.
[28] | PERLIN K. An image synthesizer[J]. ACM SIGGRAPH Computer Graphics, 1985, 19: 287-296.
[29] | CHEN J, JORDAN M I, WAINWRIGHT M J. HopSkipJumpAttack: a query-efficient decision-based attack[J]. arXiv preprint arXiv:1904.02144, 2019.
[30] | KURAKIN A, GOODFELLOW I, BENGIO S. Adversarial examples in the physical world[J]. arXiv preprint arXiv:1607.02533, 2016.
[31] | SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2818-2826.
[32] | PAPERNOT N, MCDANIEL P, GOODFELLOW I, et al. Practical black-box attacks against machine learning[C]// Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security. 2017: 506-519.
[33] | LIU Y, CHEN X, LIU C, et al. Delving into transferable adversarial examples and black-box attacks[J]. arXiv preprint arXiv:1611.02770, 2016.
[34] | LI X, JI S, HAN M, et al. Adversarial examples versus cloud-based detectors: a black-box empirical study[J]. IEEE Transactions on Dependable and Secure Computing, 2019.
[35] | WEI X, LIANG S, CHEN N, et al. Transferable adversarial attacks for image and video object detection[J]. arXiv preprint arXiv:1811.12641, 2018.
[36] | GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]// Advances in Neural Information Processing Systems. 2014: 2672-2680.
[37] | REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]// Advances in Neural Information Processing Systems. 2015: 91-99.
[38] | LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]// European Conference on Computer Vision. 2016: 21-37.
[39] | THYS S, VAN RANST W, GOEDEMÉ T. Fooling automated surveillance cameras: adversarial patches to attack person detection[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2019.
[40] | REDMON J, FARHADI A. YOLO9000: better, faster, stronger[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 7263-7271.
[41] | EVTIMOV I, EYKHOLT K, FERNANDES E, et al. Robust physical-world attacks on deep learning models[J]. arXiv preprint arXiv:1707.08945, 2017.
[42] | LU J, SIBAI H, FABRY E, et al. Standard detectors aren't (currently) fooled by physical adversarial stop signs[J]. arXiv preprint arXiv:1710.03337, 2017.
[43] | EYKHOLT K, EVTIMOV I, FERNANDES E, et al. Note on attacking object detectors with adversarial stickers[J]. arXiv preprint arXiv:1712.08062, 2017.
[44] | CHEN S T, CORNELIUS C, MARTIN J, et al. ShapeShifter: robust physical adversarial attack on Faster R-CNN object detector[C]// Joint European Conference on Machine Learning and Knowledge Discovery in Databases. 2018: 52-68.
[45] | BROWN T B, MANÉ D, ROY A, et al. Adversarial patch[J]. arXiv preprint arXiv:1712.09665, 2017.
[46] | HENDRIK METZEN J, CHAITHANYA KUMAR M, BROX T, et al. Universal adversarial perturbations against semantic image segmentation[C]// Proceedings of the IEEE International Conference on Computer Vision. 2017: 2755-2764.
[47] | XIE C, WANG J, ZHANG Z, et al. Adversarial examples for semantic segmentation and object detection[C]// Proceedings of the IEEE International Conference on Computer Vision. 2017: 1369-1378.
[48] | CISSE M, ADI Y, NEVEROVA N, et al. Houdini: fooling deep structured prediction models[J]. arXiv preprint arXiv:1707.05373, 2017.
[49] | SHARIF M, BHAGAVATULA S, BAUER L, et al. Accessorize to a crime: real and stealthy attacks on state-of-the-art face recognition[C]// Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. 2016: 1528-1540.
[50] | ZHOU Z, TANG D, WANG X, et al. Invisible mask: practical attacks on face recognition with infrared[J]. arXiv preprint arXiv:1803.04683, 2018.
[51] | WANG Q, GUO W, ZHANG K, et al. Adversary resistant deep neural networks with an application to malware detection[C]// Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2017: 1145-1153.
[52] | SRIVASTAVA N, HINTON G, KRIZHEVSKY A, et al. Dropout: a simple way to prevent neural networks from overfitting[J]. The Journal of Machine Learning Research, 2014, 15: 1929-1958.
[53] | PRAKASH A, MORAN N, GARBER S, et al. Deflecting adversarial attacks with pixel deflection[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 8571-8580.
[54] | HO J, KANG D K. Pixel redrawn for a robust adversarial defense[J].
[55] | DZIUGAITE G K, GHAHRAMANI Z, ROY D M. A study of the effect of JPG compression on adversarial images[J]. arXiv preprint arXiv:1608.00853, 2016.
[56] | DAS N, SHANBHOGUE M, CHEN S T, et al. Keeping the bad guys out: protecting and vaccinating deep learning with JPEG compression[J]. arXiv preprint arXiv:1705.02900, 2017.
[57] | GUO C, RANA M, CISSE M, et al. Countering adversarial images using input transformations[J]. arXiv preprint arXiv:1711.00117, 2017.
[58] | RUDIN L I, OSHER S, FATEMI E. Nonlinear total variation based noise removal algorithms[J]. Physica D: Nonlinear Phenomena, 1992, 60: 259-268.
[59] | EFROS A A, FREEMAN W T. Image quilting for texture synthesis and transfer[C]// Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques. 2001: 341-346.
[60] | XU W, EVANS D, QI Y. Feature squeezing: detecting adversarial examples in deep neural networks[J]. arXiv preprint arXiv:1704.01155, 2017.
[61] | BUADES A, COLL B, MOREL J M. A non-local algorithm for image denoising[C]// 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). 2005: 60-65.
[62] | RAFF E, SYLVESTER J, FORSYTH S, et al. Barrage of random transforms for adversarially robust defense[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 6528-6537.
[63] | ANTONINI M, BARLAUD M, MATHIEU P, et al. Image coding using wavelet transform[J]. IEEE Transactions on Image Processing, 1992, 1: 205-220.
[64] | HUANG T, YANG G, TANG G. A fast two-dimensional median filtering algorithm[J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1979, 27: 13-18.
[65] | HE W, WEI J, CHEN X, et al. Adversarial example defense: ensembles of weak defenses are not strong[C]// 11th USENIX Workshop on Offensive Technologies (WOOT 17). 2017.
[66] | AKHTAR N, LIU J, MIAN A. Defense against universal adversarial perturbations[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 3389-3398.
[67] | CORTES C, VAPNIK V. Support-vector networks[J]. Machine Learning, 1995, 20: 273-297.
[68] | VINCENT P, LAROCHELLE H, BENGIO Y, et al. Extracting and composing robust features with denoising autoencoders[C]// Proceedings of the 25th International Conference on Machine Learning. 2008: 1096-1103.
[69] | LIAO F, LIANG M, DONG Y, et al. Defense against adversarial attacks using high-level representation guided denoiser[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 1778-1787.
[70] | RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[C]// International Conference on Medical Image Computing and Computer-Assisted Intervention. 2015: 234-241.
[71] | SAMANGOUEI P, KABKAB M, CHELLAPPA R. Defense-GAN: protecting classifiers against adversarial attacks using generative models[J]. arXiv preprint arXiv:1805.06605, 2018.
[72] | KRIZHEVSKY A, NAIR V, HINTON G. CIFAR-10 (Canadian Institute for Advanced Research)[D]. Toronto: University of Toronto, 2009.
[73] | LECUN Y, BOSER B, DENKER J S, et al. Backpropagation applied to handwritten zip code recognition[J]. Neural Computation, 1989, 1(4): 541-551.
[74] | BAO R, LIANG S, WANG Q. Featurized bidirectional GAN: adversarial defense via adversarially learned semantic inference[J]. arXiv preprint arXiv:1805.07862, 2018.
[75] | DUMOULIN V, BELGHAZI I, POOLE B, et al. Adversarially learned inference[J]. arXiv preprint arXiv:1606.00704, 2016.
[76] | DONAHUE J, KRÄHENBÜHL P, DARRELL T. Adversarial feature learning[J]. arXiv preprint arXiv:1605.09782, 2016.
[77] | MUSTAFA A, KHAN S H, HAYAT M, et al. Image super-resolution as a defense against adversarial attacks[J]. IEEE Transactions on Image Processing, 2019, 29: 1711-1724.
[78] | LIM B, SON S, KIM H, et al. Enhanced deep residual networks for single image super-resolution[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2017: 136-144.
[79] | CHANG S G, YU B, VETTERLI M. Adaptive wavelet thresholding for image denoising and compression[J]. IEEE Transactions on Image Processing, 2000, 9: 1532-1546.
[80] | IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift[J]. arXiv preprint arXiv:1502.03167, 2015.
[81] | MADRY A, MAKELOV A, SCHMIDT L, et al. Towards deep learning models resistant to adversarial attacks[J]. arXiv preprint arXiv:1706.06083, 2017.
[82] | CHANG T J, HE Y, LI P. Efficient two-step adversarial defense for deep neural networks[J]. arXiv preprint arXiv:1810.03739, 2018.
[83] | HUANG R, XU B, SCHUURMANS D, et al. Learning with a strong adversary[J]. arXiv preprint arXiv:1511.03034, 2015.
[84] | SHAHAM U, YAMADA Y, NEGAHBAN S. Understanding adversarial training: increasing local stability of supervised models through robust optimization[J]. Neurocomputing, 2018, 307: 195-204.
[85] | KANNAN H, KURAKIN A, GOODFELLOW I. Adversarial logit pairing[J]. arXiv preprint arXiv:1803.06373, 2018.
[86] | TRAMÈR F, KURAKIN A, PAPERNOT N, et al. Ensemble adversarial training: attacks and defenses[J]. arXiv preprint arXiv:1705.07204, 2017.
[87] | LI P, YI J, ZHOU B, et al. Improving the robustness of deep neural networks via adversarial training with triplet loss[J]. arXiv preprint arXiv:1905.11713, 2019.
[88] | SONG C, HE K, WANG L, et al. Improving the generalization of adversarial training with domain adaptation[J]. arXiv preprint arXiv:1810.00740, 2018.
[89] | ROZSA A, GUNTHER M, BOULT T E. Towards robust deep neural networks with BANG[J]. arXiv preprint arXiv:1612.00138, 2016.
[90] | TOMAR V S, ROSE R C. Manifold regularized deep neural networks[C]// Fifteenth Annual Conference of the International Speech Communication Association. 2014.
[91] | NETZER Y, WANG T, COATES A, et al. Reading digits in natural images with unsupervised feature learning[R]. 2011.
[92] | SANKARANARAYANAN S, JAIN A, CHELLAPPA R, et al. Regularizing deep networks using efficient layerwise adversarial training[C]// Thirty-Second AAAI Conference on Artificial Intelligence. 2018.
[93] | LIU C, JAJA J. Feature prioritization and regularization improve standard accuracy and adversarial robustness[J]. arXiv preprint arXiv:1810.02424, 2018.
[94] | PAPERNOT N, MCDANIEL P, WU X, et al. Distillation as a defense to adversarial perturbations against deep neural networks[C]// 2016 IEEE Symposium on Security and Privacy (SP). 2016: 582-597.
[95] | HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network[J]. arXiv preprint arXiv:1503.02531, 2015.
[96] | CARLINI N, WAGNER D. Defensive distillation is not robust to adversarial examples[J]. arXiv preprint arXiv:1607.04311, 2016.
[97] | XIE C, WU Y, MAATEN L V D, et al. Feature denoising for improving adversarial robustness[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 501-509.
[98] | GAO J, WANG B, LIN Z, et al. Masking deep neural network models for robustness against adversarial samples[J]. arXiv preprint arXiv:1702.06763, 2017.
[99] | SUN B, TSAI N H, LIU F, et al. Adversarial defense by stratified convolutional sparse coding[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 11447-11456.
[100] | CHOUDHURY B, SWANSON R, HEIDE F, et al. Consensus convolutional sparse coding[C]// Proceedings of the IEEE International Conference on Computer Vision. 2017: 4280-4288.
[101] | GU S, RIGAZIO L. Towards deep neural network architectures robust to adversarial examples[J]. arXiv preprint arXiv:1412.5068, 2014.
[102] | RIFAI S, VINCENT P, MULLER X, et al. Contractive auto-encoders: explicit invariance during feature extraction[C]// Proceedings of the 28th International Conference on International Conference on Machine Learning. 2011: 833-840.
[103] | HOSSEINI H, CHEN Y, KANNAN S, et al. Blocking transferability of adversarial examples in black-box learning systems[J]. arXiv preprint arXiv:1703.04318, 2017.
[104] | TIAN S, YANG G, CAI Y. Detecting adversarial examples through image transformation[C]// Thirty-Second AAAI Conference on Artificial Intelligence. 2018.
[105] | CHEN S, CARLINI N, WAGNER D. Stateful detection of black-box adversarial attacks[J]. arXiv preprint arXiv:1907.05587, 2019.
[106] | ILYAS A, ENGSTROM L, ATHALYE A, et al. Black-box adversarial attacks with limited queries and information[J]. arXiv preprint arXiv:1804.08598, 2018.
[107] | METZEN J H, GENEWEIN T, FISCHER V, et al. On detecting adversarial perturbations[J]. arXiv preprint arXiv:1702.04267, 2017.
[108] | LI X, LI F. Adversarial examples detection in deep networks with convolutional filter statistics[C]// Proceedings of the IEEE International Conference on Computer Vision. 2017: 5764-5772.
[109] | NGUYEN A, YOSINSKI J, CLUNE J. Deep neural networks are easily fooled: high confidence predictions for unrecognizable images[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 427-436.
[110] | FEINMAN R, CURTIN R R, SHINTRE S, et al. Detecting adversarial samples from artifacts[J]. arXiv preprint arXiv:1703.00410, 2017.
[111] | ZHENG Z, HONG P. Robust detection of adversarial attacks by modeling the intrinsic properties of deep neural networks[C]// Advances in Neural Information Processing Systems. 2018: 7913-7922.
[112] | MENG D, CHEN H. MagNet: a two-pronged defense against adversarial examples[C]// Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. 2017: 135-147.
[113] | CARLINI N, WAGNER D. MagNet and "Efficient defenses against adversarial attacks" are not robust to adversarial examples[J]. arXiv preprint arXiv:1711.08478, 2017.
[114] | HUANG B, WANG Y, WANG W. Model-agnostic adversarial detection by random perturbations[C]// Proceedings of the 28th International Joint Conference on Artificial Intelligence. 2019: 4689-4696.
[115] | LIANG B, LI H, SU M, et al. Detecting adversarial image examples in deep networks with adaptive noise reduction[J]. arXiv preprint arXiv:1705.08378, 2017.
[116] | CHENG M, LE T, CHEN P Y, et al. Query-efficient hard-label black-box attack: an optimization-based approach[J]. arXiv preprint arXiv:1807.04457, 2018.