吴建汉; 司世景; 王健宗; 肖京
About the authors:
WU Jianhan (1998- ), male, is a master's student at the University of Science and Technology of China, a student member of the China Computer Federation (CCF), and an algorithm engineer at Ping An Technology (Shenzhen) Co., Ltd. His research interests include computer vision and federated learning.
SI Shijing (1988- ), male, Ph.D., is a senior algorithm researcher at Ping An Technology (Shenzhen) Co., Ltd. and a Shenzhen overseas high-level talent. He was a postdoctoral researcher in artificial intelligence at Duke University and is a member of the CCF. His research interests include machine learning and its applications in artificial intelligence.
WANG Jianzong (1983- ), male, Ph.D., is deputy chief engineer of Ping An Technology (Shenzhen) Co., Ltd., a senior artificial intelligence director, and general manager of the Federated Learning Technology Department. He was a postdoctoral researcher in artificial intelligence at the University of Florida and previously a researcher in the Department of Electrical and Computer Engineering at Rice University. He is a senior member of the CCF and a member of the CCF Task Force on Big Data. His research interests include federated learning and artificial intelligence.
XIAO Jing (1972- ), male, Ph.D., is chief scientist of Ping An Group, recipient of the 2019 Wu Wenjun Outstanding Contribution Award in Artificial Intelligence, and vice chair of the CCF Shenzhen chapter. His research interests include computer graphics, autonomous driving, 3D display, medical diagnosis, and federated learning.
WU Jianhan, SI Shijing, WANG Jianzong, XIAO Jing
Abstract: With the widespread application of machine learning, data security incidents occur from time to time, and the demand for data privacy protection has become increasingly apparent. This reduces the willingness of different entities to share data, makes data hard to share and use, and gives rise to the problem of data silos. Federated learning (FL) can effectively address the data-silo problem. It is essentially a form of distributed machine learning whose defining feature is that user data stays on users' local devices, so the raw data of each participant is never exposed during joint model training. Nevertheless, federated learning still faces many security risks in practice that call for in-depth study. This paper presents a comprehensive and systematic survey of the attacks that federated learning may suffer and the corresponding defenses. We first classify the possible attacks and threats according to the stages of the federated learning pipeline, enumerate the attack methods in each category, and explain the principles behind them. We then summarize concrete defense measures against these attacks and threats and analyze how they work, aiming to provide a detailed reference for researchers new to this field. Finally, we outline future work in this research area and highlight several directions that deserve particular attention, with the goal of improving the security of federated learning.
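The abstract describes federated learning's core mechanism: each participant trains on its own local data and shares only model updates, which a server aggregates into a global model. As a minimal sketch of that idea (not from this paper), the following toy federated-averaging loop on a linear-regression task illustrates it; all function names and parameters here are illustrative assumptions:

```python
import numpy as np

def local_update(weights, data, targets, lr=0.1, epochs=2):
    """One client's local step: gradient descent on its own private shard.
    Only the updated weights leave the device, never the raw data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * data.T @ (data @ w - targets) / len(targets)
        w -= lr * grad
    return w

def fed_avg(client_datasets, dim, rounds=50):
    """Server loop: broadcast the global model, collect local updates,
    and aggregate them by data-size-weighted averaging."""
    global_w = np.zeros(dim)
    sizes = [len(y) for _, y in client_datasets]
    for _ in range(rounds):
        local_ws = [local_update(global_w, X, y) for X, y in client_datasets]
        global_w = np.average(local_ws, axis=0, weights=sizes)
    return global_w

# Toy demo: two clients hold disjoint noiseless shards generated by w* = [2, -1].
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = [(X, X @ true_w) for X in (rng.normal(size=(50, 2)) for _ in range(2))]
w = fed_avg(clients, dim=2)
print(np.round(w, 2))  # converges toward [2, -1]
```

The attacks surveyed below target different points of this loop: poisoning attacks corrupt the client-side updates, while inference attacks exploit the shared weights, which is why defenses typically operate on the aggregation step or on the communicated updates.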
Citation: WU Jianhan, SI Shijing, WANG Jianzong, et al. Threats and defenses of federated learning: a survey[J]. Big Data Research, doi: 10.11959/j.issn.2096-0271.2022038.