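Big Data Research ›› 2021, Vol. 7 ›› Issue (3): 130-149. doi: 10.11959/j.issn.2096-0271.2021030
Special topic: Federated learning
Jianzong WANG, Lingwei KONG, Zhangcheng HUANG, Linjie CHEN, Yi LIU, Chunxi LU, Jing XIAO
Online: 2021-05-15
Published: 2021-05-01
About the author:
Jianzong WANG (1983- ), male, Ph.D., is deputy chief engineer, senior director of artificial intelligence, and general manager of the Federated Learning Technology Department at Ping An Technology (Shenzhen) Co., Ltd. He was a postdoctoral researcher in artificial intelligence at the University of Florida, USA, and is a senior member of the China Computer Federation (CCF) and a member of the CCF Task Force on Big Data. His main research interests include federated learning and artificial intelligence.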
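Abstract:
As laws and regulations on privacy protection are introduced one after another, data silos have become the main bottleneck hindering the development of big data and artificial intelligence technologies. Federated learning has attracted wide attention as an important privacy-computing technique. This paper describes the technical advantages of federated learning from the perspectives of its historical development, concepts, and architectural classification. It also analyzes the attack methods against federated learning systems and their classification, and discusses the differences between the encryption algorithms used in federated learning. Finally, it summarizes research on privacy protection and security mechanisms for federated learning and presents challenges and prospects.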
Jianzong WANG, Lingwei KONG, Zhangcheng HUANG, Linjie CHEN, Yi LIU, Chunxi LU, Jing XIAO. Research advances on privacy protection of federated learning[J]. Big Data Research, 2021, 7(3): 130-149.
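Table 1. Types of attacks on federated learning
Comparison item | Attack category | Main method | Description |
Attack method | Poisoning attack | Contaminate or corrupt the data or the model | Data poisoning: influences the federated learning process by replacing the labels or specific features of local data. This approach is limited in effect but more covert |
Attack method | Poisoning attack | Contaminate or corrupt the data or the model | Model poisoning: one or more traditional participants influence the model update process by replacing the model. The impact is more severe than that of data poisoning |
Attack method | Byzantine attack | Control multiple users to influence global model updates | The attacker controls multiple users and sends arbitrary parameters to the central server, thereby influencing the global model and causing it to deviate from the normal training process |
Attack method | Sybil attack | Direct communication, forged or stolen identities, simultaneous and non-simultaneous attacks, etc. | A single node holds multiple identities; a small number of nodes control many fake identities and thereby control or influence a large number of normal nodes in the network |
Attack phase | Training-phase attack | During training, influence the federated learning model from the data or the model side | One or more participants compromise different clients during training and, combined with poisoning attacks, launch attacks in different rounds |
Attack phase | Inference-phase attack | During inference, influence the model's prediction results | This approach depends on how much the attacker knows about the model and is divided into white-box and black-box attacks |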
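Two rows of Table 1 can be illustrated with a minimal sketch. It is not the authors' method: it assumes the server aggregates by plain coordinate-wise averaging, and the helper names `flip_labels` and `fed_avg` as well as every numeric value are hypothetical, chosen only for illustration.

```python
from statistics import fmean


def flip_labels(dataset, source, target):
    """Data poisoning sketch: relabel every `source` example as `target`."""
    return [(x, target if y == source else y) for x, y in dataset]


def fed_avg(updates):
    """Assumed aggregation rule: coordinate-wise mean of client parameters."""
    return [fmean(coord) for coord in zip(*updates)]


# Hypothetical honest updates: five clients, two parameters each.
honest = [[1.0, 2.0], [1.1, 1.9], [0.9, 2.1], [1.0, 2.0], [1.0, 2.0]]

# Hypothetical Byzantine updates: two attacker-controlled clients
# send arbitrary parameters to the central server.
byzantine = [[100.0, -100.0], [100.0, -100.0]]

clean_model = fed_avg(honest)                 # close to [1.0, 2.0]
attacked_model = fed_avg(honest + byzantine)  # pulled far away from it

print("without attack:", clean_model)
print("with attack:   ", attacked_model)
```

Under this assumed averaging rule, two arbitrary updates out of seven are enough to move the global parameters far from the honest values, which is the deviation from normal training that the Byzantine row describes.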
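Table 2. Comparison of encryption-based protection mechanisms
Encryption algorithm | Characteristic | Property | Application |
Garbled circuit | Asymmetric encryption | Constructs secure functions for computation from the viewpoint of Boolean functions | Commonly used to build secure multi-party computation environments |
Homomorphic encryption | Asymmetric encryption | Decrypting the result of algebraic operations on ciphertext gives the same result as applying the same operations to the plaintext | Often used when the data owner needs to perform a large amount of computation but lacks the computing power to do so |
Differential privacy | Functional encryption | Implemented by adding random noise to aggregate query results | Protects individual entries and minimizes the chance of identifying a record |
Secret sharing | Generally symmetric encryption | Splits a secret and distributes the shares to different participants; a single participant cannot recover the secret | Prevents the secret from being overly concentrated, spreading risk and providing some intrusion tolerance |
Hybrid encryption | Mix of symmetric and asymmetric encryption | Symmetric encryption encrypts the message; asymmetric encryption encrypts the key | Combines the advantages of symmetric and asymmetric encryption to ensure message confidentiality |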
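The secret sharing and differential privacy rows of Table 2 can be sketched as follows. This is only an illustration under stated assumptions, not a scheme taken from the paper: additive sharing modulo a fixed prime stands in for secret sharing, Laplace noise stands in for the "random noise" in the table, and the names `PRIME`, `share_secret`, `reconstruct`, and `laplace_noise` as well as all values are hypothetical.

```python
import random
import secrets

PRIME = 2**61 - 1  # assumed modulus, chosen only for this illustration


def share_secret(secret, n_parties):
    """Split `secret` into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (secret - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recover the secret; this needs every share, not just one."""
    return sum(shares) % PRIME


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


# Three hypothetical clients each split a private value among three parties.
client_values = [3, 5, 7]
all_shares = [share_secret(v, 3) for v in client_values]

# Party j only ever sees the j-th share of each client and adds them up.
partial_sums = [sum(column) % PRIME for column in zip(*all_shares)]
secure_total = reconstruct(partial_sums)  # 15, the sum of the inputs

# Noise added to the aggregate; sensitivity 1 and epsilon 0.5 are assumed.
rng = random.Random(0)
noisy_total = secure_total + laplace_noise(scale=1.0 / 0.5, rng=rng)

print(secure_total, noisy_total)
```

In this sketch no single party learns an individual client value from its own shares, yet the partial sums still reconstruct the exact aggregate. The noise is then applied to that aggregate rather than to individual records.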