Big Data Research, 2020, Vol. 6, Issue 6: 64-82. doi: 10.11959/j.issn.2096-0271.2020055
Special topic: Federated Learning
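Jianzong WANG (1983- ), male, Ph.D., is deputy chief engineer, senior director of artificial intelligence, and general manager of the Federated Learning Technology Department at Ping An Technology (Shenzhen) Co., Ltd. He was a postdoctoral researcher in artificial intelligence at the University of Florida, USA, is a senior member of the China Computer Federation (CCF) and a member of the CCF Big Data Expert Committee, and was formerly a researcher in the Department of Electrical and Computer Engineering at Rice University, USA. His main research interests are federated learning and artificial intelligence.
Lingwei KONG (1995- ), male, is an algorithm engineer on the federated learning team at Ping An Technology (Shenzhen) Co., Ltd. and a CCF member. His main research interests are federated learning systems and secure communication.
Zhangcheng HUANG (1990- ), male, is a senior algorithm engineer on the federated learning team at Ping An Technology (Shenzhen) Co., Ltd., an artificial intelligence expert, and a CCF member. His main research interests are federated learning, distributed computing and systems, and encrypted communication.
Linjie CHEN (1994- ), male, is an algorithm engineer on the federated learning team at Ping An Technology (Shenzhen) Co., Ltd. His main research interests are federated learning and privacy protection, and machine translation.
Yi LIU (1994- ), female, is an algorithm engineer on the federated learning team at Ping An Technology (Shenzhen) Co., Ltd. Her main research interest is federated learning systems.
Anxun HE (1990- ), female, is a senior algorithm engineer on the federated learning team at Ping An Technology (Shenzhen) Co., Ltd. and a CCF member. Her main research interests are the application of federated learning in finance, federated learning framework construction, encryption algorithms, and model fusion techniques.
Jing XIAO (1972- ), male, Ph.D., is chief scientist of Ping An Insurance (Group) Company of China, Ltd. He received the Outstanding Contribution Award of the 2019 Wu Wenjun Artificial Intelligence Science and Technology Award and is vice chair of the CCF Shenzhen member activity center. His main research interests are computer graphics, autonomous driving, 3D display, medical diagnosis, and federated learning.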
Jianzong WANG¹, Lingwei KONG¹, Zhangcheng HUANG¹, Linjie CHEN¹, Yi LIU¹, Anxun HE¹, Jing XIAO²
Jianzong WANG, Lingwei KONG, Zhangcheng HUANG, Linjie CHEN, Yi LIU, Anxun HE, Jing XIAO. Research review of federated learning algorithms[J]. Big Data Research, 2020, 6(6): 64-82.
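Type | Base | Algorithm | Architecture | Characteristics |
Federated machine learning | Federated linear algorithms | Logistic regression[ | Centralized | Homomorphic encryption; monitors model changes; periodic gradient updates |
Federated machine learning | Federated linear algorithms | Logistic regression[ | Decentralized | No third-party participation; led by the party holding labeled data; differential privacy |
Federated machine learning | Federated tree models | Federated Forest[ | Centralized | Model stored in a distributed manner; the central server stores the structure |
Federated machine learning | Federated tree models | Gradient boosting tree SecureBoost[ | Decentralized | Homomorphic encryption; feature bucketing and aggregation; preserves accuracy |
Federated machine learning | Federated tree models | Gradient boosting tree SimFL[ | Decentralized | Hash-table encryption; weighted gradient boosting; high communication efficiency |
Federated machine learning | Federated support vector machine | Support vector machine Valentin[ | Centralized | Hash-table encryption; subgradient updates; relatively good privacy |
Federated deep learning | Federated neural network | NN[ | Centralized | Converges faster than a traditional neural network; converges better when parameters are jointly initialized |
Federated deep learning | Federated convolutional neural network | CNN[ | Centralized | Simpler network structure than an RNN; faster convergence |
Federated deep learning | Federated convolutional neural network | VGG11[ | Centralized | On non-IID data, optimization algorithms with parameter compression converge poorly; without compression convergence is better, but the parameter volume is larger |
Federated deep learning | Federated LSTM | LSTM[ | Centralized | Strongly affected by the data distribution; different parameter aggregation methods give different results |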
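The SecureBoost row above lists "feature bucketing and aggregation" as a characteristic. The following is a minimal plaintext sketch of that idea, under stated assumptions: the homomorphic encryption layer is omitted entirely, the gain formula is the standard second-order gradient boosting gain with a hypothetical regularizer `lam`, and the function names `bucket_gradient_sums` and `best_split` are illustrative, not taken from the paper or any library.

```python
from bisect import bisect_left


def bucket_gradient_sums(feature_values, grads, hessians, thresholds):
    """Sum gradients and hessians of the samples that fall into each bucket.

    thresholds are sorted bucket boundaries. Bucket k holds the values v
    with thresholds[k-1] < v <= thresholds[k]; the last bucket is open-ended.
    In an encrypted setting (assumption) these sums would be computed on
    ciphertexts; here everything is plaintext.
    """
    n_buckets = len(thresholds) + 1
    g_sums = [0.0] * n_buckets
    h_sums = [0.0] * n_buckets
    for v, g, h in zip(feature_values, grads, hessians):
        k = bisect_left(thresholds, v)  # index of the bucket containing v
        g_sums[k] += g
        h_sums[k] += h
    return g_sums, h_sums


def best_split(g_sums, h_sums, lam=1.0):
    """Pick the bucket boundary with the largest second-order split gain."""
    g_total, h_total = sum(g_sums), sum(h_sums)
    base = g_total ** 2 / (h_total + lam)
    best_k, best_gain = None, float("-inf")
    g_left = h_left = 0.0
    # Candidate split k sends buckets 0..k left and the rest right.
    for k in range(len(g_sums) - 1):
        g_left += g_sums[k]
        h_left += h_sums[k]
        g_right, h_right = g_total - g_left, h_total - h_left
        gain = (g_left ** 2 / (h_left + lam)
                + g_right ** 2 / (h_right + lam)
                - base)
        if gain > best_gain:
            best_k, best_gain = k, gain
    return best_k, best_gain


# Toy data: four samples, one feature, two bucket boundaries.
g_sums, h_sums = bucket_gradient_sums(
    [1.0, 2.0, 3.0, 4.0], [-1.0, -1.0, 1.0, 1.0], [1.0] * 4, [2.0, 3.0]
)
k, gain = best_split(g_sums, h_sums)
print(g_sums, h_sums, k, gain)
```

Only per-bucket sums leave the party that holds the feature, which is why bucketing reduces both the information exposed and the number of values exchanged.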
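Optimization aspect | Method | Optimization approach | Advantages and disadvantages |
Communication cost | FedAvg[ | IID data; increases local computation at the participants | Increases computation cost; optimizes poorly on non-IID data |
Communication cost | FedProx[ | Non-IID data; increases local computation | Increases computation cost; can handle non-IID data, at the cost of reduced accuracy |
Communication cost | VFL[ | Vertical federated algorithm; increases local computation | Increases computation cost, at the cost of reduced accuracy |
Communication cost | Structured and sketched updates[ | Compresses the transmitted model to improve participant-to-server communication efficiency | Compresses participant-to-server parameters; complex model structures may have convergence problems |
Communication cost | Server-client updates[ | Compresses the transmitted model to improve server-to-participant communication efficiency | Compresses server-to-participant parameters, at the cost of reduced accuracy; possible convergence problems |
Client selection | FedCS[ | Selects the participants with the best iteration efficiency for model training | More accurate than FedAvg, but applicable only to simple NN models, not to complex models |
Client selection | Hybrid-FL[ | The server selects client data to form an approximately IID dataset | Convergence problems on non-IID data |
Asynchronous aggregation | AsyncFedAvg[ | The server aggregates as soon as it receives a client's parameter update | Convergence problems on non-IID data |
Asynchronous aggregation | FedAsync[ | The server incorporates client model parameters through weighted aggregation | Parameters are hard to tune; convergence problems |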
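The FedAvg and FedAsync rows above differ mainly in how the server combines client parameters. The sketch below contrasts the two aggregation rules on flat parameter lists. It is an illustration under assumptions, not the implementation from either paper: the function names are hypothetical, models are plain Python lists, and the polynomial staleness discount `(staleness + 1) ** (-a)` is one of several possible choices for down-weighting stale updates.

```python
def fedavg(client_weights, client_sizes):
    """Synchronous aggregation: average client models, weighted by data size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def fedasync_update(global_w, client_w, alpha, staleness, a=0.5):
    """Asynchronous aggregation: mix one client model into the global model.

    alpha is the base mixing weight; staleness is the number of server
    rounds elapsed since the client fetched the global model. Older
    updates get a smaller effective weight (assumed polynomial discount).
    """
    alpha_t = alpha * (staleness + 1) ** (-a)
    return [(1 - alpha_t) * g + alpha_t * c for g, c in zip(global_w, client_w)]


# Two clients with 1 and 3 samples respectively.
averaged = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])

# The same client update applied fresh (staleness 0) and stale (staleness 3).
fresh = fedasync_update([0.0, 0.0], [1.0, 2.0], alpha=0.5, staleness=0)
stale = fedasync_update([0.0, 0.0], [1.0, 2.0], alpha=0.5, staleness=3)
print(averaged, fresh, stale)
```

The mixing weight `alpha` and the discount exponent `a` are the kind of hyperparameters that make the asynchronous variant harder to tune, which is consistent with the drawback listed in the table.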
[1] | LECUN Y , BENGIO Y , HINTON G . Deep learning[J]. Nature, 2015,521(7553): 436-444. |
[2] | WANG J Z , HUANG Z C , XIAO J . Artificial intelligence energize Fintech[J]. Big Data Research, 2018,4(3): 114-119. |
[3] | KAIROUZ P , MCMAHAN H B , AVENT B ,et al. Advances and open problems in federated learning[J]. arXiv preprint,2019,arXiv:1912.04977. |
[4] | YANG Q , LIU Y , CHEN T ,et al. Federated machine learning:concept and applications[J]. ACM Transactions on Intelligent Systems and Technology, 2019,10(2): 1-19. |
[5] | LI T , SAHU A K , TALWALKAR A ,et al. Federated learning:challenges,methods,and future directions[J]. IEEE Signal Processing Magazine, 2020,37(3): 50-60. |
[6] | MEHMOOD A , NATGUNANATHAN I , XIANG Y ,et al. Protection of big data privacy[J]. IEEE Access, 2016,4: 1821-1834. |
[7] | FANG B X , JIA Y , LI A P ,et al. Privacy preservation in big data:a survey[J]. Big Data Research, 2016,2(1): 1-18. |
[8] | KONEČNÝ J , MCMAHAN H B , RAMAGE D ,et al. Federated optimization:distributed machine learning for on-device intelligence[J]. arXiv preprint,2016,arXiv:1610.02527. |
[9] | KONEČNÝ J , MCMAHAN H B , YU F X ,et al. Federated learning:strategies for improving communication efficiency[J]. arXiv preprint,2016,arXiv:1610.05492. |
[10] | MCMAHAN H B , MOORE E , RAMAGE D ,et al. Federated learning of deep networks using model averaging[J]. arXiv preprint,2016,arXiv:1602.05629. |
[11] | MCMAHAN H B , MOORE E , RAMAGE D ,et al. Communication-efficient learning of deep networks from decentralized data[C]// Conference on Artificial Intelligence and Statistics.[S.l.:s.n]. 2017. |
[12] | LI T , SANJABI M , BEIRAMI A ,et al. Fair resource allocation in federated learning[J]. arXiv preprint,2019,arXiv:1905.10497. |
[13] | CHEN Y , SUN X Y , JIN Y C . Communication-efficient federated deep learning with layer wise asynchronous model update and temporally weighted aggregation[J]. IEEE Transactions on Neural Networks and Learning Systems,2019:Accepted. |
[14] | REHAK D R , DODDS P , LANNOM L . A model and infrastructure for federated learning content repositories[C]// Interoperability of Web-Based Educational Systems Workshop.[S.l.:s.n]. 2005. |
[15] | LI M , ANDERSEN D G , PARK J W ,et al. Scaling distributed machine learning with the parameter server[C]// The 11th USENIX Symposium on Operating Systems Design and Implementation.[S.l.:s.n]. 2014: 583-598. |
[16] | LIN Y J , HAN S , MAO H Z ,et al. Deep gradient compression:reducing the communication bandwidth for distributed training[J]. arXiv preprint,2017,arXiv:1712.01887. |
[17] | DAI W , KUMAR A , WEI J ,et al. High-performance distributed ML at scale through parameter server consistency models[C]// AAAI Conference on Artificial Intelligence. New York:ACM Press, 2015. |
[18] | RECHT B , RE C , WRIGHT S ,et al. Hogwild:a lock-free approach to parallelizing stochastic gradient descent[C]// Advances in Neural Information Processing Systems.[S.l.:s.n]. 2011: 693-701. |
[19] | HO Q , CIPAR J , CUI H G ,et al. More effective distributed ml via a stale synchronous parallel parameter server[C]// Advances in Neural Information Processing Systems.[S.l.:s.n]. 2013: 1223-1231. |
[20] | FENG S W , YU H . Multi-participant multi-class vertical federated learning[J]. arXiv preprint,2020,arXiv:2001.11154. |
[21] | KONEČNÝ J . Stochastic distributed and federated optimization for machine learning[J]. arXiv preprint,2017,arXiv:1707.01155. |
[22] | LIU X Y , LI H W , XU G W ,et al. Adaptive privacy-preserving federated learning[J]. Peer-to-Peer Networking and Applications, 2020. |
[23] | HU R , GONG Y M , GUO Y X . CPFed:communication-efficient and privacy-preserving federated learning[J]. arXiv preprint,2020,arXiv:2003.13761. |
[24] | RYFFEL T , TRASK A , DAHL M ,et al. A generic framework for privacy preserving deep learning[J]. arXiv preprint,2018,arXiv:1811.04017. |
[25] | GIRGIS A M , DATA D , DIGGAVI S ,et al. Shuffled model of federated learning:privacy,communication and accuracy trade-offs[J]. arXiv preprint,2020,arXiv:2008.07180. |
[26] | SMITH V , CHIANG C K , SANJABI M ,et al. Federated multi-task learning[C]// Advances in Neural Information Processing Systems.[S.l.:s.n]. 2017: 4424-4434. |
[27] | CORINZIA L , BUHMANN J M . Variational federated multi-task learning[J]. arXiv preprint,2019,arXiv:1906.06268. |
[28] | CALDAS S , SMITH V , TALWALKAR A . Federated kernelized multi-task learning[C]// SysML Conference 2018.[S.l.:s.n]. 2018. |
[29] | KALLMAN R , KIMURA H , NATKINS J ,et al. H-store:a high-performance,distributed main memory transaction processing system[J]. Proceedings of the VLDB Endowment, 2008,1(2): 1496-1499. |
[30] | YANG K , JIANG T , SHI Y M ,et al. Federated learning via over-the-air computation[J]. IEEE Transactions on Wireless Communications, 2020,19(3): 2022-2035. |
[31] | NISHIO T , YONETANI R . Client selection for federated learning with heterogeneous resources in mobile edge[C]// 2019 IEEE International Conference on Communications. Piscataway:IEEE Press, 2019: 1-7. |
[32] | WANG J Y , SAHU A K , YANG Z Y ,et al. MATCHA:speeding up decentralized SGD via matching decomposition sampling[J]. arXiv preprint,2019,arXiv:1905.09435. |
[33] | REISIZADEH A , MOKHTARI A , HASSANI H ,et al. FedPAQ:a communication-efficient federated learning method with periodic averaging and quantization[J]. arXiv preprint,2019,arXiv:1909.13014. |
[34] | KHALED A , MISHCHENKO K , RICHTÁRIK P . Better communication complexity for local SGD[J]. arXiv preprint,2019,arXiv:1909.04746. |
[35] | LI S Y , CHENG Y , LIU Y ,et al. Abnormal client behavior detection in federated learning[J]. arXiv preprint,2019,arXiv:1910.09933. |
[36] | SATTLER F , WIEDEMANN S , MÜLLER K R ,et al. Robust and communication-efficient federated learning from non-IID data[J]. IEEE Transactions on Neural Networks and Learning Systems, 2019. |
[37] | CROTTY A , GALAKATOS A , KRASKA T . Tupleware:distributed machine learning on small clusters[J]. IEEE Data Engineering Bulletin, 2014,37(3): 63-76. |
[38] | JOLFAEI A , OSTOVARI P , ALAZAB M ,et al. Guest editorial special issue on privacy and security in distributed edge computing and evolving IoT[J]. IEEE Internet of Things Journal, 2020,7(4): 2496-2500. |
[39] | SAHU A K , LI T , SANJABI M ,et al. Federated optimization for heterogeneous networks[J]. arXiv preprint,2018,arXiv:1812.06127. |
[40] | YANG K , FAN T , CHEN T J ,et al. A quasi-newton method based vertical federated learning framework for logistic regression[J]. arXiv preprint,2019,arXiv:1912.00513. |
[41] | YANG S W , REN B , ZHOU X H ,et al. Parallel distributed logistic regression for vertical federated learning without third-party coordinator[J]. arXiv preprint,2019,arXiv:1911.09824. |
[42] | GAO D S , JU C , WEI X G ,et al. HHHFL:hierarchical heterogeneous horizontal federated learning for electroencephalography[J]. arXiv preprint,2019,arXiv:1909.05784. |
[43] | LIU Y , KANG Y , ZHANG X W ,et al. A communication efficient vertical federated learning framework[J]. arXiv preprint,2019,arXiv:1912.11187. |
[44] | SHARMA S , XING C P , LIU Y ,et al. Secure and efficient federated transfer learning[J]. arXiv preprint,2019,arXiv:1910.13271. |
[45] | ZHAO Y , LI M , LAI L Z ,et al. Federated learning with non-IID data[J]. arXiv preprint,2018,arXiv:1806.00582. |
[46] | LIU Y , LIU Y T , LIU Z J ,et al. Federated forest[J]. IEEE Transactions on Big Data,2020:Accepted. |
[47] | CHENG K W , FAN T , JIN Y L ,et al. SecureBoost:a lossless federated learning framework[J]. arXiv preprint,2019,arXiv:1901.08755. |
[48] | LI Q B , WEN Z Y , HE B S . Practical federated gradient boosting decision trees[J]. arXiv preprint,2019,arXiv:1911.04206. |
[49] | HARTMANN V , MODI K , PUJOL J M ,et al. Privacy-preserving classification with secret vector machines[J]. arXiv preprint,2019,arXiv:1907.03373. |
[50] | ZHU X H , WANG J , HONG Z ,et al. Federated learning of unsegmented Chinese text recognition model[C]// 2019 IEEE 31st International Conference on Tools with Artificial Intelligence. Piscataway:IEEE Press, 2019: 1341-1345. |
[51] | BHOWMICK A , DUCHI J , FREUDIGER J ,et al. Protection against reconstruction and its applications in private federated learning[J]. arXiv preprint,2018,arXiv:1812.00984. |
[52] | DUCHI J , JORDAN M I , MCMAHAN B . Estimation,optimization,and parallelism when data is sparse[C]// Advances in Neural Information Processing Systems. New York:ACM Press, 2013. |
[53] | CHILIMBI T , SUZUE Y , APACIBLE J ,et al. Project adam:building an efficient and scalable deep learning training system[C]// The 11th USENIX Symposium on Operating Systems Design and Implementation. New York:ACM Press, 2014: 571-582. |
[54] | LIU Y , MUPPALA J K , VEERARAGHAVAN M ,et al. Data center networks:topologies,architectures and fault-tolerance characteristics[M]. Heidelberg: Springer Science & Business Media, 2013. |
[55] | BONAWITZ K , EICHNER H , GRIESKAMP W ,et al. Towards federated learning at scale:system design[J]. arXiv preprint,2019,arXiv:1902.01046. |
[56] | LI X , HUANG K , YANG W ,et al. On the convergence of FedAvg on non-IID data[J]. arXiv preprint,2019,arXiv:1907.02189. |
[57] | CALDAS S , KONEČNÝ J , MCMAHAN H B ,et al. Expanding the reach of federated learning by reducing client resource requirements[J]. arXiv preprint,2018,arXiv:1812.07210. |
[58] | NISHIO T , YONETANI R . Client selection for federated learning with heterogeneous resources in mobile edge[C]// ICC 2019 - 2019 IEEE International Conference on Communications. Piscataway:IEEE Press, 2019: 1-7. |
[59] | YOSHIDA N , NISHIO T , MORIKURA M ,et al. Hybrid-FL for wireless networks:cooperative learning mechanism using non-IID data[J]. arXiv preprint,2019,arXiv:1905.07210. |
[60] | SPRAGUE M R , JALALIRAD A , SCAVUZZO M ,et al. Asynchronous federated learning for geospatial applications[C]// Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Heidelberg:Springer, 2018: 21-28. |
[61] | XIE C , KOYEJO S , GUPTA I . Asynchronous federated optimization[J]. arXiv preprint,2019,arXiv:1903.03934. |
[62] | YANG J L , DUAN Y X , QIAO T ,et al. Prototyping federated learning on edge computing systems[J]. Frontiers of Computer Science, 2020,14: 1-3. |
[63] | WANG S Q , TUOR T , SALONIDIS T ,et al. Adaptive federated learning in resource constrained edge computing systems[J]. IEEE Journal on Selected Areas in Communications, 2019,37(6): 1205-1221. |
[64] | ZHAO Y , ZHAO J , JIANG L S ,et al. Mobile edge computing,blockchain and reputation-based crowd-sourcing IoT federated learning:a secure,decentralized and privacy-preserving system[J]. arXiv preprint,2019,arXiv:1906.10893. |
[65] | LI Z Y , LIU J , HAO J L ,et al. CrowdSFL:a secure crowd computing framework based on blockchain and federated learning[J]. Electronics, 2020,9(5):773. |
[66] | KANG J W , XIONG Z H , NIYATO D ,et al. Incentive design for efficient federated learning in mobile networks:a contract theory approach[C]// 2019 IEEE VTS Asia Pacific Wireless Communications Symposium. Piscataway:IEEE Press, 2019: 1-5. |
[67] | ISAKSSON M , NORRMAN K . Secure federated learning in 5G mobile networks[J]. arXiv preprint,2020,arXiv:2004.06700. |