Journal on Communications ›› 2021, Vol. 42 ›› Issue (6): 62-71. doi: 10.11959/j.issn.1000-436x.2021111

Special topic: Federated learning

• Academic paper •

Node selection method in federated learning based on deep reinforcement learning

Wenchen HE¹, Shaoyong GUO¹, Xuesong QIU¹, Liandong CHEN², Suxiang ZHANG³

  1. State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
  2. Hebei State Grid Information & Telecommunication Branch, Shijiazhuang 050011, China
  3. State Grid Information & Telecommunication Branch, Beijing 100761, China
  • Revised: 2021-04-16 Online: 2021-06-25 Published: 2021-06-01
  • About the authors: Wenchen HE (1993- ), male, born in Jinan, Shandong, is a Ph.D. candidate at Beijing University of Posts and Telecommunications. His research interests include edge intelligence, task deployment, and resource allocation.
    Shaoyong GUO (1985- ), male, born in Xingtai, Hebei, Ph.D., is an associate professor at Beijing University of Posts and Telecommunications. His research interests include the Internet of Things and blockchain.
    Xuesong QIU (1973- ), male, born in Shangrao, Jiangxi, Ph.D., is a professor and doctoral supervisor at Beijing University of Posts and Telecommunications. His research interests include network and service management, the Internet of Things, and blockchain.
    Liandong CHEN (1987- ), male, born in Laizhou, Shandong, is an associate senior engineer at Hebei State Grid Information & Telecommunication Branch. His research interests include network information security and data encryption.
    Suxiang ZHANG (1973- ), female, born in Hengshui, Hebei, Ph.D., is a senior engineer at State Grid Information & Telecommunication Branch. Her research interests include power system communication, the Internet of Things, and artificial intelligence.
  • Supported by:
    The National Natural Science Foundation of China (62071070); Key Project Plan of Blockchain in Ministry of Education of the People's Republic of China (2020KJ010802); The Key Research and Development Program of Hebei Province (20310103D)

Abstract:

To address the impact of heterogeneous device computing capabilities and non-independent and identically distributed (non-IID) data on federated learning performance, and to schedule terminal devices efficiently for model aggregation, a device node selection method based on deep reinforcement learning is proposed. The method accounts for the training quality and efficiency of heterogeneous nodes and filters out malicious nodes, improving the accuracy of the federated learning model while reducing training delay. First, based on the characteristics of distributed model training in federated learning, a node selection system model based on deep reinforcement learning is constructed. Second, considering factors such as device training delay, model transmission delay, and accuracy, an accuracy optimization problem model for node selection is formulated. Then, the problem is modeled as a Markov decision process, and a node selection algorithm based on distributed proximal policy optimization is designed to choose a suitable set of devices before each training iteration to complete model aggregation. Simulation results show that the proposed method significantly improves the accuracy and training speed of federated learning, with good convergence and robustness.

Key words: federated learning, model aggregation, node selection, deep reinforcement learning, accuracy
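
The select-then-aggregate loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the learned policy is replaced here by a simple greedy score (quality minus a delay penalty), the aggregation is assumed to be a sample-weighted average in the style of FedAvg, and the round delay is assumed to be set by the slowest selected device. The `Device` fields, the function names, and the `delay_weight` parameter are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Device:
    """Hypothetical per-device state; field names are illustrative only."""
    weights: List[float]   # local model parameters after local training
    num_samples: int       # size of the local dataset
    train_delay: float     # local training time in seconds
    upload_delay: float    # model transmission time in seconds
    quality: float         # proxy for training quality, e.g. local accuracy


def select_nodes(devices: Sequence[Device], k: int,
                 delay_weight: float = 0.1) -> List[int]:
    """Greedy stand-in for a learned selection policy (an assumption).

    Ranks devices by quality minus a weighted delay penalty and returns
    the indices of the top k, in ascending index order.
    """
    def score(i: int) -> float:
        d = devices[i]
        return d.quality - delay_weight * (d.train_delay + d.upload_delay)

    ranked = sorted(range(len(devices)), key=score, reverse=True)
    return sorted(ranked[:k])


def aggregate(devices: Sequence[Device], selected: Sequence[int]) -> List[float]:
    """Sample-weighted average of the selected devices' parameters."""
    total = sum(devices[i].num_samples for i in selected)
    dim = len(devices[selected[0]].weights)
    return [
        sum(devices[i].weights[j] * devices[i].num_samples for i in selected) / total
        for j in range(dim)
    ]


def round_delay(devices: Sequence[Device], selected: Sequence[int]) -> float:
    """Assumes a synchronous round: the slowest selected device sets the delay."""
    return max(devices[i].train_delay + devices[i].upload_delay for i in selected)


if __name__ == "__main__":
    devices = [
        Device([1.0, 2.0], 100, 1.0, 0.5, 0.9),
        Device([3.0, 4.0], 300, 2.0, 0.5, 0.8),
        Device([100.0, -100.0], 50, 9.0, 1.0, 0.1),  # slow, low-quality node
    ]
    chosen = select_nodes(devices, k=2)
    print(chosen, aggregate(devices, chosen), round_delay(devices, chosen))
```

In a reinforcement learning formulation, `select_nodes` would be replaced by the agent's action, and quantities such as the aggregated model's accuracy and `round_delay` could feed the reward; the exact state, action, and reward definitions are given in the paper, not in this sketch.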

