通信学报 (Journal on Communications) ›› 2023, Vol. 44 ›› Issue (5): 123-136. doi: 10.11959/j.issn.1000-436x.2023102

• Academic Paper •

  • About the authors:
    Hui JIANG (1995- ), female, born in Yangzhou, Jiangsu, is a Ph.D. candidate at the University of Chinese Academy of Sciences. Her research interests include federated learning, edge intelligence, and distributed machine learning.
    Tianliu HE (1999- ), male, born in Ji'an, Jiangxi, is an M.S. candidate at the University of Chinese Academy of Sciences. His research interests include federated learning, edge intelligence, and distributed machine learning.
    Min LIU (1976- ), female, born in Yanshi, Henan, Ph.D., is a research fellow and doctoral supervisor at the Institute of Computing Technology, Chinese Academy of Sciences. Her research interests include mobile computing and edge intelligence.
    Sheng SUN (1990- ), female, born in Hengshui, Hebei, Ph.D., is an assistant researcher at the Institute of Computing Technology, Chinese Academy of Sciences. Her research interests include federated learning, mobile computing, and edge intelligence.
    Yuwei WANG (1980- ), male, born in Tangshan, Hebei, Ph.D., is a senior engineer and master's supervisor at the Institute of Computing Technology, Chinese Academy of Sciences. His research interests include federated learning, mobile edge computing, and next-generation network architecture.

High-performance federated continual learning algorithm for heterogeneous streaming data

Hui JIANG1,2, Tianliu HE1,2, Min LIU1,2,3, Sheng SUN1, Yuwei WANG1,2   

  1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
    2 School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100190, China
    3 Zhongguancun Laboratory, Beijing 100084, China
  • Revised: 2023-04-28 Online: 2023-05-25 Published: 2023-05-01
  • Supported by:
    The National Key Research and Development Program of China (2021YFB2900102); The National Natural Science Foundation of China (62072436)


Abstract:

Aiming at the problems of poor model performance and low training efficiency when AI models that provide intelligent services are trained on streaming data, a high-performance federated continual learning algorithm for heterogeneous streaming data (FCL-HSD) was proposed for distributed terminal systems with private data. To mitigate the problem of the current model forgetting old data, a model with a dynamically extensible structure was introduced in the local training stage, and an extension audit mechanism was designed to preserve the AI model's ability to recognize old data at a small storage cost. Considering the heterogeneity of terminal data, a global model customization strategy based on data distribution similarity was designed at the central server side, and an aggregation-by-block manner was implemented for the different modules of the model. The feasibility and effectiveness of the proposed algorithm were verified under various data-increment scenarios on different datasets. Experimental results show that, compared with existing works, the proposed algorithm can effectively improve the model's ability to classify old data while preserving its ability to classify new data.
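The extension audit idea described above can be illustrated with a minimal sketch (assumptions only, not the paper's implementation): before growing the model for newly arrived data, the local trainer first checks whether any existing frozen branch already handles the new data acceptably, and expands only when none does, trading a small accuracy check for storage savings. The `eval_fn` callback, the threshold value, and the branch representation here are all hypothetical.

```python
class ExpandableModel:
    """Sketch of a dynamically extensible model with an extension audit.

    Entries in `branches` stand in for frozen task-specific sub-modules;
    `eval_fn` is a hypothetical callback scoring one branch on new data
    (e.g. returning validation accuracy in [0, 1]).
    """

    def __init__(self):
        self.branches = []

    def audit_and_expand(self, eval_fn, new_data, threshold=0.6):
        """Return True if a new branch was added, False if an existing
        branch already covers the new data well enough (no expansion)."""
        best = max((eval_fn(b, new_data) for b in self.branches), default=0.0)
        if best >= threshold:
            return False  # reuse an existing branch: storage saved
        self.branches.append({"trained_on": new_data})  # placeholder branch
        return True
```

The audit keeps storage overhead small: the model only grows when every existing branch scores below the threshold on the incoming data.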

Key words: heterogeneous data, streaming data, federated learning, federated continual learning, catastrophic forgetting
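The server-side customization and aggregation-by-block described in the abstract can be sketched roughly as follows (a sketch under assumptions, not the paper's exact method): each client reports a label-count histogram; the server aggregates a shared block by sample count as in standard FedAvg, while each client's classifier block is averaged with weights given by cosine similarity between label distributions. The two-block split and all function names are illustrative.

```python
import math

def cosine_sim(p, q):
    """Cosine similarity between two label-count histograms."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def weighted_avg(vectors, weights):
    """Element-wise weighted average of equally sized parameter vectors."""
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, vectors)) / total
            for i in range(len(vectors[0]))]

def customize_global_models(updates, label_dists):
    """Aggregate the shared 'feature' block by sample count (FedAvg-style),
    and customize each client's 'classifier' block with weights given by
    label-distribution similarity to that client."""
    sizes = [sum(d) for d in label_dists]
    shared = weighted_avg([u["feature"] for u in updates], sizes)
    customized = {}
    for i, dist in enumerate(label_dists):
        sims = [cosine_sim(dist, other) for other in label_dists]
        head = weighted_avg([u["classifier"] for u in updates], sims)
        customized[i] = {"feature": shared, "classifier": head}
    return customized
```

Under this weighting, a client's customized classifier is dominated by clients whose data distributions resemble its own, while the shared feature block still benefits from all clients' data.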
