通信学报 (Journal on Communications) ›› 2023, Vol. 44 ›› Issue (5): 123-136. doi: 10.11959/j.issn.1000-436x.2023102

• Academic Paper •

  • About the authors:
    Hui JIANG (1995- ), female, born in Yangzhou, Jiangsu, is a Ph.D. candidate at the University of Chinese Academy of Sciences. Her research interests include federated learning, edge intelligence, and distributed machine learning.
    Tianliu HE (1999- ), male, born in Ji'an, Jiangxi, is an M.S. candidate at the University of Chinese Academy of Sciences. His research interests include federated learning, edge intelligence, and distributed machine learning.
    Min LIU (1976- ), female, born in Yanshi, Henan, Ph.D., is a research fellow and doctoral supervisor at the Institute of Computing Technology, Chinese Academy of Sciences. Her research interests include mobile computing and edge intelligence.
    Sheng SUN (1990- ), female, born in Hengshui, Hebei, Ph.D., is an assistant researcher at the Institute of Computing Technology, Chinese Academy of Sciences. Her research interests include federated learning, mobile computing, and edge intelligence.
    Yuwei WANG (1980- ), male, born in Tangshan, Hebei, Ph.D., is a senior engineer and master's supervisor at the Institute of Computing Technology, Chinese Academy of Sciences. His research interests include federated learning, mobile edge computing, and next-generation network architecture.

High-performance federated continual learning algorithm for heterogeneous streaming data

Hui JIANG1,2, Tianliu HE1,2, Min LIU1,2,3, Sheng SUN1, Yuwei WANG1,2   

  1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
    2 School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100190, China
    3 Zhongguancun Laboratory, Beijing 100084, China
  • Revised: 2023-04-28 Online: 2023-05-25 Published: 2023-05-01
  • Supported by:
    The National Key Research and Development Program of China (2021YFB2900102); The National Natural Science Foundation of China (62072436)


Abstract:

Aiming at the problems of poor model performance and low training efficiency when AI models that provide intelligent services are trained on streaming data, a high-performance federated continual learning algorithm for heterogeneous streaming data (FCL-HSD) was proposed for distributed terminal systems with private data. To mitigate the problem of the current model forgetting old data, a model with a dynamically extensible structure was introduced in the local training stage, and an extension audit mechanism was designed to preserve the AI model's ability to recognize old data at a small storage cost. Considering the heterogeneity of terminal data, a global model customization strategy based on data distribution similarity was designed at the central server side, and an aggregation-by-block manner was implemented for the different modules of the model. The feasibility and effectiveness of the proposed algorithm were verified under various data-increment scenarios on different datasets. Experimental results show that, compared with existing works, the proposed algorithm can effectively improve the model's ability to classify old data while preserving its ability to classify new data.
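The extension audit idea described above can be illustrated with a minimal sketch (assumptions only, not the paper's implementation): before growing the model for newly arrived data, the local trainer first checks whether any existing frozen branch already handles the new data acceptably, and expands only when none does, trading a small accuracy check for storage savings. The `eval_fn` callback, the threshold value, and the branch representation here are all hypothetical.

```python
class ExpandableModel:
    """Sketch of a dynamically extensible model with an extension audit.

    Entries in `branches` stand in for frozen task-specific sub-modules;
    `eval_fn` is a hypothetical callback scoring one branch on new data
    (e.g. returning validation accuracy in [0, 1]).
    """

    def __init__(self):
        self.branches = []

    def audit_and_expand(self, eval_fn, new_data, threshold=0.6):
        """Return True if a new branch was added, False if an existing
        branch already covers the new data well enough (no expansion)."""
        best = max((eval_fn(b, new_data) for b in self.branches), default=0.0)
        if best >= threshold:
            return False  # reuse an existing branch: storage saved
        self.branches.append({"trained_on": new_data})  # placeholder branch
        return True
```

The audit keeps storage overhead small: the model only grows when every existing branch scores below the threshold on the incoming data.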

Key words: heterogeneous data, streaming data, federated learning, federated continual learning, catastrophic forgetting
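The server-side customization and aggregation-by-block described in the abstract can be sketched roughly as follows (a sketch under assumptions, not the paper's exact method): each client reports a label-count histogram; the server aggregates a shared block by sample count as in standard FedAvg, while each client's classifier block is averaged with weights given by cosine similarity between label distributions. The two-block split and all function names are illustrative.

```python
import math

def cosine_sim(p, q):
    """Cosine similarity between two label-count histograms."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def weighted_avg(vectors, weights):
    """Element-wise weighted average of equally sized parameter vectors."""
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, vectors)) / total
            for i in range(len(vectors[0]))]

def customize_global_models(updates, label_dists):
    """Aggregate the shared 'feature' block by sample count (FedAvg-style),
    and customize each client's 'classifier' block with weights given by
    label-distribution similarity to that client."""
    sizes = [sum(d) for d in label_dists]
    shared = weighted_avg([u["feature"] for u in updates], sizes)
    customized = {}
    for i, dist in enumerate(label_dists):
        sims = [cosine_sim(dist, other) for other in label_dists]
        head = weighted_avg([u["classifier"] for u in updates], sims)
        customized[i] = {"feature": shared, "classifier": head}
    return customized
```

Under this weighting, a client's customized classifier is dominated by clients whose data distributions resemble its own, while the shared feature block still benefits from all clients' data.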
