Chinese Journal of Network and Information Security, 2022, Vol. 8, Issue 5: 56-65. doi: 10.11959/j.issn.2096-109x.2022069

• Topic: Security of Big Data and Artificial Intelligence •

Privacy-preserving federated learning framework with dynamic weight aggregation

Zuobin YING1, Yichen FANG1, Yiwen ZHANG2

  1 City University of Macau, Macau 999078, China
  2 Anhui Xinhua University, Hefei 230000, China
  • Revised: 2022-09-06  Online: 2022-10-15  Published: 2022-10-01
  • About the authors: Zuobin YING (1982- ), male, born in Wuhu, Anhui; assistant professor at City University of Macau; his research interests include blockchain and federated learning.
    Yichen FANG (1998- ), female, born in Huzhou, Zhejiang; master's student at City University of Macau; her research interests include differential privacy and federated learning.
    Yiwen ZHANG (1980- ), female, born in Fuyang, Anhui; professor at Anhui Xinhua University; her research interests include data mining and federated learning.
  • Supported by:
    The Science and Technology Development Fund of Macau (0038/2022/A)


Abstract:

Privacy-preserving federated learning frameworks operating under an untrusted central server face two problems. ① Fixed weights, typically proportional to each participant's dataset size, are used when the central server aggregates the distributed models. Since different participants hold non-independent and identically distributed (non-IID) data, fixed aggregation weights prevent the global model from reaching optimal utility. ② Existing frameworks assume an honest central server and therefore ignore the leakage of participants' private data caused by an untrusted server. To address these problems, DP-DFL, a privacy-preserving federated learning framework with dynamic weight aggregation under an untrusted central server, was proposed on the basis of the popular DP-FedAvg algorithm. DP-DFL learns the model aggregation weights directly from the data of the different participants, and is therefore suited to non-IID data environments. In addition, noise is injected into the model parameters during the local privacy-protection phase, which satisfies the untrusted-central-server setting and reduces the risk of privacy leakage when local participants upload their model parameters. Experiments on the CIFAR-10 dataset show that DP-DFL not only provides local privacy guarantees but also achieves higher accuracy, improving average model accuracy by 2.09% compared with the DP-FedAvg algorithm.
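The contrast between fixed and dynamic aggregation weights described in the abstract can be sketched as follows. This is a minimal illustrative example, not the authors' implementation; the parameter vectors, dataset sizes, and the `learned_weights` values are placeholders, and how DP-DFL actually learns its weights is specified in the paper itself.

```python
import numpy as np

def aggregate(client_params, weights):
    """Weighted average of client model parameter vectors."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so weights sum to 1
    return sum(w * p for w, p in zip(weights, client_params))

client_params = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]

# Fixed weights (FedAvg / DP-FedAvg): proportional to local dataset size.
dataset_sizes = [100, 300]
global_fixed = aggregate(client_params, dataset_sizes)    # → [2.5, 3.5]

# Dynamic weights (the DP-DFL idea): weights are learned from the
# participants' data each round rather than fixed in advance.
# The values below are stand-ins for learned quantities.
learned_weights = [0.6, 0.4]
global_dynamic = aggregate(client_params, learned_weights)  # → [1.8, 2.8]
```

Under non-IID data, a client with many samples may still hold an unrepresentative distribution, which is why size-proportional weights can be suboptimal.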

Key words: federated learning, differential privacy, dynamic aggregation weight, non-independent and identically distributed data
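The local privacy-protection step mentioned in the abstract, adding noise to model parameters before they leave the client, is commonly realized with the Gaussian mechanism of differential privacy. The sketch below assumes that mechanism (clip to a bounded L2 norm, then add Gaussian noise); the function name and parameters are illustrative, not taken from the paper.

```python
import numpy as np

def privatize_update(update, clip_norm, noise_multiplier, rng=None):
    """Clip a model update to L2 norm <= clip_norm, then add Gaussian
    noise with standard deviation noise_multiplier * clip_norm, so the
    update is protected before an untrusted server ever sees it."""
    rng = rng if rng is not None else np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

# With noise_multiplier = 0 only clipping acts: [3, 4] has norm 5,
# so it is scaled down to [0.6, 0.8].
print(privatize_update(np.array([3.0, 4.0]), clip_norm=1.0, noise_multiplier=0.0))
```

Because the noise is added locally, the privacy guarantee does not rely on the server's honesty, matching the untrusted-server setting the paper targets.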


