Adversarial sample generation algorithm for vertical federated learning

Journal on Communications, 2023, Vol. 44, Issue 8: 1-13. doi: 10.11959/j.issn.1000-436x.2023149

• Academic Paper •

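Xiaolin CHEN¹,², Daoguang ZAN¹,², Bingchao WU¹,², Bei GUAN²,³, Yongji WANG²,³

¹ Collaborative Innovation Center, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
² School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
³ Integrated Innovation Center, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China

Revised: 2023-07-25  Online: 2023-08-01  Published: 2023-08-01
About the authors: Xiaolin CHEN (1996- ), male, from Weifang, Shandong, is a Ph.D. candidate at the Institute of Software, Chinese Academy of Sciences. His research interests include machine learning, federated learning, and privacy computing.
Daoguang ZAN (1997- ), male, from Jining, Shandong, is a Ph.D. candidate at the Institute of Software, Chinese Academy of Sciences. His research interests include natural language processing and code generation.
Bingchao WU (1994- ), male, from Shaoxing, Zhejiang, is a Ph.D. candidate at the Institute of Software, Chinese Academy of Sciences. His research interests include artificial intelligence and recommender systems.
Bei GUAN (1986- ), male, from Yuncheng, Shanxi, Ph.D., is a senior engineer at the Institute of Software, Chinese Academy of Sciences. His research interests include artificial intelligence and big data, network security, virtualization, operating systems, and cloud computing.
Yongji WANG (1962- ), male, from Yingkou, Liaoning, Ph.D., is a research professor and doctoral supervisor at the Institute of Software, Chinese Academy of Sciences. His research interests include artificial intelligence, big data analytics, intelligent manufacturing, cloud computing, covert channels, and high-assurance network technology.
Supported by:
The National Natural Science Foundation of China (61762062)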

Abstract:

To adapt to the characteristics of vertical federated learning (VFL) applications, namely high communication cost, fast model iteration, and decentralized data storage, a generalized adversarial sample generation algorithm for VFL, named VFL-GASG, was proposed. Specifically, an adversarial sample generation framework suited to the VFL architecture was constructed to carry out white-box adversarial attacks, and centralized machine learning adversarial sample generation algorithms with different strategies, such as L-BFGS, FGSM, and C&W, were extended and implemented under this framework. Drawing on the deconvolution-layer design of the deep convolutional generative adversarial network (DCGAN), VFL-GASG was designed to solve the generality problem of generating adversarial perturbations at the inference stage. It trains the generative model with the hidden-layer vectors of local features as prior knowledge, produces fine-grained adversarial perturbations through deconvolution layers, and controls the perturbation magnitude with a discriminator and a perturbation term. Experiments show that, compared with the baseline algorithms, VFL-GASG maintains a high attack success rate while achieving higher generation efficiency, robustness, and generalization ability; the experiments also verify how different experimental settings affect the adversarial attack.

Key words: machine learning, vertical federated learning (VFL), adversarial sample, adversarial attack, deep convolutional generative adversarial network (DCGAN)
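The abstract describes VFL-GASG only at a high level: a generator maps the hidden-layer vector of the malicious party's local features to a perturbation through deconvolution layers, and a discriminator plus a perturbation term keep the perturbation small. The NumPy sketch below shows one forward pass of such a pipeline under loudly stated assumptions: a single projection followed by one transposed-convolution layer, an 8×8 local feature block, `eps * tanh` bounding, a toy logistic discriminator, a hinge on the L2 norm as the perturbation term, and `l_adv` supplied as a scalar (in VFL it would come back from the server). Every function name, shape, and weight here is hypothetical; this is not the authors' architecture or training code.

```python
import numpy as np


def conv_transpose2d(z, w, stride=2):
    """Naive 2-D transposed convolution ("deconvolution") without padding.

    z: (C_in, H, W) input map; w: (C_in, C_out, k, k) kernel.
    Returns a (C_out, (H - 1) * stride + k, (W - 1) * stride + k) map.
    """
    c_in, h, wd = z.shape
    _, c_out, k, _ = w.shape
    out = np.zeros((c_out, (h - 1) * stride + k, (wd - 1) * stride + k))
    for ci in range(c_in):
        for i in range(h):
            for j in range(wd):
                # Scatter one input activation, weighted by its kernel, into the output.
                out[:, i * stride:i * stride + k, j * stride:j * stride + k] += z[ci, i, j] * w[ci]
    return out


def generate_perturbation(h, params, eps):
    """Hidden vector -> ReLU projection -> one deconvolution -> perturbation in (-eps, eps)."""
    z = np.maximum(h @ params["proj"], 0.0).reshape(4, 3, 3)
    r = conv_transpose2d(z, params["deconv"], stride=2)  # shape (1, 8, 8)
    return eps * np.tanh(r[0])


def discriminator(x, params):
    """Toy logistic discriminator: estimated probability that x is unperturbed local data."""
    return 1.0 / (1.0 + np.exp(-(x.ravel() @ params["disc"] + params["disc_b"])))


def generator_loss(x_adv, r, l_adv, params, lam_gan=1.0, lam_pert=1.0, c=1.0):
    """l_adv (assumed to be reported by the server) + GAN term + hinge on ||r||_2."""
    l_gan = -np.log(discriminator(x_adv + r, params) + 1e-12)  # push D to accept x_adv + r
    l_pert = max(0.0, np.linalg.norm(r) - c)                  # penalise perturbations above c
    return l_adv + lam_gan * l_gan + lam_pert * l_pert


rng = np.random.default_rng(0)
params = {
    "proj": rng.normal(scale=0.1, size=(16, 36)),
    "deconv": rng.normal(scale=0.1, size=(4, 1, 4, 4)),
    "disc": rng.normal(scale=0.1, size=64),
    "disc_b": 0.0,
}
x_adv = rng.random((8, 8))   # hypothetical 8x8 block of the malicious party's features
h = rng.normal(size=16)      # hypothetical hidden-layer vector computed from that block
r = generate_perturbation(h, params, eps=0.1)
loss = generator_loss(x_adv, r, l_adv=0.7, params=params)
print(r.shape, float(np.abs(r).max()), float(loss))
```

The `tanh` scaling enforces the ε bound by construction, so the discriminator and hinge terms only have to keep the perturbation inconspicuous within that box; a training loop would update `params` by differentiating `generator_loss`, which is omitted here.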

CLC number:

TP309.2

Xiaolin CHEN, Daoguang ZAN, Bingchao WU, Bei GUAN, Yongji WANG. Adversarial sample generation algorithm for vertical federated learning[J]. Journal on Communications, 2023, 44(8): 1-13.

Figures and Tables (17)

Figure 1

Figure 2

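Table 1  Related parameters

Parameter	Meaning
$N$	Number of honest participants in vertical federated learning
$S$	Central server
$C_{i}$	The $i$-th participant
$G(\cdot)$	Global model of the central server
$f_{i}(\cdot)$	Local model of the $i$-th participant
$f_{hon}(\cdot)$	Local model of the honest participant
$f_{adv}(\cdot)$	Local model of the malicious participant
$X_{i}$	Local data held by the $i$-th participant
$X_{hon}$	Local data held by the honest participant
$X_{adv}^{t}$	Adversarial sample of the malicious participant after the $t$-th iteration
$X_{adv}$	Local data held by the malicious participant
$d_{i}$	Local data dimension of the $i$-th participant
$d_{hon}$	Local data dimension of the honest participant
$d_{adv}$	Local data dimension of the malicious participant
$r$	Noise vector added by the malicious participant
$r^{t}$	Noise vector of the malicious participant after the $t$-th iteration
$l$	Target label
$\epsilon$	Perturbation hyperparameter controlling the perturbation size
$G$	Generator of the generative adversarial network
$D$	Discriminator of the generative adversarial network
$l_{adv}$	Classification loss of the target model
$l_{GAN}$	Loss of the generative adversarial network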
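The notation in Table 1 can be made concrete with a toy split model. The sketch below simulates the whole protocol in one process and assumes linear local models `f_hon`, `f_adv`, a linear server model `G` over the concatenated embeddings, and a cross-entropy loss. It shows how, in a white-box setting, the malicious party can turn the gradient the server returns for its embedding into $\nabla_{x_{adv}} l$ via the chain rule through its own model. All shapes and helper names are hypothetical illustrations, not taken from the paper.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def vfl_forward(x_hon, x_adv, W_hon, W_adv, W_g):
    """Each party embeds only its own features; the server classifies the concatenation."""
    e_hon = W_hon @ x_hon                          # honest party:    f_hon(x_hon)
    e_adv = W_adv @ x_adv                          # malicious party: f_adv(x_adv)
    return W_g @ np.concatenate([e_hon, e_adv])    # server:          G([e_hon; e_adv])


def loss(x_hon, x_adv, label, W_hon, W_adv, W_g):
    """Cross-entropy of the global prediction against `label`."""
    return -np.log(softmax(vfl_forward(x_hon, x_adv, W_hon, W_adv, W_g))[label])


def server_returned_gradient(x_hon, x_adv, label, W_hon, W_adv, W_g):
    """Server side: dL/d(e_adv), the slice of the embedding gradient sent to the malicious party."""
    p = softmax(vfl_forward(x_hon, x_adv, W_hon, W_adv, W_g))
    p[label] -= 1.0                                # dL/dlogits for softmax + cross-entropy
    return (W_g.T @ p)[W_hon.shape[0]:]            # keep only the malicious party's slice


def malicious_input_gradient(x_hon, x_adv, label, W_hon, W_adv, W_g):
    """Malicious party: chain rule through its own (linear) local model f_adv."""
    return W_adv.T @ server_returned_gradient(x_hon, x_adv, label, W_hon, W_adv, W_g)


rng = np.random.default_rng(1)
W_hon, W_adv = rng.normal(size=(4, 5)), rng.normal(size=(4, 3))  # d_hon = 5, d_adv = 3
W_g = rng.normal(size=(3, 8))                                    # 3 classes
x_hon, x_adv = rng.normal(size=5), rng.normal(size=3)
grad = malicious_input_gradient(x_hon, x_adv, 2, W_hon, W_adv, W_g)
print(grad)
```

Only the malicious party's slice of the embedding gradient crosses the network, which is why the attacker never needs $X_{hon}$ directly in a real deployment; passing `x_hon` into the helpers is purely a simulation convenience.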

Figure 3

Table 2  Adversarial sample generation algorithms for vertical federated learning

Generation algorithm	Objective function	Returned parameter	Update rule
VFL-LBFGS	$c\|r\| + l(x_{hon}, x_{adv}^{t}, l)$	$\nabla_{x_{adv}} l(x_{hon}, x_{adv}^{t}, l)$	$x_{adv}^{t+1} = x_{adv}^{t} + \alpha^{t}(\beta^{t})^{-1}\left(c\frac{r}{\|r\|} + \nabla_{x_{adv}} l(x_{hon}, x_{adv}^{t}, l)\right)$
VFL-FGSM	$l(x_{hon}, x_{adv}, y)$	$\nabla_{x_{adv}} l(x_{hon}, x_{adv}, y)$	$x_{adv} = x + \nabla_{x_{adv}} l(x_{hon}, x_{adv}, y)$
VFL-IFGSM	$l(x_{hon}, x_{adv}^{t}, y)$	$\nabla_{x_{adv}^{t}} l(x_{hon}, x_{adv}^{t}, y)$	$x_{adv}^{t+1} = x_{adv}^{t} + \nabla_{x_{adv}^{t}} l(x_{hon}, x_{adv}^{t}, y)$
VFL-MIFGSM	$l(x_{hon}, x_{adv}^{t}, y)$	$g^{t} = \nabla_{x_{adv}^{t}} l(x_{hon}, x_{adv}^{t}, y)$	$r^{t+1} = \mu r^{t} + \frac{g^{t}}{\|g^{t}\|},\ x_{adv}^{t+1} = x_{adv}^{t} + \operatorname{sign}(r^{t+1})$
VFL-C&W	$l_{CW} = \|x_{adv}^{t} - x\|_{2}^{2} + c\,f(x_{adv}^{t})$	$\nabla_{w} l_{CW}$	$w = w + \nabla_{w} l_{CW},\ x_{adv}^{t+1} = \frac{\tanh(w) + 1}{2}$
VFL-JSMA	N/A	$(p_{1}, p_{2}) = \arg\max_{i} S[x_{adv}, l]_{i}$	$x_{adv,p_{1}} = x_{adv,p_{1}} + \epsilon,\ x_{adv,p_{2}} = x_{adv,p_{2}} + \epsilon$
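The gradient-based rows of Table 2 share one pattern: the malicious party receives a gradient of the loss with respect to its own features and moves those features along it. The sketch below writes VFL-FGSM, VFL-IFGSM, and VFL-MIFGSM in their commonly used form. It assumes a step size `alpha`, a `sign(·)` step, clipping to an L∞ ball of radius `eps`, and L1 normalisation of the momentum term as in the original MI-FGSM formulation; the table's abbreviated update rules leave these details out. `server_gradient` is a hypothetical stand-in for the gradient the server returns.

```python
import numpy as np


def fgsm(x, grad_fn, eps):
    """One step: x_adv = x + eps * sign(grad)."""
    return x + eps * np.sign(grad_fn(x))


def ifgsm(x, grad_fn, eps, alpha, steps):
    """Repeated small sign steps, clipped back into the L-inf ball of radius eps around x."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv


def mifgsm(x, grad_fn, eps, alpha, steps, mu=1.0):
    """Momentum variant: r accumulates L1-normalised gradients; step along sign(r)."""
    x_adv, r = x.copy(), np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x_adv)
        r = mu * r + g / (np.abs(g).sum() + 1e-12)
        x_adv = np.clip(x_adv + alpha * np.sign(r), x - eps, x + eps)
    return x_adv


w = np.array([1.0, -2.0, 0.5])


def server_gradient(x_adv):
    """Hypothetical server reply: the gradient of a linear loss w . x_adv is w everywhere."""
    return w


x = np.zeros(3)
print(fgsm(x, server_gradient, eps=0.1))
print(ifgsm(x, server_gradient, eps=0.1, alpha=0.03, steps=5))
print(mifgsm(x, server_gradient, eps=0.1, alpha=0.03, steps=5))
```

In this linear toy case all three methods end at the same corner of the ε-ball; they differ only when the returned gradient changes direction between iterations, which is where momentum helps MI-FGSM avoid oscillating.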

Figure 4

Figure 5

Figure 6

Figure 7

Figure 8

Figure 9

Table 3  Classification accuracy of the target model under different adversarial sample generation algorithms

Adversarial sample generation algorithm	MNIST Top1	MNIST Top3	CIFAR-10 Top1	CIFAR-10 Top3	ImageNet-100 Top1	ImageNet-100 Top3
VFL-FGSM	92.13%	99.26%	48.22%	84.91%	49.60%	69.93%
VFL-IFGSM	85.42%	98.72%	41.25%	82.86%	40.42%	67.46%
VFL-MIFGSM	80.32%	91.51%	43.04%	89.45%	37.51%	53.42%
VFL-LBFGS	82.29%	95.86%	45.89%	82.54%	49.65%	63.73%
VFL-JSMA	87.00%	97.72%	57.65%	89.65%	37.93%	51.98%
VFL-C&W	41.24%	55.45%	20.45%	76.37%	19.14%	40.62%
VFL-GASG	31.24%	92.89%	43.00%	75.21%	22.36%	47.41%
Target model	97.27%	99.78%	79.91%	95.36%	65.09%	75.74%
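Comparing rows of Table 3 is easier with the Top-1 accuracy drop each attack causes. The sketch below uses the MNIST Top-1 column and, as an assumption, treats the "Target model" row as the accuracy without any attack; the numbers are copied from the table and nothing else is added.

```python
# MNIST Top-1 accuracies (%) copied from Table 3
target_top1 = 97.27
attacked_top1 = {
    "VFL-FGSM": 92.13,
    "VFL-IFGSM": 85.42,
    "VFL-MIFGSM": 80.32,
    "VFL-LBFGS": 82.29,
    "VFL-JSMA": 87.00,
    "VFL-C&W": 41.24,
    "VFL-GASG": 31.24,
}

# Accuracy drop in percentage points relative to the target-model row
drops = {name: round(target_top1 - acc, 2) for name, acc in attacked_top1.items()}
for name, drop in sorted(drops.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:12s} {drop:6.2f} pp")
```

On this column VFL-GASG causes the largest drop, followed by VFL-C&W; the same calculation can be repeated for the other five columns, where the ordering is not always the same.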

Table 4  Comparison of computation time and robustness of different adversarial sample generation algorithms

Adversarial sample generation algorithm	MNIST time/s	MNIST target model 1	MNIST model A	CIFAR-10 time/s	CIFAR-10 target model 2	CIFAR-10 model B	ImageNet-100 time/s	ImageNet-100 target model 3	ImageNet-100 model C
VFL-FGSM	19.64	92.13%	94.25%	12.36	48.22%	61.34%	197.92	49.60%	56.34%
VFL-IFGSM	56.23	85.42%	91.83%	41.25	41.25%	59.84%	471.50	40.42%	55.41%
VFL-MIFGSM	57.75	80.32%	86.26%	51.89	43.04%	67.63%	563.23	37.51%	49.92%
VFL-LBFGS	935.92	82.29%	89.24%	269.93	45.89%	60.93%	2669.86	49.65%	52.53%
VFL-JSMA	1771.82	87.00%	91.52%	1524.34	57.65%	68.42%	6404.60	37.93%	51.69%
VFL-C&W	9360.65	41.24%	59.25%	8248.43	20.45%	51.14%	33849.15	19.14%	42.82%
VFL-GASG	5.51	31.24%	52.21%	9.99	43.00%	58.57%	167.89	22.36%	38.84%
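The efficiency claim can be read off Table 4 as a ratio: each baseline's computation time divided by VFL-GASG's time on the same dataset. The sketch below simply divides the reported wall-clock times, copied from the table; a ratio above 1 means VFL-GASG was faster in that setting.

```python
# Computation time in seconds, copied from Table 4
time_s = {
    "MNIST": {"VFL-FGSM": 19.64, "VFL-IFGSM": 56.23, "VFL-MIFGSM": 57.75,
              "VFL-LBFGS": 935.92, "VFL-JSMA": 1771.82, "VFL-C&W": 9360.65,
              "VFL-GASG": 5.51},
    "CIFAR-10": {"VFL-FGSM": 12.36, "VFL-IFGSM": 41.25, "VFL-MIFGSM": 51.89,
                 "VFL-LBFGS": 269.93, "VFL-JSMA": 1524.34, "VFL-C&W": 8248.43,
                 "VFL-GASG": 9.99},
    "ImageNet-100": {"VFL-FGSM": 197.92, "VFL-IFGSM": 471.50, "VFL-MIFGSM": 563.23,
                     "VFL-LBFGS": 2669.86, "VFL-JSMA": 6404.60, "VFL-C&W": 33849.15,
                     "VFL-GASG": 167.89},
}

# Baseline time / VFL-GASG time on the same dataset
speedup = {
    ds: {alg: t / times["VFL-GASG"] for alg, t in times.items() if alg != "VFL-GASG"}
    for ds, times in time_s.items()
}
for ds, ratios in speedup.items():
    print(ds, {alg: round(v, 1) for alg, v in ratios.items()})
```

The gap is largest against the optimisation-based VFL-C&W (roughly three orders of magnitude on MNIST) and smallest against single-step VFL-FGSM.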

Table 5

Figure 10

Figure 11

Figure 12


Training sample ratio	MNIST Top1	MNIST Top3	CIFAR-10 Top1	CIFAR-10 Top3	ImageNet-100 Top1	ImageNet-100 Top3
5%	33.16%	92.13%	42.90%	74.39%	23.87%	48.27%
10%	30.90%	92.94%	41.79%	73.39%	24.92%	47.48%
20%	30.73%	92.50%	43.51%	76.01%	25.83%	49.45%
40%	31.24%	92.86%	43.48%	76.65%	23.65%	48.18%
80%	30.96%	92.88%	42.14%	75.05%	22.83%	48.32%
100%	31.24%	92.89%	43.00%	75.21%	22.36%	47.41%
Range	2.43%	0.81%	1.72%	3.26%	3.47%	2.04%
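The last row of the table (Range) is the maximum minus the minimum of each column, and it can be reproduced from the six rows above with a short check. Note also that the 100% row coincides with the VFL-GASG row of Table 3, so small ranges indicate that the attack's effect changes little as the training sample ratio varies.

```python
# Accuracy (%) per training sample ratio, copied from the table above
rows = {
    "5%":   [33.16, 92.13, 42.90, 74.39, 23.87, 48.27],
    "10%":  [30.90, 92.94, 41.79, 73.39, 24.92, 47.48],
    "20%":  [30.73, 92.50, 43.51, 76.01, 25.83, 49.45],
    "40%":  [31.24, 92.86, 43.48, 76.65, 23.65, 48.18],
    "80%":  [30.96, 92.88, 42.14, 75.05, 22.83, 48.32],
    "100%": [31.24, 92.89, 43.00, 75.21, 22.36, 47.41],
}
columns = ["MNIST Top1", "MNIST Top3", "CIFAR-10 Top1", "CIFAR-10 Top3",
           "ImageNet-100 Top1", "ImageNet-100 Top3"]

# Range = max - min down each column, rounded to two decimals like the table
ranges = {
    col: round(max(v[i] for v in rows.values()) - min(v[i] for v in rows.values()), 2)
    for i, col in enumerate(columns)
}
print(ranges)
```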
