Threat analysis and defense methods of deep-learning-based data theft in data sandbox mode

doi:10.11959/j.issn.1000-436x.2021215

Abstract

The threat model of deep-learning-based data theft in data sandbox mode was analyzed in detail, and the degree of damage and the distinguishing characteristics of the attack were quantitatively evaluated in both the data processing stage and the model training stage. For attacks in the data processing stage, a data leakage prevention method based on model pruning was proposed to reduce the amount of leaked data while preserving the availability of the original model. For attacks in the model training stage, an attack detection method based on model parameter analysis was proposed to intercept malicious models and prevent data leakage. Neither method requires modifying or encrypting the data or manually analyzing the training code of the deep learning model, so both are well suited to defending against data theft in data sandbox mode. Experimental evaluation shows that the pruning-based defense reduces data leakage by 73%, and the detection method based on model parameter analysis identifies more than 95% of attacks.
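
The abstract does not say which pruning criterion the defense uses. As a minimal sketch of the general idea only, the following assumes simple magnitude-based pruning over a flat list of weights; the function name `prune_by_magnitude` and the `fraction` parameter are hypothetical and are not taken from the paper.

```python
def prune_by_magnitude(weights, fraction):
    """Return a copy of `weights` with the smallest-magnitude share zeroed.

    Assumption: magnitude pruning stands in for the paper's (unspecified)
    pruning criterion. `fraction` is the share of weights to zero out.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    n_prune = int(len(weights) * fraction)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)  # leave the caller's list untouched
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


original = [0.5, -0.01, 0.3, 0.02]
print(prune_by_magnitude(original, 0.5))
```

In practice the pruned model would be re-evaluated on the original task after each pruning step, since the abstract stresses that the original model must remain usable.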

Key words: data sandbox, data theft, security of AI

CLC Number: TP309.2

Hezhong PAN, Peiyi HAN, Xiayu XIANG, Shaoming DUAN, Rongfei ZHUANG, Chuanyi LIU. Threat analysis and defense methods of deep-learning-based data theft in data sandbox mode[J]. Journal on Communications, 2021, 42(11): 133-144.

References

[1]	DELACROIX S, MONTGOMERY J. From research data ethics principles to practice: data trusts as a governance tool[J]. SSRN Electronic Journal, 2020: doi.org/10.2139/ssrn.3736090.
[2]	O'HARA K. Data trusts: ethics, architecture and governance for trustworthy data stewardship[R]. 2019.
[3]	CARLINI N, LIU C, ERLINGSSON Ú, et al. The secret sharer: evaluating and testing unintended memorization in neural networks[C]// Proceedings of the 28th USENIX Security Symposium. Berkeley: USENIX Association, 2019: 267-284.
[4]	CARLINI N, TRAMER F, WALLACE E, et al. Extracting training data from large language models[C]// Proceedings of the 30th USENIX Security Symposium. Berkeley: USENIX Association, 2021: 2633-2650.
[5]	ZHANG Y H, JIA R X, PEI H Z, et al. The secret revealer: generative model-inversion attacks against deep neural networks[C]// Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2020: 250-258.
[6]	ZHU L G, LIU Z J, HAN S. Deep leakage from gradients[J]. arXiv Preprint, arXiv:1906.08935, 2019.
[7]	SONG C Z, RISTENPART T, SHMATIKOV V. Machine learning models that remember too much[C]// Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. New York: ACM Press, 2017: 587-601.
[8]	ZHANG T W. Privacy-preserving machine learning through data obfuscation[J]. arXiv Preprint, arXiv:1807.01860, 2018.
[9]	BRAKERSKI Z, GENTRY C, VAIKUNTANATHAN V. (Leveled) fully homomorphic encryption without bootstrapping[J]. ACM Transactions on Computation Theory, 2014, 6(3): 1-36.
[10]	PAILLIER P. Public-key cryptosystems based on composite degree residuosity classes[C]// Advances in Cryptology - EUROCRYPT'99. Berlin: Springer, 1999: 223-238.
[11]	ABADI M, CHU A, GOODFELLOW I, et al. Deep learning with differential privacy[C]// Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. New York: ACM Press, 2016: 308-318.
[12]	GOLATKAR A, ACHILLE A, SOATTO S. Eternal sunshine of the spotless net: selective forgetting in deep networks[C]// Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2020: 9304-9312.
[13]	JIA J Y, SALEM A, BACKES M, et al. MemGuard: defending against black-box membership inference attacks via adversarial examples[C]// Proceedings of 2019 ACM SIGSAC Conference on Computer and Communications Security. New York: ACM Press, 2019: 259-274.
[14]	PAPERNOT N, ABADI M, ERLINGSSON Ú, et al. Semi-supervised knowledge transfer for deep learning from private training data[J]. arXiv Preprint, arXiv:1610.05755, 2016.
[15]	FREDRIKSON M, JHA S, RISTENPART T. Model inversion attacks that exploit confidence information and basic countermeasures[C]// Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. New York: ACM Press, 2015: 1322-1333.
[16]	HITAJ B, ATENIESE G, PEREZ-CRUZ F. Deep models under the GAN: information leakage from collaborative deep learning[C]// Proceedings of 2017 ACM SIGSAC Conference on Computer and Communications Security. New York: ACM Press, 2017: 603-618.
[17]	PAN X D, ZHANG M, JI S L, et al. Privacy risks of general-purpose language models[C]// Proceedings of 2020 IEEE Symposium on Security and Privacy (SP). Piscataway: IEEE Press, 2020: 1314-1331.
[18]	SALEM A, BHATTACHARYA A, BACKES M, et al. Updates-leak: data set inference and reconstruction attacks in online learning[C]// Proceedings of the 29th USENIX Security Symposium. Berkeley: USENIX Association, 2020: 1291-1308.
[19]	WANG Z B, SONG M K, ZHANG Z F, et al. Beyond inferring class representatives: user-level privacy leakage from federated learning[C]// Proceedings of IEEE INFOCOM 2019 - IEEE Conference on Computer Communications. Piscataway: IEEE Press, 2019: 2512-2520.
[20]	YANG P, GUI X L, YAO J, et al. Research on algorithms of data encryption scheme that supports homomorphic arithmetical operations[J]. Journal on Communications, 2015, 36(1): 171-182.
[21]	YAN X X, YUAN X H, TANG Y L, et al. Verifiable attribute-based searchable encryption scheme based on blockchain[J]. Journal on Communications, 2020, 41(2): 187-198.
[22]	ZHANG Q C, YANG L T, CHEN Z K. Privacy preserving deep computation model on cloud for big data feature learning[J]. IEEE Transactions on Computers, 2016, 65(5): 1351-1362.
[23]	RAHULAMATHAVAN Y, PHAN R C W, VELURU S, et al. Privacy-preserving multi-class support vector machine for outsourcing the data classification in cloud[J]. IEEE Transactions on Dependable and Secure Computing, 2014, 11(5): 467-479.
[24]	YU D, KANG H Y. Privacy protection method on time-series data publication[J]. Journal on Communications, 2015, 36(S1): 243-249.
[25]	HAN P Y, LIU C Y, WANG J H, et al. Research on data encryption system and technology for cloud storage[J]. Journal on Communications, 2020, 41(8): 55-65.
[26]	CAO Y Z, YANG J F. Towards making systems forget with machine unlearning[C]// Proceedings of 2015 IEEE Symposium on Security and Privacy. Piscataway: IEEE Press, 2015: 463-480.
[27]	NASR M, SHOKRI R, HOUMANSADR A. Machine learning with membership privacy using adversarial regularization[C]// Proceedings of 2018 ACM SIGSAC Conference on Computer and Communications Security. New York: ACM Press, 2018: 634-646.
[28]	ZHANG J L, ZHAO Y C, CHEN B, et al. Survey on data security and privacy-preserving for the research of edge computing[J]. Journal on Communications, 2018, 39(3): 1-21.
[29]	LIU K, DOLAN-GAVITT B, GARG S. Fine-pruning: defending against backdooring attacks on deep neural networks[C]// Research in Attacks, Intrusions, and Defenses. Cham: Springer International Publishing, 2018: 273-294.

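Attack	Dataset	AI model	Accuracy (original model)	Macro-F1 (original model)	Parameter α	Correlation coefficient P	Sign correlation Q
Model training stage attack 1	CIFAR10	ResNet34	0.881	0.881	4.0	0.960	-
Model training stage attack 1	CIFAR10	ResNet34	0.861	0.861	8.0	0.980	-
Model training stage attack 1	CIFAR10	ResNet34	0.839	0.839	16.0	0.990	-
Model training stage attack 1	CIFAR10	PreActResNet18	0.932	0.932	4.0	0.969	-
Model training stage attack 1	CIFAR10	PreActResNet18	0.924	0.924	8.0	0.984	-
Model training stage attack 1	CIFAR10	PreActResNet18	0.901	0.901	16.0	0.991	-
Model training stage attack 1	Olivetti	ResNet34	0.908	0.908	4.0	0.831	-
Model training stage attack 1	Olivetti	ResNet34	0.917	0.917	8.0	0.961	-
Model training stage attack 1	Olivetti	ResNet34	0.883	0.883	16.0	0.992	-
Model training stage attack 1	Olivetti	PreActResNet18	0.900	0.900	4.0	0.993	-
Model training stage attack 1	Olivetti	PreActResNet18	0.867	0.867	8.0	0.999	-
Model training stage attack 1	Olivetti	PreActResNet18	0.881	0.881	16.0	0.960	-
Model training stage attack 2	CIFAR10	ResNet34	0.926	0.926	8.0	-	0.845
Model training stage attack 2	CIFAR10	ResNet34	0.922	0.922	16.0	-	0.949
Model training stage attack 2	CIFAR10	ResNet34	0.917	0.917	32.0	-	0.984
Model training stage attack 2	CIFAR10	PreActResNet18	0.941	0.941	8.0	-	0.764
Model training stage attack 2	CIFAR10	PreActResNet18	0.946	0.946	16.0	-	0.853
Model training stage attack 2	CIFAR10	PreActResNet18	0.946	0.946	32.0	-	0.910
Model training stage attack 2	Olivetti	ResNet34	0.900	0.900	8.0	-	0.536
Model training stage attack 2	Olivetti	ResNet34	0.917	0.917	16.0	-	0.579
Model training stage attack 2	Olivetti	ResNet34	0.917	0.917	32.0	-	0.696
Model training stage attack 2	Olivetti	PreActResNet18	0.527	0.891	8.0	-	0.764
Model training stage attack 2	Olivetti	PreActResNet18	0.908	0.908	16.0	-	0.561
Model training stage attack 2	Olivetti	PreActResNet18	0.892	0.892	32.0	-	0.632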
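
The table above reports a correlation coefficient P and a sign correlation Q as measures of attack damage, but this page does not define either. The sketch below shows one plausible reading, offered as an assumption rather than the paper's definition: P as the Pearson correlation between model parameters and the encoded secret values, and Q as the fraction of parameters whose sign matches an encoded bit. The function names `pearson` and `sign_agreement` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def sign_agreement(params, bits):
    """Fraction of parameters whose sign matches the bit (1: positive, 0: not).

    Assumption: this stands in for the table's sign correlation Q.
    """
    if len(params) != len(bits) or not params:
        raise ValueError("need two equal-length, non-empty sequences")
    matches = sum(1 for p, b in zip(params, bits) if (p > 0) == (b == 1))
    return matches / len(params)


params = [0.2, -0.1, 0.4, -0.3]           # illustrative model parameters
print(pearson(params, [0.9, 0.1, 1.0, 0.0]))
print(sign_agreement(params, [1, 0, 1, 1]))  # 3 of 4 signs match: 0.75
```

Under this reading, a sign agreement near 0.5 would mean the parameter signs carry almost no information about the encoded bits.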

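Attack	Dataset	AI model	Accuracy (original model)	Macro-F1 (original model)	Synthetic data volume	Accuracy (attack)	Macro-F1 (attack)
Data preparation stage attack	CIFAR10	ResNet34	0.924	0.924	8192	0.991	0.944
Data preparation stage attack	CIFAR10	ResNet34	0.920	0.919	18432	0.999	0.945
Data preparation stage attack	CIFAR10	ResNet34	0.913	0.913	38912	0.999	0.938
Data preparation stage attack	CIFAR10	PreActResNet18	0.945	0.983	8192	0.999	0.9994
Data preparation stage attack	CIFAR10	PreActResNet18	0.945	0.998	18432	1.0	1.0
Data preparation stage attack	CIFAR10	PreActResNet18	0.938	0.999	38912	1.0	1.0
Data preparation stage attack	Olivetti	ResNet34	0.892	0.892	2048	1.0	1.0
Data preparation stage attack	Olivetti	ResNet34	0.916	0.916	4096	0.999	0.999
Data preparation stage attack	Olivetti	ResNet34	0.933	0.933	8192	0.999	0.999
Data preparation stage attack	Olivetti	PreActResNet18	0.966	0.966	2048	0.989	0.989
Data preparation stage attack	Olivetti	PreActResNet18	0.900	0.900	4096	0.998	0.998
Data preparation stage attack	Olivetti	PreActResNet18	0.891	0.891	8192	0.999	0.999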

Detection model	Detection accuracy	Macro-F1
Random forest	0.934	0.934
Logistic regression	0.934	0.934
SVM	0.953	0.953
AdaBoost	0.934	0.934
XGBoost	0.915	0.915
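
The tables report a macro-averaged F1 score next to accuracy. As a minimal sketch of that metric, the following computes the per-class F1 and takes the unweighted mean over all classes. The labels "benign" and "malicious" are illustrative assumptions, not data from the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all observed labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two equal-length, non-empty label sequences")
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); use 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical detector output: one benign model is flagged as malicious.
y_true = ["benign", "benign", "malicious", "malicious"]
y_pred = ["benign", "malicious", "malicious", "malicious"]
print(macro_f1(y_true, y_pred))
```

Because every class gets equal weight, macro-F1 penalizes a detector that performs well only on the majority class, which accuracy alone can hide.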

Related work	Detects malicious models	Modifies original data	Modifies training code	Directly modifies model parameters
Ref. [18]	×	×	√	×
Refs. [8-10,22-23]	×	√	×	×
Refs. [11-14]	×	×	×	√
Refs. [26-27]	×	×	√	×
This work	√	×	×	√
