Chinese Journal of Network and Information Security, 2020, Vol. 6, Issue (4): 109-119. doi: 10.11959/j.issn.2096-109x.2020046

• Academic Paper •

Text sequence dataset masking based on generative adversarial networks

Yu ZHANG, Xixiang LYU, Yucong ZOU, Yige LI

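  1. School of Cyber Engineering, Xidian University, Xi'an, Shaanxi 710071, China
  • Revised: 2020-04-09  Online: 2020-08-01  Published: 2020-08-13
  • About the authors:
    Yu ZHANG (1995- ), male, from Yan'an, Shaanxi; master's student at Xidian University; research interests: privacy protection and machine learning.
    Xixiang LYU (1978- ), female, from Luonan, Shaanxi; professor and doctoral supervisor at Xidian University; research interests: network and protocol security, machine learning and security, cryptographic algorithms and protocols.
    Yucong ZOU (1999- ), male, from Taojiang, Hunan; research interests: privacy protection and machine learning.
    Yige LI (1995- ), male, from Luonan, Shaanxi; doctoral student at Xidian University; research interests: machine learning and security.
  • Supported by:
    The Foundation of Science and Technology on Information Assurance Laboratory (KJ-17-108); The Key Research and Development Project of Shaanxi Province, China (2019ZDLGY12-08); The National Key R&D Program of China (2018YFB0804105)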

Differentially private sequence generative adversarial networks for data privacy masking

Yu ZHANG, Xixiang LYU, Yucong ZOU, Yige LI

  1. School of Cyber Engineering, Xidian University, Xi'an 710071, China
  • Revised: 2020-04-09  Online: 2020-08-01  Published: 2020-08-13
  • Supported by:
    The Foundation of Science and Technology on Information Assurance Laboratory (KJ-17-108); The Key Research and Development Project of Shaanxi Province, China (2019ZDLGY12-08); The National Key R&D Program of China (2018YFB0804105)

Abstract:

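Based on generative adversarial networks and differential privacy, a masking model for text sequence datasets is proposed: the differentially private sequence generative adversarial network (DP-SeqGAN). DP-SeqGAN uses a generative adversarial network to automatically extract the important features of a dataset and to generate a new dataset whose distribution is close to that of the original data. It applies differential-privacy-based random perturbation to the model, which improves the privacy of the generated dataset and further reduces overfitting of the discriminator. DP-SeqGAN is intuitive and general-purpose: it requires neither masking rules designed for a specific dataset nor adaptive adjustments to the model. Experiments show that after a dataset is masked with DP-SeqGAN, both its privacy and its usability improve significantly, and the success rate of membership inference attacks drops significantly.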

Key words: privacy protection, data masking, generative adversarial network, differential privacy
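The abstract states that DP-SeqGAN applies differential-privacy-based random perturbation to the model but does not say how. One common option, assumed here and not taken from the paper, is DP-SGD-style aggregation of the discriminator's gradients: clip each per-example gradient to an L2 bound, sum, add Gaussian noise scaled to that bound, then average. The helper names `clip_gradient` and `dp_average_gradient` and the parameters `clip_norm` and `noise_multiplier` are hypothetical, and computing the resulting (ε, δ) budget would need a separate privacy accountant that is not shown.

```python
import math
import random
from typing import List, Sequence


def clip_gradient(grad: Sequence[float], clip_norm: float) -> List[float]:
    """Scale one per-example gradient so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average_gradient(
    per_example_grads: Sequence[Sequence[float]],
    clip_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """DP-SGD-style aggregation (hypothetical helper, not the paper's code):
    clip every per-example gradient, sum, add Gaussian noise, then average."""
    if not per_example_grads:
        raise ValueError("need at least one per-example gradient")
    total = [0.0] * len(per_example_grads[0])
    for grad in per_example_grads:
        for i, g in enumerate(clip_gradient(grad, clip_norm)):
            total[i] += g
    # Clipping bounds each example's contribution (the sensitivity) by clip_norm,
    # so the standard deviation of the Gaussian noise is scaled by that same bound.
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_example_grads) for v in noisy]


if __name__ == "__main__":
    # Toy discriminator gradients for a batch of two real training sequences.
    batch_grads = [[3.0, 4.0], [0.3, 0.4]]
    update = dp_average_gradient(batch_grads, clip_norm=1.0,
                                 noise_multiplier=1.1, rng=random.Random(0))
    print(update)
```

In this sketch only the discriminator's update is perturbed, since it is the component that reads real records; that choice fits the abstract's remark about reducing discriminator overfitting, but whether the paper does exactly this is an assumption.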

Abstract:

Based on generative adversarial networks and the differential privacy mechanism, a differentially private sequence generative adversarial network (DP-SeqGAN) is proposed for masking private information in text sequence datasets. DP-SeqGAN automatically extracts the important features of a dataset and then generates a new dataset whose data distribution is close to that of the original. Based on differential privacy, randomness is introduced into the model, which improves the privacy of the generated dataset and further reduces overfitting of the discriminator. DP-SeqGAN is general-purpose, so there is no need to adapt the model to particular datasets or to design complex masking rules tailored to dataset characteristics. Experiments show that both the privacy and the usability of a sequence dataset improve significantly after it is processed by DP-SeqGAN, and that DP-SeqGAN greatly reduces the success rate of membership inference attacks against the generated dataset.

Key words: privacy preserving, data privacy masking, generative adversarial network, differential privacy
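The abstract reports a lower membership inference success rate against the generated dataset but does not describe the attack. As an illustration only, the sketch below assumes a simple distance-to-closest-record attack on token sequences: the attacker guesses that a candidate record was in the training set when some generated record lies within a small edit distance of it. The functions `edit_distance`, `nearest_distance`, and `attack_success_rate`, the threshold, and the toy records are all hypothetical and not taken from the paper.

```python
from typing import List, Sequence

Tokens = Sequence[str]


def edit_distance(a: Tokens, b: Tokens) -> int:
    """Levenshtein distance between two token sequences (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def nearest_distance(record: Tokens, generated: List[Tokens]) -> int:
    """Distance from a record to its closest generated record (generated must be non-empty)."""
    return min(edit_distance(record, g) for g in generated)


def attack_success_rate(members: List[Tokens], non_members: List[Tokens],
                        generated: List[Tokens], threshold: int) -> float:
    """Hypothetical attack: guess 'member' whenever a generated record lies within
    `threshold` edits, and return the fraction of correct guesses over all records."""
    correct = sum(nearest_distance(r, generated) <= threshold for r in members)
    correct += sum(nearest_distance(r, generated) > threshold for r in non_members)
    return correct / (len(members) + len(non_members))


if __name__ == "__main__":
    members = [["the", "patient", "has", "diabetes"], ["call", "me", "at", "noon"]]
    non_members = [["buy", "cheap", "tickets", "now"], ["it", "is", "sunny", "today"]]
    leaky = [["the", "patient", "has", "asthma"], ["call", "me", "at", "night"]]
    unrelated = [["x", "y", "z", "w"]]
    print(attack_success_rate(members, non_members, leaky, threshold=1))      # 1.0
    print(attack_success_rate(members, non_members, unrelated, threshold=1))  # 0.5
```

On a balanced set of members and non-members, a success rate of 0.5 corresponds to random guessing; a generator that reproduces near-copies of its training sequences pushes the rate toward 1.0, which is the kind of leakage the differential-privacy perturbation aims to reduce.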

