Chinese Journal of Network and Information Security, 2020, Vol. 6, Issue (3): 39-49. doi: 10.11959/j.issn.2096-109x.2020035

• Column: Exploration of New Privacy Protection Technologies

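Non-equal-width histogram publishing method based on differential privacy

Lei YANG1,2, Xiao ZHENG1,2, Wei ZHAO1,2

  1 School of Computer Science and Technology, Anhui University of Technology, Maanshan 243032, China
    2 Anhui Engineering Laboratory for Intelligent Applications and Security of Industrial Internet, Maanshan 243032, China
  • Revised: 2019-12-23 Online: 2020-06-01 Published: 2020-07-01
  • About the authors: Lei YANG (1995- ), female, born in Chizhou, Anhui, master's student at Anhui University of Technology; main research interests: differential privacy and big data | Xiao ZHENG (1975- ), male, born in Putian, Fujian, Ph.D., professor and master's supervisor at Anhui University of Technology; main research interests: computer networks, service computing and cloud computing, security and privacy protection | Wei ZHAO (1988- ), male, born in Lu'an, Anhui, Ph.D., associate professor and master's supervisor at Anhui University of Technology; main research interests: wireless mesh networks, mobile ad hoc networks, edge computing
  • Supported by:
    The Key R&D Program of Anhui Province, China (201904a05020071)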

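Abstract:

When histograms are used to reflect the true distribution of data, existing histogram publishing techniques based on differential privacy may exhibit “heavy tail” and “zero bucket” phenomena, and become “too flat” where the data are dense. In addition, when existing techniques apply differential privacy protection to the original histogram, they do not take into account that different groups carry different amounts of information. To address these problems, a non-equal-width histogram publishing method based on differential privacy is proposed. First, the empirical distribution function is used to construct a non-equal-width histogram that adapts to the sparsity of the data. Then, differential privacy protection is applied to the original non-equal-width histogram. Finally, a privacy budget is set for each group according to its class width, to improve the privacy of the data in each group. Experimental results show that the proposed method fully accounts for the sparsity of the data distribution when publishing histograms under differential privacy, effectively avoids the “heavy tail” and “zero bucket” phenomena, and ensures that the published histogram accurately reflects the characteristics of the data distribution. Moreover, when noise conforming to the Laplace mechanism is added to each group, setting a reasonable privacy budget for each group according to its class width improves, to some extent, the privacy of different data segments.

Key words: differential privacy, non-equal-width, histogram publishing, Laplace mechanism, privacy budget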
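The abstract does not say exactly how the empirical distribution function places the bin boundaries or how class width is mapped to a privacy budget. The following minimal Python sketch illustrates the general idea under two labeled assumptions, and is not the authors' algorithm: (1) bin edges are placed at empirical quantiles (the generalized inverse of the empirical distribution function), which gives narrow bins where data are dense and wide bins where data are sparse; (2) a hypothetical rule ε_i = ε · w_min / w_i gives wider, sparser groups a smaller budget and therefore more Laplace noise. All function names (`ecdf_bin_edges`, `bin_counts`, `width_based_budgets`, `laplace`, `publish_histogram`) are illustrative. Because the groups are disjoint, adding or removing one record changes exactly one count by 1, so each count has sensitivity 1 and, by parallel composition, the noisy counts satisfy (max_i ε_i)-differential privacy, which is ε under this rule. The sketch computes the edges from the raw data without spending any budget, so the edges themselves are not protected.

```python
import random
from bisect import bisect_right


def ecdf_bin_edges(data, num_bins):
    """Hypothetical helper: bin edges at empirical quantiles.

    The k-th interior edge is F_n^{-1}(k / num_bins), the smallest x with
    F_n(x) >= k / num_bins, where F_n is the empirical distribution
    function. Dense regions get narrow bins, sparse regions get wide ones.
    Duplicate edges caused by tied values are merged.
    """
    xs = sorted(data)
    n = len(xs)
    if n == 0 or num_bins < 1:
        raise ValueError("need non-empty data and num_bins >= 1")
    edges = [xs[0]]
    for k in range(1, num_bins):
        idx = (k * n + num_bins - 1) // num_bins - 1  # ceil(k*n/num_bins) - 1
        if xs[idx] > edges[-1]:
            edges.append(xs[idx])
    if xs[-1] > edges[-1]:
        edges.append(xs[-1])
    if len(edges) < 2:
        raise ValueError("data must contain at least two distinct values")
    return edges


def bin_counts(data, edges):
    """Count values per bin [e_i, e_{i+1}); the maximum goes in the last bin."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        i = bisect_right(edges, x) - 1
        counts[min(max(i, 0), len(counts) - 1)] += 1
    return counts


def width_based_budgets(edges, epsilon):
    """Hypothetical rule: eps_i = epsilon * w_min / w_i.

    Wider (sparser) groups get a smaller budget, i.e. more noise; the
    narrowest group gets the full epsilon.
    """
    widths = [b - a for a, b in zip(edges, edges[1:])]
    w_min = min(widths)
    return [epsilon * w_min / w for w in widths]


def laplace(scale, rng):
    """Laplace(0, scale) as the difference of two i.i.d. exponentials.

    Plain floating-point sampling; not hardened against the known
    floating-point attacks on naive Laplace implementations.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def publish_histogram(data, num_bins, epsilon, seed=None):
    """Noisy counts on quantile-based bins with width-dependent budgets.

    Each count has L1 sensitivity 1 under add/remove-one neighbours and the
    bins are disjoint, so by parallel composition the noisy counts are
    (max_i eps_i)-DP. The edges come from the raw data and are NOT
    privatized here.
    """
    rng = random.Random(seed)
    edges = ecdf_bin_edges(data, num_bins)
    counts = bin_counts(data, edges)
    budgets = width_based_budgets(edges, epsilon)
    noisy = [c + laplace(1.0 / eps, rng) for c, eps in zip(counts, budgets)]
    return edges, counts, budgets, noisy


if __name__ == "__main__":
    data = [1, 2, 2, 3, 3, 3, 4, 10, 50, 100]
    edges, counts, budgets, noisy = publish_histogram(data, 4, 1.0, seed=0)
    print(edges)   # [1, 2, 3, 10, 100]
    print(counts)  # [1, 2, 4, 3]
```

For the small example above, the bins have widths 1, 1, 7 and 90, so under the assumed rule the budgets are ε, ε, ε/7 and ε/90: the two wide bins covering the sparse tail receive much larger Laplace noise (scale 1/ε_i) than the narrow bins in the dense region.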
