Journal on Communications ›› 2024, Vol. 45 ›› Issue (1): 201-213. doi: 10.11959/j.issn.1000-436x.2024004

• Academic communication •

Shuffled differential privacy protection method for K-Modes clustering data collection and publication

Weijin JIANG (蒋伟进)1,2,3, Yilin CHEN (陈艺琳)1,3, Yuqing HAN (韩裕清)1,3, Yuting WU (吴玉庭)1,3, Wei ZHOU (周为)1,3, Haijuan WANG (王海娟)3,4

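  1. 1 School of Computer Science, Hunan University of Technology and Business, Changsha 410205, Hunan, China
    2 School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, Hubei, China
    3 Xiangjiang Laboratory, Changsha 410205, Hunan, China
    4 College of Frontier Intersection, Hunan University of Technology and Business, Changsha 410205, Hunan, China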
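  • Revised: 2023-11-08 Online: 2024-01-01 Published: 2024-01-01
  • About the authors: Weijin JIANG (born 1964), male, from Yiyang, Hunan, Ph.D., professor and master's supervisor at Hunan University of Technology and Business. His main research interests are information security, network security, and crowd sensing.
    Yilin CHEN (born 2000), female, from Xuchang, Henan, master's student at Hunan University of Technology and Business. Her main research interests are information security and differential privacy.
    Yuqing HAN (born 2000), male, from Changsha, Hunan, master's student at Hunan University of Technology and Business. His main research interests are information security and federated learning.
    Yuting WU (born 1998), female, from Yiyang, Hunan, master's student at Hunan University of Technology and Business. Her main research interests are information security and crowd sensing.
    Wei ZHOU (born 2000), male, from Yiyang, Hunan, master's student at Hunan University of Technology and Business. His main research interests are information security and crowd sensing.
    Haijuan WANG (born 2000), female, from Jiujiang, Jiangxi, master's student at Hunan University of Technology and Business. Her main research interests are differential privacy and crowd sensing.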
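  • Supported by:
    National Natural Science Foundation of China (72088101); National Natural Science Foundation of China (61772196); Key Project of the Natural Science Foundation of Hunan Province (2020JJ4249); Fund of the Hunan Provincial Key Laboratory of New Retail Virtual Reality Technology (2017TP1026); Key Scientific Research Project of the Hunan Provincial Department of Education (21A0374); Hunan Provincial Degree and Graduate Teaching Reform Project (2022JGYB194)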

Shuffled differential privacy protection method for K-Modes clustering data collection and publication

Weijin JIANG1,2,3, Yilin CHEN1,3, Yuqing HAN1,3, Yuting WU1,3, Wei ZHOU1,3, Haijuan WANG3,4   

  1. 1 School of Computer Science, Hunan University of Technology and Business, Changsha 410205, China
    2 School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, China
    3 Xiangjiang Laboratory, Changsha 410205, China
    4 College of Frontier Intersection, Hunan University of Technology and Business, Changsha 410205, China
  • Revised: 2023-11-08 Online: 2024-01-01 Published: 2024-01-01
  • Supported by:
    The National Natural Science Foundation of China (72088101); The National Natural Science Foundation of China (61772196); The Key Project of the Natural Science Foundation of Hunan Province (2020JJ4249); Key Laboratory of Hunan Province for New Retail Virtual Reality Technology (2017TP1026); Key Scientific Research Project of Hunan Provincial Department of Education (21A0374); Hunan Provincial Degree and Graduate Teaching Reform Project (2022JGYB194)

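Abstract:

Current approaches to collecting and publishing clustering data offer insufficient security. To protect user privacy in clustering data while improving data quality, this paper proposes a privacy protection method for K-Modes clustering data collection and publication that removes the trusted third party and builds on the shuffled differential privacy model. First, a K-Modes clustering data collection algorithm samples the user data and adds noise, and a value-domain-filling random permutation publishing algorithm then scrambles the initial order of the sampled data, so that a malicious attacker cannot identify a target user from the relationship between users and data. Next, with noise interference kept as small as possible, new centroids are computed by cyclic iteration to complete the clustering. Finally, the privacy, feasibility, and complexity of these three methods are analyzed theoretically, and accuracy and entropy are compared on three real datasets against authoritative recent algorithms of the same kind, including KM, DPLM, and LDPKM, to verify the effectiveness of the proposed method. The experimental results show that the proposed method outperforms current comparable algorithms in both privacy protection and the quality of the published data.

Keywords: shuffled differential privacy, K-Modes clustering, privacy protection, data collection, data publication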

Abstract:

To address the insufficient security of current clustering data collection and publication, and to protect user privacy while improving data quality, a privacy protection method for K-Modes clustering data collection and publication that requires no trusted third party was proposed, based on the shuffled differential privacy model. First, a K-Modes clustering data collection algorithm was used to sample the user data and add noise, and the initial order of the sampled data was then scrambled by a value-domain-filling random permutation publishing algorithm, so that a malicious attacker could not identify a target user from the relationship between users and data. Then, to reduce the interference of noise as much as possible, new centroids were calculated by cyclic iteration to complete the clustering. Finally, the privacy, feasibility, and complexity of the above three methods were analyzed theoretically, and accuracy and entropy on three real datasets were compared with those of recent authoritative algorithms of the same kind, such as KM, DPLM, and LDPKM, to verify the effectiveness of the proposed method. The experimental results show that the proposed method outperforms current comparable algorithms in both privacy protection and published data quality.

Key words: shuffled differential privacy, K-Modes clustering, privacy protection, data collection, data publication
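
The pipeline summarized in the abstract (local perturbation, shuffling, then iterative K-Modes centroid updates) can be illustrated with a minimal Python sketch. This is not the authors' algorithms: generic k-ary randomized response stands in for the paper's sampling-and-noise step, a plain random permutation stands in for the value-domain-filling publishing algorithm, and the even split of the privacy budget across attributes is an assumption. All function names and parameters below are hypothetical.

```python
import math
import random
from collections import Counter


def randomized_response(value, domain, epsilon, rng):
    """k-ary randomized response (a stand-in for the paper's noise step).

    Keeps the true category with probability e^eps / (e^eps + k - 1),
    otherwise reports one of the other k - 1 categories uniformly.
    """
    k = len(domain)
    # Algebraically equal to e^eps / (e^eps + k - 1), but avoids overflow.
    p_keep = 1.0 / (1.0 + (k - 1) * math.exp(-epsilon))
    if rng.random() < p_keep:
        return value
    return rng.choice([v for v in domain if v != value])


def perturb_and_shuffle(records, domains, epsilon, rng):
    """Perturb every attribute locally, then permute the reports.

    Assumption: the budget epsilon is split evenly across attributes.
    The shuffle removes the link between list position and sender.
    """
    eps_per_attr = epsilon / len(domains)
    reports = [
        tuple(randomized_response(v, d, eps_per_attr, rng)
              for v, d in zip(rec, domains))
        for rec in records
    ]
    rng.shuffle(reports)
    return reports


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(data, k, rng, init=None, max_iter=20):
    """Plain K-Modes: assign by Hamming distance, update each mode to the
    per-attribute most frequent value, stop when the modes are stable."""
    modes = list(init) if init is not None else rng.sample(data, k)
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: hamming(row, modes[j]))
                  for row in data]
        new_modes = []
        for j in range(k):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if not members:
                new_modes.append(modes[j])  # keep the mode of an empty cluster
                continue
            new_modes.append(tuple(Counter(col).most_common(1)[0][0]
                                   for col in zip(*members)))
        if new_modes == modes:
            break
        modes = new_modes
    return modes, labels
```

A collector would call `perturb_and_shuffle` on the users' categorical records and pass the result to `k_modes`. A smaller `epsilon` flips more attribute values, which is the privacy and data-quality trade-off that the abstract's accuracy and entropy comparisons measure.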

CLC number:
