Journal on Communications, 2024, Vol. 45, Issue (1): 201-213. doi: 10.11959/j.issn.1000-436x.2024004

• Correspondences

Shuffled differential privacy protection method for K-Modes clustering data collection and publication

Weijin JIANG1,2,3, Yilin CHEN1,3, Yuqing HAN1,3, Yuting WU1,3, Wei ZHOU1,3, Haijuan WANG3,4   

  1 School of Computer Science, Hunan University of Technology and Business, Changsha 410205, China
  2 School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, China
  3 Xiangjiang Laboratory, Changsha 410205, China
  4 College of Frontier Intersection, Hunan University of Technology and Business, Changsha 410205, China
  • Revised: 2023-11-08  Online: 2024-01-01  Published: 2024-01-01
  • Supported by:
    The National Natural Science Foundation of China (72088101); The National Natural Science Foundation of China (61772196); The Key Project of the Natural Science Foundation of Hunan Province (2020JJ4249); Key Laboratory of Hunan Province for New Retail Virtual Reality Technology (2017TP1026); Key Scientific Research Project of Hunan Provincial Department of Education (21A0374); Hunan Provincial Degree and Graduate Teaching Reform Project (2022JGYB194)

Abstract:

To address the insufficient security of current clustering data collection and publication, and to protect user privacy while improving the quality of clustered data, a privacy protection method for K-Modes clustering data collection and publication that requires no trusted third party was proposed, based on the shuffled differential privacy model. First, a K-Modes clustering data collection algorithm samples the user data and adds noise. Next, a publishing algorithm that pads the value domain and applies a random permutation disrupts the original order of the sampled data, so that a malicious attacker cannot identify a target user from the correspondence between users and data. Then, to reduce the interference of noise as much as possible, new centroids are computed by cyclic iteration to complete the clustering. Finally, the privacy, feasibility, and complexity of these three algorithms were analyzed theoretically, and their accuracy and entropy on three real datasets were compared with well-established recent algorithms of the same kind (KM, DPLM, and LDPKM) to verify the effectiveness of the proposed model. The experimental results show that the proposed method outperforms these comparable algorithms in both privacy protection and data quality.
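The three-stage pipeline in the abstract (local sampling and noise, padded shuffling, iterative K-Modes) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not specify the noise mechanism, the padding rule, or the budget split, so the sketch assumes generalized randomized response (GRR) per attribute, an even split of epsilon across attributes, Bernoulli user sampling, and uniformly drawn dummy records as a stand-in for "value-domain padding". All function names (`grr_perturb`, `collect`, `shuffle_with_padding`, `k_modes`, `assign`) and parameters are hypothetical, and no privacy accounting is attempted.

```python
import math
import random
from collections import Counter


def grr_perturb(value, domain, epsilon, rng):
    """Generalized randomized response (assumed mechanism) on one categorical attribute."""
    d = len(domain)
    # Keep the true value with probability e^eps / (e^eps + d - 1),
    # otherwise report one of the other d - 1 values uniformly at random.
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + d - 1)
    if rng.random() < p_keep:
        return value
    return rng.choice([v for v in domain if v != value])


def collect(users, domains, epsilon, sample_rate, rng):
    """Hypothetical collection step: Bernoulli-sample users, then perturb each record locally."""
    eps_attr = epsilon / len(domains)  # assumed even budget split across attributes
    reports = []
    for record in users:
        if rng.random() < sample_rate:
            reports.append(tuple(grr_perturb(v, dom, eps_attr, rng)
                                 for v, dom in zip(record, domains)))
    return reports


def shuffle_with_padding(reports, domains, n_dummy, rng):
    """Hypothetical shuffler: add uniform dummy records, then randomly permute the batch."""
    dummies = [tuple(rng.choice(dom) for dom in domains) for _ in range(n_dummy)]
    batch = reports + dummies
    rng.shuffle(batch)  # uniform random permutation breaks the user/report ordering
    return batch


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def assign(data, modes):
    """Index of the nearest mode (by Hamming distance) for every record."""
    return [min(range(len(modes)), key=lambda j: hamming(row, modes[j])) for row in data]


def k_modes(data, k, rng, max_iter=20):
    """Plain K-Modes: alternate assignment and per-attribute mode updates until stable."""
    modes = rng.sample(data, k)
    for _ in range(max_iter):
        labels = assign(data, modes)
        new_modes = []
        for j in range(k):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if not members:
                new_modes.append(modes[j])  # keep the old mode for an empty cluster
            else:
                # Most frequent value in each attribute column becomes the new mode.
                new_modes.append(tuple(Counter(col).most_common(1)[0][0]
                                       for col in zip(*members)))
        if new_modes == modes:
            break
        modes = new_modes
    return modes


# Usage on toy data: two categorical attributes, two natural groups of users.
rng = random.Random(0)
domains = [("a", "b", "c"), ("x", "y")]
users = [("a", "x")] * 50 + [("c", "y")] * 50
reports = collect(users, domains, epsilon=4.0, sample_rate=1.0, rng=rng)
published = shuffle_with_padding(reports, domains, n_dummy=10, rng=rng)
modes = k_modes(published, k=2, rng=rng)
labels = assign(published, modes)
print(modes)
```

The shuffler only sees already-perturbed reports, which mirrors the abstract's "no trusted third party" setting; the clustering server then works on the shuffled, padded batch. Dummy records inevitably bias the mode estimates somewhat, which is one reason the abstract stresses iterating the centroid computation to limit the effect of noise.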

Key words: shuffled differential privacy, K-Modes clustering, privacy protection, data collection, data publication

