Journal on Communications ›› 2016, Vol. 37 ›› Issue (2): 125-131. doi: 10.11959/j.issn.1000-436x.2016038

• academic paper •

k-means clustering method preserving differential privacy in MapReduce framework

Hong-cheng LI 1, Xiao-ping WU 1, Yan CHEN 2

  • 1 Department of Information Security, Naval University of Engineering, Wuhan 430033, China
    2 No. 61062 Troops of PLA, Beijing 100091, China
  • Online:2016-02-26 Published:2016-02-26
  • Supported by:
    The National Natural Science Foundation of China; The Military Scientific Research Project of the General Logistics Department

Abstract:

Traditional privacy-preserving methods cannot withstand malicious analysis by an adversary with arbitrary background knowledge. To address this problem, a k-means algorithm that preserves differential privacy in a distributed environment is proposed. The algorithm runs under the MapReduce computing framework. The host task controls the iterations of k-means. The Mapper tasks compute the distances between every record and the clustering centers, and label each record with the cluster to which it belongs. The Reducer tasks compute, for each cluster, the number of records and the sum of their attribute vectors, and perturb both the counts and the sums with noise generated by the Laplace mechanism in order to achieve differential privacy. Based on the composition properties of differential privacy, the algorithm is proved theoretically to satisfy ε-differential privacy. The experimental results demonstrate that the method keeps the clustering results usable while preserving privacy and improving efficiency.

Key words: data mining, k-means clustering, MapReduce, differential privacy preserving, Laplace mechanism
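
The division of work described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the function names (`laplace`, `mapper`, `reducer`, `dp_kmeans`), the assumption that every attribute is normalized to [0, 1], the resulting sensitivity of `dim + 1`, and the even split of the privacy budget across iterations are all assumptions made for illustration, since the abstract does not state these details.

```python
import random

def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is a Laplace(0, scale) sample.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def mapper(record, centers):
    # Mapper role: label the record with the index of its nearest center
    # (squared Euclidean distance).
    dists = [sum((x - c) ** 2 for x, c in zip(record, center)) for center in centers]
    return dists.index(min(dists)), record

def reducer(records, dim, eps_iter, rng):
    # Reducer role: count the records and sum their attributes, then add noise.
    # Assumed sensitivity: with attributes in [0, 1], one record changes the
    # count by 1 and the attribute sums by at most dim, so dim + 1 in total.
    scale = (dim + 1) / eps_iter
    noisy_count = len(records) + laplace(scale, rng)
    noisy_sums = [sum(r[j] for r in records) + laplace(scale, rng) for j in range(dim)]
    return noisy_count, noisy_sums

def dp_kmeans(data, centers, epsilon, iterations, seed=0):
    # Host role: control the iterations. The even budget split below is an
    # assumption that relies on sequential composition.
    rng = random.Random(seed)
    dim = len(data[0])
    eps_iter = epsilon / iterations
    for _ in range(iterations):
        groups = {k: [] for k in range(len(centers))}
        for record in data:
            k, rec = mapper(record, centers)
            groups[k].append(rec)
        new_centers = []
        for k in range(len(centers)):
            count, sums = reducer(groups[k], dim, eps_iter, rng)
            if count < 1.0:
                # Noisy count is unusable; keep the previous center.
                new_centers.append(centers[k])
            else:
                # Clamp to the assumed [0, 1] attribute range (post-processing).
                new_centers.append([min(1.0, max(0.0, s / count)) for s in sums])
        centers = new_centers
    return centers
```

Only the noisy counts and noisy sums leave the reducer, so the center update is post-processing of already perturbed values. In this sketch a small `epsilon` or a large number of iterations increases the noise scale, which is the privacy and usability trade-off the abstract refers to.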
