通信学报 ›› 2016, Vol. 37 ›› Issue (2): 125-131.doi: 10.11959/j.issn.1000-436x.2016038

• 学术论文 • 上一篇    下一篇

MapReduce框架下支持差分隐私保护的k-means聚类方法

李洪成1,吴晓平1,陈燕2   

  1. 1 海军工程大学信息安全系,湖北 武汉 430033
    2 解放军 61062部队,北京100091
  • 出版日期:2016-02-26 发布日期:2016-02-26
  • 基金资助:
    国家自然科学基金资助项目;总后军内科研基金资助项目

k-means clustering method preserving differential privacy in MapReduce framework

Hong-cheng LI1,Xiao-ping WU1,Yan CHEN2   

  1. 1 Department of Information Security,Naval University of Engineering, Wuhan 430033, China
    2 No.61062 Troops of PLA, Beijing 100091, China
  • Online:2016-02-26 Published:2016-02-26
  • Supported by:
    The National Natural Science Foundation of China;The Military Scientific Research Project of the General Logistics Department

摘要:

针对传统隐私保护方法无法应对任意背景知识下恶意分析的问题,提出了分布式环境下满足差分隐私的k-means算法。该算法利用MapReduce计算框架,由主任务控制k-means迭代执行;指派Mapper分任务独立并行计算各数据片中每条记录与聚类中心的距离并标记其属于的聚类;指派Reducer分任务计算同一聚类中的记录数量num和属性向量之和sum,并利用Laplace机制产生的噪声扰动num和sum,进而实现隐私保护。根据差分隐私的组合特性,从理论角度证明整个算法满足e差分隐私保护。实验结果证明了该方法在提高隐私性和时效性的-情况下,保证了较好的可用性。

关键词: 数据挖掘, k-均值聚类, MapReduce, 差分隐私保护, Laplace机制

Abstract:

Aiming at the problem that traditional privacy preserving methods were unable to deal with malign analysis with arbitrary background knowledge, a k -means algorithm preserving differential privacy in distributed environment was proposed. This algorithm was under the computing framework of MapReduce. The host tasks were obligated to control the iterations of k -means. The Mapper tasks were appointed to compute the distances between all the records and cluster-ing centers and to mark the records with the clusters which the records belong. The Reducer tasks were appointed to compute the numbers of records which belong to the same clusters and the sums of attributes vectors, and to disturb the numbers and the sums with noises made by Laplace mecha ism, in order to achieve differential privacy preserving. Based on the combinatorial features of differential privacy, theoretically prove that this algorithm is able to fulfill -differentiallye private. The experimental results demonstrate that this method can remain available in the process of preserving privacy and improving efficiency.

Key words: data mining, k-means clustering, MapReduce, differential privacy preserving, Laplace mechanism

No Suggested Reading articles found!