Chinese Journal of Network and Information Security, 2016, Vol. 2, Issue (11): 47-51. doi: 10.11959/j.issn.2096-109x.2016.00087

• Academic Paper •

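Clustering algorithm preserving differential privacy in the framework of Spark

Zhi-qiang GAO, Qing-peng LI, Ren-yuan HU

  1. Department of Information Engineering, Engineering University of PAP, Xi’an 710086, China
  • Revised: 2016-06-26 Online: 2016-11-01 Published: 2016-11-15
  • About the authors: Zhi-qiang GAO (born 1989), male, from Gucheng, Hebei, is a PhD candidate at the Engineering University of PAP; his main research interests are big data privacy protection and swarm intelligence optimization algorithms. | Qing-peng LI (born 1992), male, from Jinan, Shandong, is a master's student at the Engineering University of PAP; his main research interests are big data privacy protection, data mining, and machine learning. | Ren-yuan HU (born 1992), male, from Taizhou, Zhejiang, is a master's student at the Engineering University of PAP; his main research interests are cloud computing data storage, metadata management, and homomorphic encryption algorithms.
  • Supported by:
    The National Natural Science Foundation of China (61309008); The Natural Science Foundation of Shaanxi Province (2014JQ8049)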

Abstract:

Classical clustering methods cannot withstand attacks by malicious adversaries who hold arbitrary background knowledge during the mining of massive data. To address this problem, a clustering algorithm that incorporates the differential privacy mechanism and is suited to the Spark in-memory computing framework is proposed, and the improved algorithm is theoretically proved to satisfy ε-differential privacy under the Spark parallel computing framework. Experimental results show that, while preserving the usability of the clustering results, the improved algorithm provides good privacy protection and satisfactory running efficiency. It therefore has good application prospects and value for privacy-preserving clustering analysis of massive data.

Key words: Spark, differential privacy, clustering algorithm, data mining, big data analysis
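The abstract does not say which noise mechanism or which base clustering method the algorithm uses. As a rough illustration of how ε-differential privacy is commonly attached to clustering, the sketch below assumes a k-means (Lloyd) iteration with Laplace noise added to the per-cluster sums and counts. It is not the authors' algorithm and it is not Spark code: the function names (`laplace_noise`, `nearest`, `dp_kmeans_step`), the normalization of every coordinate to [0, 1], and the rule for nearly empty clusters are all assumptions made for this example. The per-cluster aggregation loop stands in for what a parallel map and reduce step would compute.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) by inverse-CDF sampling."""
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    # Guard against log(0) at the boundary u == -0.5.
    magnitude = max(1.0 - 2.0 * abs(u), 1e-300)
    return -scale * math.copysign(1.0, u) * math.log(magnitude)


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def dp_kmeans_step(points, centroids, epsilon, rng):
    """One noisy Lloyd iteration (hypothetical helper).

    Assumes every coordinate of every point lies in [0, 1]; `epsilon` is the
    privacy budget spent on this single iteration.
    """
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    # Assign each point to its nearest centroid and aggregate per cluster.
    for p in points:
        j = nearest(p, centroids)
        counts[j] += 1
        for i in range(d):
            sums[j][i] += p[i]

    # Adding or removing one point changes one count by 1 and one sum by at
    # most 1 per coordinate, so the L1 sensitivity of (sums, counts) is d + 1.
    scale = (d + 1) / epsilon
    new_centroids = []
    for j in range(k):
        noisy_count = counts[j] + laplace_noise(scale, rng)
        if noisy_count < 1.0:
            # Assumed rule: keep the old centroid when the noisy count is
            # too small to divide by safely.
            new_centroids.append(list(centroids[j]))
            continue
        noisy_mean = [
            (sums[j][i] + laplace_noise(scale, rng)) / noisy_count
            for i in range(d)
        ]
        # Clamp back into the assumed data domain [0, 1].
        new_centroids.append([min(1.0, max(0.0, x)) for x in noisy_mean])
    return new_centroids
```

When the step is repeated, the total privacy budget has to be divided among the iterations, for example ε/T per iteration for T iterations under sequential composition. A smaller per-iteration ε means a larger noise scale, which is the trade-off between privacy protection and clustering usability that the abstract refers to.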

