通信学报 ›› 2014, Vol. 35 ›› Issue (10): 200-209.doi: 10.3969/j.issn.1000-436x.2014.10.023
丁丽萍1,卢国庆1,2
出版日期:
2014-10-25
发布日期:
2017-06-14
基金资助:
Li-ping DING1,Guo-qing LU1,2
Online:
2014-10-25
Published:
2017-06-14
Supported by:
摘要:
频繁模式挖掘是数据挖掘的一个基本问题,其模式本身和相应计数都有可能泄露隐私信息。当前,差分隐私通过添加噪音使数据失真,有效实现了隐私保护的目的。首先介绍了差分隐私保护模型的理论基础;其次,详细综述了差分隐私下3种典型的频繁模式挖掘方法的最新研究进展,并进行对比性分析;最后对未来的研究方向进行了展望。
丁丽萍,卢国庆. 面向频繁模式挖掘的差分隐私保护研究综述[J]. 通信学报, 2014, 35(10): 200-209.
Li-ping DING,Guo-qing LU. Survey of differential privacy in frequent pattern mining[J]. Journal on Communications, 2014, 35(10): 200-209.
表2
差分隐私下频繁项集挖掘方法对比分析"
方法名称 | 主要技术 | 主要优点 | 主要缺点 | ε分配 | 算法性能 |
TF | 截断频率 | 实现简单,精度高 | 处理较大k值或l时性能与效率比较差;未考虑记录本身长度带来的影响 | 指数机制、拉普拉斯机制;ε均分为2份,平均分配 | O (k′|D|) |
PrivBasis | θ-基映射 | 性能和效率优于TF | 难以兼顾隐私保护与模式可用性的不足;未考虑记录本身长度带来的影响 | 拉普拉斯机制;ε平均分配 | O (w|D|) |
DP-topkP | 一致性约束 | 一致性约束输出结果,精度较高 | 处理较大k值或l性能与效率比较差;未考虑记录本身长度带来的影响 | 指数机制、拉普拉斯机制;ε被划分为2份,平均分配 | O (k′|D|) |
SmartTrunc | 事务截断 | 规约数据集,精度高 | 扩展性比较差 | 拉普拉斯机制;ε平均分配 | O(|D||L|) |
DiffPart | 分类树划分 | 发布精度高,扩展性高 | 仅支持计数查询;没有考虑不同项之间的语义关联 | 拉普拉斯机制;ε均分为2份,自适应分配 | O (|D||L|) |
DiffMR | 一致性约束 | 支持MapReduce分布式查询,实现简单 | 只支持简单计数查询;处理较大k值或l时性能与效率比较差 | 指数机制、拉普拉斯机制;ε线性分配 | O (k′|D|) |
表3
差分隐私下频繁序列挖掘方法对比分析"
方法名称 | 主要技术 | 主要优点 | 主要缺点 | ε分配 | 算法性能 |
STM-Full | 前缀树划分 | 支持多维数据依赖计数查询;精度高 | 忽视轨迹时间戳信息,实用性差;前缀树冗余 | 拉普拉斯机制;ε线性分配 | O (|D||L|) |
Prefix-Hybrid | 混合粒度前缀树划分 | 支持多维数据依赖计数查询;精度高 | 忽视轨迹时间戳信息,实用性差;前缀树冗余 | 拉普拉斯机制;ε划分为两份,混合分配 | O (|D||L|) |
N-gram | 变长N-gram模型 | 支持多维数据依赖计数查询;精度高 | 容易受到序列维度影响;有一定信息丢失 | 拉普拉斯机制;ε自适应分配 | O (|D||L|) |
PT-Sample | 数据变换和推理求精 | 支持频繁前缀树和序列模式挖掘;精度高 | 数据变换过程中存在信息丢失 | 拉普拉斯机制;ε划分为两份,混合分配 |
[1] | HAN J , KAMBER M , PEI J . Data Mining:Concepts and Techniques[M]. Morgan kaufmann, 2006. |
[2] | DWORK C . Differential privacy[A]. Proc of the 33rd International Colloquium on Automata,Languages and Programming[C]. Berlin:Spinger-Verlag, 2006. |
[3] | 熊平, 朱天清, 王晓峰 . 差分隐私保护及其应用[J]. 计算机学报, 2014,37(1): 101-122. XIONG P , ZHU T Q , WANG X F . A survey on differential privacy and applications[J]. Chinese Journal of Computers, 2014,37(1): 101-122. |
[4] | 张啸剑, 孟小峰 . 面向数据发布和分析的差分隐私保护研究综述[J]. 计算机学报, 2014,37(4): 927-949. ZHANG X J , MENG X F . Differential privacy in data publication and analysis[J]. Chinese Journal of Computers, 2014,37(4): 927-949. |
[5] | BLUM A , DWORK C , McSHERRY F ,et al. Practical privacy:the SuLQ framework[A]. Proc of the 24th ACM SIGMOD International Conference on Management of Data/Principles of Database Systems[C]. NewYork: ACM Press, 2005. 128-138. |
[6] | BLUM A , LIGETT K , ROTH A . A Learning theory approach to non-interactive database privacy[A]. Proc of the 40th Annual ACM Symposium on Theory of Computing(STOC)[C]. Victoria,British Columbia,Canada, 2008. 351-360. |
[7] | DWORK C . A firm foundation for private data analysis[J]. Communications of the ACM, 2011,54(1): 86-95. |
[8] | NGUYEN H H , KIM J , KIM Y . Differential privacy in practice[J]. Journal of Computing Science and Engineering, 2013,7(3): 177-186. |
[9] | DWORK C , MCSHERRY F , NISSIM K ,et al. Calibrating noise to sensitivity in private data analysis[A]. TCC[C]. 2006. 265-284. |
[10] | MCSHERRY F , TALWAR K . Mechanism design via differential privacy[A]. FOCS[C]. 2007. 94-103. |
[11] | DWORK C , NAOR M , VADHAN S . The privacy of the analyst and the power of the state[A]. 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science (FOCS)[C]. 2012. 400-409. |
[12] | GHOSH A , ROUGHGARDEN T , SUNDARARAJAN M . Universally utility-maximizing privacy mechanisms[A]. Proceedings of the 41st Annual ACM Symposium on Theory of Computing[C]. ACM, 2009. 351-360. |
[13] | LI C , HAY M , RASTOGI V ,et al. Optimizing linear counting queries under differential privacy[A]. Proceedings of the 29th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems[C]. 2010. 123-134. |
[14] | ZHANG J , ZHANG Z , XIAO X ,et al. Functional mechanism:regression analysis under differential privacy[J]. Proceedings of the VLDB Endowment, 2012,5(11): 1364-1375. |
[15] | MCSHERRY F . Privacy integrated queries:an extensible platform for privacy-preserving data analysis WWW[A]. SIGMOD Conference[C]. 2009. 19-30. |
[16] | KIFER D , MACHANAVAJJHALA A . No free lunch in data privacy[A]. Proceedings of the 2011 ACM SIGMOD International Conference on Management of data[C]. 2011. 193-204. |
[17] | AGRAWAL R , SRIKANT R . Fast algorithms for mining association rules[A]. Proc 20th Int Conf Very Large Data Bases[C]. 1994. 487-499. |
[18] | HAN J , PEI J , YIN Y . Mining frequent patterns without candidate generation[J]. ACM SIGMOD Record, 2000,29(2): 1-12. |
[19] | BHASKAR R , LAXMAN S , SIMTH A ,et al. Discovering frequent patterns in sensitive data[A]. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD)[C]. Washington,DC,USA, 2010. 503-512. |
[20] | LI N , QARDAJI W , SU D ,et al. Privbasis:frequent itemset mining with differential privacy[A]. Proceedings of the 38th Conference of Very Large Databases (VLDB)[C]. Istanbul,Turkey, 2012. 1340-1351. |
[21] | 张啸剑, 王淼, 孟小峰 . 差分隐私保护下一种精确挖掘 top-k 频繁模式方法[J]. 计算机研究与发展, 2014,51(1): 104-114. ZHANG X J , WANG M , MENG X J . An accurate method for mining top-k frequent pattern under differential privacy[J]. Journal of Computer Research and Develoment, 2014,51(1): 104-114. |
[22] | ZENG C , NAUGHTON J , CAI J . Y:On differentially private frequent itemset mining[J]. PVLDB, 2012,6(1): 25-36. |
[23] | CHEN R , MOHAMMED N , FUNG B C M ,et al. Publishing set-valued data via differential privacy[A]. Proceedings of the 37th Conference of Very Large Databases(VLDB)[C]. Seattle,Washington,USA, 2011. 1087-1098. |
[24] | HAN X , WANG M , ZHANG X ,et al. Differentially private top-k query over MapReduce[A]. Proceedings of the 4th International Workshop on Cloud Data Management[C]. 2012. 25-32. |
[25] | AGRAWAL R , SRIKANT R . Mining sequential patterns[A]. Proceedings of the 11th International Conference on Data Engineering IEEE[C]. 1995. 3-14. |
[26] | CHEN R , FUNG B , DESAI B C . Differentially private trajectory data publication[EB/OL]. . 2020. |
[27] | CHEN R,áCS G , CASTELLUCCIA C . Differentially private sequential data publication via variable-length n-grams[A]. CCS[C]. 2012. 638-649. |
[28] | BONOMI L , XIONG L , CHEN R ,et al. Frequent grams based embedding for privacy preserving record linkage[A]. CIKM[C]. 2012. 1597-1601. |
[29] | BONOMI L , XIONG L . A two-phase algorithm for mining sequential patterns with differential privacy[A]. CIKM[C]. 2013. 269-278. |
[30] | BONOMI L . Mining frequent patterns with differential privacy[J]. PVLDB, 2013,6(12): 1422-1427. |
[31] | CHEN R , FUNG B , DESAI B C ,et al. Differentially private transit data publication:a case study on the montreal transportation system[A]. Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining[C]. ACM, 2012: 213-221. |
[32] | HAN J , CHENG H , XIN D ,et al. Frequent pattern mining:current status and future directions[J]. Data Mining and Knowledge Discovery, 2007,15(1): 55-86. |
[33] | SHEN ET , YU T . Mining frequent graph patterns with differential privacy[A]. KDD[C]. 2013. 545-553. |
[34] | DWORK C , KENTHAPADI K , MCSHERRY F ,et al. Our Data,Ourselves:Privacy Via Distributed Noise Generation[M]. Springer Berlin Heidelberg, 2006. 486-503. |
[35] | FRIEDMAN A , SCHUSTER A . Data mining with differential privacy[A]. Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining[C]. ACM, 2010. 493-502. |
[36] | MOHAMMED N , CHEN R , FUNG B ,et al. Differentially private data release for data mining[A]. Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining[C]. 2011. 493-501. |
[37] | NISSIM K , RASKHODNIKOVA S , SMITH A . Smooth sensitivity and sampling in private data analysis[A]. Proceedings of the 39th Annual ACM Symposium on Theory of Computing[C]. 2007. 75-84. |
[38] | 李杨, 郝志峰, 温雯 . 差分隐私保护 k-means 聚类方法研究[J]. 计算机科学, 2013,40(3): 287-290. LI Y , HAO ZF , WEN W . Research on differential privacy preserving k-means clustering[J]. Computer Science, 2013,40(3): 287-290. |
[39] | GUPTA A , LIGETT K , MCSHERRY F ,et al. Differentially private combinatorial optimization[A]. Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms.Society for Industrial and Applied Mathematics[C]. 2010. 1106-1125. |
[40] | MOHAN P , THAKURTA A , SHI E ,et al. GUPT:privacy preserving data analysis made easy[A]. Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data[C]. 2012. 349-360. |
[41] | HO S S , RUAN S . Differential privacy for location pattern mining[A]. Proceedings of the 4th ACM SIGSPATIAL International Workshop on Security and Privacy in GIS and LBS {SPRINGL)[C]. Chicago,IL,USA, 2011. 17-24. |
[42] | HO S S . Preserving privacy for moving objects data mining[A]. 2012 IEEE International Conference on Intelligence and Security Informatics (ISI)[C]. 2012. 135-137. |
[43] | ROY I , SETTY S T V , KILZER A ,et al. Airavat:security and privacy for MapReduce[A]. Proceedings of the 7th USENIX Conference on Networked Systems Design and Implementation[C]. USENIX Association, 2010. 20-20. |
[1] | 马鑫迪, 李清华, 姜奇, 马卓, 高胜, 田有亮, 马建峰. 面向Non-IID数据的拜占庭鲁棒联邦学习[J]. 通信学报, 2023, 44(6): 138-153. |
[2] | 余晟兴, 陈泽凯, 陈钟, 刘西蒙. DAGUARD:联邦学习下的分布式后门攻击防御方案[J]. 通信学报, 2023, 44(5): 110-122. |
[3] | 冯涛, 陈李秋, 方君丽, 石建明. 基于本地化差分隐私和属性基可搜索加密的区块链数据共享方案[J]. 通信学报, 2023, 44(5): 224-233. |
[4] | 夏莹杰, 朱思雨, 刘雪娇. 区块链架构下具有条件隐私的车辆编队跨信任域高效群组认证研究[J]. 通信学报, 2023, 44(4): 111-123. |
[5] | 胡柏吉, 张晓娟, 李元诚, 赖荣鑫. 支持多功能的V2G网络隐私保护数据聚合方案[J]. 通信学报, 2023, 44(4): 187-200. |
[6] | 徐明, 张保俊, 伍益明, 应晨铎, 郑宁. 面向网络攻击和隐私保护的多智能体系统分布式共识算法[J]. 通信学报, 2023, 44(3): 117-127. |
[7] | 张淑芬, 董燕灵, 徐精诚, 王豪石. 基于目标扰动的AdaBoost算法[J]. 通信学报, 2023, 44(2): 198-209. |
[8] | 余晟兴, 陈钟. 基于同态加密的高效安全联邦学习聚合框架[J]. 通信学报, 2023, 44(1): 14-28. |
[9] | 汤凌韬, 王迪, 刘盛云. 面向非独立同分布数据的联邦学习数据增强方案[J]. 通信学报, 2023, 44(1): 164-176. |
[10] | 袁程胜, 郭强, 付章杰. 基于差分隐私的深度伪造指纹检测模型版权保护算法[J]. 通信学报, 2022, 43(9): 181-193. |
[11] | 王瀚仪, 李效光, 毕文卿, 陈亚虹, 李凤华, 牛犇. 多级本地化差分隐私算法推荐框架[J]. 通信学报, 2022, 43(8): 52-64. |
[12] | 张学旺, 黎志鸿, 林金朝. 基于公平盲签名和分级加密的联盟链隐私保护方案[J]. 通信学报, 2022, 43(8): 131-141. |
[13] | 张勇, 李丹丹, 韩璐, 黄小红. 隐私保护的群体感知数据交易算法[J]. 通信学报, 2022, 43(5): 1-13. |
[14] | 王继锋, 王国峰. 边缘计算模式下密文搜索与共享技术研究[J]. 通信学报, 2022, 43(4): 227-238. |
[15] | 封化民, 史瑞, 袁峰, 李艳俊, 杨旸. 高效的强隐私保护和可转让的属性票据方案[J]. 通信学报, 2022, 43(3): 63-75. |
阅读次数 | ||||||
全文 |
|
|||||
摘要 |
|
|||||
|