Big Data Research, 2022, Vol. 8, Issue (5): 55-73. doi: 10.11959/j.issn.2096-0271.2022073
• TOPIC: DATA CIRCULATION AND PRIVACY COMPUTING •
Yan ZHANG, Yifan YANG, Ren YI, Shengmei LUO, Jianfei TANG, Zhengxun XIA
Online: 2022-09-15
Published: 2022-09-01
Yan ZHANG, Yifan YANG, Ren YI, Shengmei LUO, Jianfei TANG, Zhengxun XIA. Exploration and practice of data quality governance in privacy computing scenarios[J]. Big Data Research, 2022, 8(5): 55-73.