Big Data Research (大数据), 2020, Vol. 6, Issue (2): 27-40. doi: 10.11959/j.issn.2096-0271.2020012
JIN Cheqing¹, CHEN Jinchuan², LIU Wei³, ZHANG Zhao¹
Online: 2020-03-15
Published: 2020-03-21
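Author biographies:
JIN Cheqing (1977- ), male, PhD, is a professor, doctoral supervisor, and vice dean of the School of Data Science and Engineering, East China Normal University. He is a senior member of the China Computer Federation (CCF) and a member of the CCF Technical Committee on Databases. He has published more than 100 academic papers, and his research has received a Second Prize of the Ministry of Education Science and Technology Progress Award, a First Prize of the Shanghai Science and Technology Progress Award, and the Fok Ying Tung Education Foundation Young Teachers Award. He serves on the editorial board of Journal of Computer Research and Development. His main research interests are blockchain, computational education, and location-based services.
CHEN Jinchuan (1978- ), male, PhD, is an associate professor in the School of Information, Renmin University of China, a member of the CCF, and a corresponding member of the CCF Technical Committee on Blockchain. His main research interests are blockchain and distributed data management.
LIU Wei (1989- ), male, PhD, is an associate research fellow at Sun Yat-sen University and a corresponding member of the CCF Technical Committee on Databases. His main research interests are spatio-temporal big data analytics, recommender systems, and the analysis and mining of individual behavioral data.
ZHANG Zhao (1977- ), female, PhD, is an associate professor in the School of Data Science and Engineering, East China Normal University. Her main research interests are blockchain system development and distributed data management, and a number of her results have been published at major international data management conferences such as VLDB, ICDE, and DASFAA. She has led several projects funded by the National Natural Science Foundation of China. As a core technical member, she helped develop the "High-Throughput Scalable Distributed Database System for Large Bank Applications", which received the First Prize of the Science and Technology Progress Award in the 2017 Ministry of Education Outstanding Scientific Research Achievement Awards for Higher Education Institutions.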
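Abstract:
To make government governance more scientific in its methods, more intelligent in its processes, and more fine-grained in its outcomes, the sharing, integration, and fusion of government governance big data cannot stop at providing data access interfaces; it must discover entities, identify the relationships among them, and trace how they evolve at the semantic level. However, the multi-source, heterogeneous, dynamic, massive, and siloed nature of government governance big data makes this goal severely challenging. This paper systematically reviews the fundamental theories and methods for sharing, integrating, and fusing large-scale, distributed, heterogeneous data, and argues that building highly trustworthy sharing, highly accurate integration, and highly efficient fusion techniques for government governance big data is an urgent need.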
JIN C Q, CHEN J C, LIU W, ZHANG Z. Sharing, integration and fusion of government governance big data[J]. Big Data Research, 2020, 6(2): 27-40.