[1] |
王芳, 慎金花 . 国外数据管护(data curation)研究与实践进展[J]. 中国图书馆学报, 2014,40(4): 116-128.
WANG F , SHEN J H . Advances in data curation abroad:research and practice[J]. Journal of Library Science in China, 2014,40(4): 116-128.
[2] |
BISHOP B W , HANK C . Data curation profiling of biocollections[C]// Annual Meeting of the Association for Information Science and Technology,October 14-18,2016,Copenhagen,Denmark. Hoboken:Wiley, 2016: 1-9.
[3] |
BOEHMKE B C . Data wrangling with R[M]. Switzerland: Springer Nature, 2016: 1-238.
[4] |
BEHESHTI S , TABEBORDBAR A , BENATALLAH B ,et al. On automating basic data curation tasks[C]// The 26th International Conference on World Wide Web Companion,April 3-7,2017,Perth,Australia. New York:ACM Press, 2017: 165-169.
[5] |
SINGH N , SINGH A K . Data privacy protection mechanisms in cloud[J]. Data Science and Engineering, 2018,3(1): 24-39.
[6] |
BUNEMAN P , CHENEY J , TAN W C ,et al. Curated databases[C]// Symposium on Principles of Database Systems,June 9-11,2008,Vancouver,Canada. New York:ACM Press, 2008: 1-12.
[7] |
BOHANNON P , FLASTER M , FAN W ,et al. A cost-based model and effective heuristic for repairing constraints by value modification[C]// International Conference on Management of Data,June 14-16,2005,Baltimore,USA. New York:ACM Press, 2005: 143-154.
[8] |
CHU X , ILYAS I F , PAPOTTI P . Holistic data cleaning:putting violations into context[C]// International Conference on Data Engineering,April 8-12,2013,Brisbane,Australia. Piscataway:IEEE Press, 2013: 458-469.
[9] |
CHU X , ILYAS I F , KRISHNAN S A ,et al. Data cleaning:overview and emerging challenges[C]// International Conference on Management of Data,June 26 - July 1,2016,San Francisco,USA. New York:ACM Press, 2016: 2201-2206.
[10] |
GOLAB L , KARLOFF H J , KORN F ,et al. On generating near-optimal tableaux for conditional functional dependencies[J]. Proceedings of the VLDB Endowment, 2008,1(1): 376-390.
[11] |
BESKALES G , ILYAS I F , GOLAB L ,et al. On the relative trust between inconsistent data and inaccurate constraints[C]// International Conference on Data Engineering,April 8-12,2013,Brisbane,Australia. Piscataway:IEEE Press, 2013: 541-552.
[12] |
YAKOUT M , ELMAGARMID A K , NEVILLE J ,et al. Guided data repair[J]. Proceedings of the VLDB Endowment, 2011,4(5): 279-289.
[13] |
WANG J , KRASKA T , FRANKLIN M J ,et al. CrowdER:crowdsourcing entity resolution[J]. Proceedings of the VLDB Endowment, 2012,5(11): 1483-1494.
[14] |
HAO S , TANG N , LI G ,et al. Cleaning relations using knowledge bases[C]// International Conference on Data Engineering,April 19-22,2017,San Diego,USA. Piscataway:IEEE Press, 2017: 933-944.
[15] |
MARCUS A , PARAMESWARAN A . Crowdsourced data management:industry and academic perspectives[J]. Foundations and Trends in Databases, 2013,6(1-2): 1-161.
[16] |
GOKHALE C , DAS S , DOAN A ,et al. Corleone:hands-off crowdsourcing for entity matching[C]// International Conference on Management of Data,June 22-27,2014,Snowbird,USA. New York:ACM Press, 2014: 601-612.
[17] |
HAAS D , WANG J , WU E ,et al. CLAMShell:speeding up crowds for low-latency data labeling[J]. Proceedings of the VLDB Endowment, 2015,9(4): 372-383.
[18] |
MOZAFARI B , SARKAR P , FRANKLIN M J ,et al. Scaling up crowd-sourcing to very large datasets:a case for active learning[J]. Proceedings of the VLDB Endowment, 2014,8(2): 125-136.
[19] |
ANANTHAKRISHNA R , CHAUDHURI S , GANTI V . Eliminating fuzzy duplicates in data warehouses[C]// International Conference on Very Large Data Bases,August 20-23,2002,Hong Kong,China. San Francisco:Morgan Kaufmann, 2002: 586-597.
[20] |
WANG J , KRISHNAN S , FRANKLIN M J ,et al. A sample-and-clean framework for fast and accurate query processing on dirty data[C]// International Conference on Management of Data,June 22-27,2014,Snowbird,USA. New York:ACM Press, 2014: 469-48
[21] |
KOLB L , THOR A , RAHM E . Dedoop:efficient deduplication with Hadoop[J]. Proceedings of the VLDB Endowment, 2012,5(12): 1878-1881.
[22] |
KHAYYAT Z , ILYAS I F , JINDAL A ,et al. BigDansing:a system for big data cleansing[C]// International Conference on Management of Data,May 31-June 4,2015,Melbourne,Australia. New York:ACM Press, 2015: 1215-1230.
[23] |
CHU X , ILYAS I F , KOUTRIS P . Distributed data deduplication[R]. Waterloo:University of Waterloo, 2016.
[24] |
HUI J , LI L , ZHANG Z . Integration of big data:a survey[C]// International Conference of Pioneering Computer Scientists,Engineers and Educators,September 21-23,2018,Zhengzhou,China. Heidelberg:Springer, 2018: 101-121.
[25] |
LI F , LEE M , HSU W ,et al. Linking temporal records for profiling entities[C]// International Conference on Management of Data,May 31-June 4,2015,Melbourne,Australia. New York:ACM Press, 2015: 593-605.
[26] |
ABEDJAN Z , AKCORA C G , OUZZANI M ,et al. Temporal rules discovery for web data cleaning[J]. Proceedings of the VLDB Endowment, 2015,9(4): 336-347.
[27] |
PETERMANN A , JUNGHANNS M , MÜLLER R ,et al. Graph-based data integration and business intelligence with BIIIG[J]. Proceedings of the VLDB Endowment, 2014,7(13): 1577-1580.
[28] |
LI Q , LI Y , GAO J ,et al. Resolving conflicts in heterogeneous data by truth discovery and source reliability estimation[C]// International Conference on Management of Data,June 22-27,2014,Snowbird,USA. New York:ACM Press, 2014: 1187-1198.
[29] |
LI Q , LI Y , GAO J ,et al. A confidence-aware approach for truth discovery on long-tail data[J]. Proceedings of the VLDB Endowment, 2014,8(4): 425-436.
[30] |
REKATSINAS T , JOGLEKAR M , GARCIA-MOLINA H ,et al. SLiMFast:guaranteed results for data fusion and source reliability[C]// International Conference on Management of Data,May 14-19,2017,Chicago,USA. New York:ACM Press, 2017: 1399-1414.
[31] |
YU R , GADIRAJU U , FETAHU B ,et al. FuseM:query-centric data fusion on structured Web markup[C]// International Conference on Data Engineering,April 19-22,2017,San Diego,USA. Piscataway:IEEE Press, 2017: 179-182.
[32] |
SALLOUM M , DONG X L , SRIVASTAVA D ,et al. Online ordering of overlapping data sources[J]. Proceedings of the VLDB Endowment, 2013,7(3): 133-144.
[33] |
REKATSINAS T , DONG X L , SRIVASTAVA D . Characterizing and selecting fresh data sources[C]// International Conference on Management of Data,June 22-27,2014,Snowbird,USA. New York:ACM Press, 2014: 919-930.
[34] |
BONAQUE R , CAO T D , CAUTIS B ,et al. Mixed-instance querying:a lightweight integration architecture for data journalism[J]. Proceedings of the VLDB Endowment, 2016,9(13): 1513-1516.
[35] |
CHAMANARA J , KÖNIG-RIES B , JAGADISH H V . QUIS:in-situ heterogeneous data source querying[J]. Proceedings of the VLDB Endowment, 2017,10(12): 1877-1880.
[36] |
SAWADOGO P , KIBATA T , DARMONT J . Metadata management for textual documents in data lakes[C]// International Conference on Enterprise Information Systems,May 3-5,2019,Heraklion,Greece. [S.l.]:SciTePress, 2019: 72-83.
[37] |
STEIN B , MORRISON A . The enterprise data lake:better integration and deeper analytics[J]. Technology Forecast, 2014(1): 1-9.
[38] |
QUIX C , HAI R , VATOV I . Metadata extraction and management in data lakes with GEMMS[J]. Complex Systems Informatics and Modeling Quarterly, 2016(9): 67-83.
[39] |
HAI R , GEISLER S , QUIX C . Constance:an intelligent data lake system[C]// International Conference on Management of Data,June 26-July 1,2016,San Francisco,USA. New York:ACM Press, 2016: 2097-2100.
[40] |
INMON B . Data lake architecture:designing the data lake and avoiding the garbage dump[M]. [S.l.]: Technics Publications, 2016.
[41] |
FANG H . Managing data lakes in big data era:what’s a data lake and why has it became popular in data management ecosystem[C]// International Conference on Cyber Technology in Automation,Control and Intelligent Systems,June 8-12,2015,Shenyang,China. Piscataway:IEEE Press, 2015: 820-824.
[42] |
MILOSLAVSKAYA N G , TOLSTOY A I . Application of big data,fast data,and data lake concepts to information security issues[C]// International Conference on Future Internet of Things and Cloud Workshops,August 22-24,2016,Vienna,Austria. Piscataway:IEEE Press, 2016: 148-153.
[43] |
MACCIONI A , TORLONE R . Crossing the finish line faster when paddling the data lake with kayak[J]. Proceedings of the VLDB Endowment, 2017,10(12): 1853-1856.
[44] |
HERSCHEL M , DIESTELKÄMPER R , LAHMAR H B . A survey on provenance:what for,what form,what from[J]. The VLDB Journal, 2017,26(6): 881-906.
[45] |
CHENEY J , CHITICARIU L , TAN W C . Provenance in databases:why,how,and where[J]. Foundations and Trends in Databases, 2009,1(4): 379-474.
[46] |
BUNEMAN P , TAN W C . Data provenance:what next[J]. SIGMOD Record, 2018,47(3): 5-16.
[47] |
BHAGWAT D , CHITICARIU L , TAN W C ,et al. An annotation management system for relational databases[J]. The VLDB Journal, 2005,14(4): 373-396.
[48] |
CHITICARIU L , TAN W C , VIJAYVARGIYA G . DBNotes:a post-it system for relational databases based on provenance[C]// International Conference on Management of Data,June 14-16,2005,Baltimore,USA. New York:ACM Press, 2005: 942-944.
[49] |
GEERTS F , KEMENTSIETSIDIS A , MILANO D . MONDRIAN:annotating and querying databases through colors and blocks[C]// International Conference on Data Engineering,April 3-8,2006,Atlanta,USA. Piscataway:IEEE Press, 2006.
[50] |
BUNEMAN P , CHENEY J , VANSUMMEREN S . On the expressiveness of implicit provenance in query and update languages[J]. ACM Transactions on Database Systems, 2008,33(4): 1-47.
[51] |
BUNEMAN P , KHANNA S , TAJIMA K ,et al. Archiving scientific data[J]. ACM Transactions on Database Systems, 2004,29: 2-42.
[52] |
HUANG S , XU L , LIU J ,et al. OrpheusDB:bolt-on versioning for relational databases[J]. Proceedings of the VLDB Endowment, 2017,10(10): 1130-1141.
[53] |
MADDOX M , GOEHRING D , ELMORE A J ,et al. Decibel:the relational dataset branching system[J]. Proceedings of the VLDB Endowment, 2016,9(9): 624-635.
[54] |
LAPPAS T , TERZI E , GUNOPULOS D . Finding effectors in social networks[C]// International Conference on Knowledge Discovery and Data Mining,July 25-28,2010,Washington,DC,USA. New York:ACM Press, 2010: 1059-1068.
[55] |
SHAH D , ZAMAN T . Rumors in a network:who’s the culprit[J]. IEEE Transactions on Information Theory, 2011,57(8): 5163-5181.
[56] |
BUNEMAN P , CHENEY J , LINDLEY S ,et al. DBWiki:a structured wiki for curated data and collaborative data management[C]// International Conference on Management of Data,June 12-16,2011,Athens,Greece. New York:ACM Press, 2011: 1335-1338.
[57] |
BRACHMANN M , BAUTISTA C , CASTELO S ,et al. Data debugging and exploration with Vizier[C]// International Conference on Management of Data,June 30-July 5,2019,Amsterdam,The Netherlands. New York:ACM Press, 2019: 1877-1880.
[58] |
CALLAHAN S P , FREIRE J , SANTOS E ,et al. VisTrails:visualization meets data management[C]// International Conference on Management of Data,June 27-29,2006,Chicago,USA. New York:ACM Press, 2006: 745-747.
[59] |
YANG Y , MENEGHETTI N , FEHLING R ,et al. An on-demand approach to ETL[J]. Proceedings of the VLDB Endowment, 2015,8(12): 1578-1589.
[60] |
MARINI L , GUTIERREZ-POLO I , KOOPER R ,et al. Clowder:open source data management for long tail data[C]// The Practice and Experience on Advanced Research Computing,July 22-26,2018,Pittsburgh,USA. New York:ACM Press, 2018: 1-8.
[61] |
VARGAS-SOLAR B , KEMP G , GALLEGOS I H ,et al. Demonstrating data collections curation and exploration with CURARE[C]// International Conference on Extending Database Technology,March 26-29,2019,Lisbon,Portugal. [S.l.:s.n.], 2019: 598-601.
[62] |
WOLLATZ L , SCOTT M , JOHNSTON S J ,et al. Curation of image data for medical research[C]// International Conference on e-Science,October 29-November 1,2018,Amsterdam,The Netherlands. Piscataway:IEEE Press, 2018: 105-113.
[63] |
杜小勇, 陈跃国, 范举 ,等. 数据整理——大数据治理的关键技术[J]. 大数据, 2019,5(3): 13-22.
DU X Y , CHEN Y G , FAN J ,et al. Data wrangling:a key technique of data governance[J]. Big Data Research, 2019,5(3): 13-22.