Big Data Research, 2023, Vol. 9, Issue (1): 1-20. doi: 10.11959/j.issn.2096-0271.2023009
STRATEGY RESEARCH
Hong MEI¹, Xiaoyong DU², Hai JIN³, Xueqi CHENG⁴˒⁵, Yunpeng CHAI², Xuanhua SHI³, Xiaolong JIN⁴˒⁵, Yasha WANG¹, Chi LIU⁶
Online: 2023-01-15
Published: 2023-01-01
Hong MEI, Xiaoyong DU, Hai JIN, Xueqi CHENG, Yunpeng CHAI, Xuanhua SHI, Xiaolong JIN, Yasha WANG, Chi LIU. Big data technologies forward-looking[J]. Big Data Research, 2023, 9(1): 1-20.
[1] PEI W, LI Z H, PAN W. Survey of key technologies in GPU database system[J]. Journal of Software, 2021, 32(3): 859-885.
[2] SHERKAT R, FLORENDO C, ANDREI M, et al. Native store extension for SAP HANA[J]. Proceedings of the VLDB Endowment, 2019, 12(12): 2047-2058.
[3] SHEN S J, CHEN R, CHEN H B, et al. Retrofitting high availability mechanism to tame hybrid transaction/analytical processing[C]// Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation. [S.l.:s.n.], 2021: 219-238.
[4] LIU G, CHEN L Y, CHEN S M. Zen: a high-throughput log-free OLTP engine for non-volatile main memory[J]. Proceedings of the VLDB Endowment, 2021, 14(5): 835-848.
[5] KRASKA T, BEUTEL A, CHI E H, et al. The case for learned index structures[C]// Proceedings of the 2018 International Conference on Management of Data. New York: ACM Press, 2018: 489-504.
[6] CHATTERJEE S, JAGADEESAN M, QIN W, et al. Cosine[J]. Proceedings of the VLDB Endowment, 2021, 15(1): 112-126.
[7] DAS S, GRBIC M, ILIC I, et al. Automatically indexing millions of databases in Microsoft Azure SQL Database[C]// Proceedings of the 2019 International Conference on Management of Data. New York: ACM Press, 2019: 666-679.
[8] AHMED R, BELLO R, WITKOWSKI A, et al. Automated generation of materialized views in Oracle[J]. Proceedings of the VLDB Endowment, 2020, 13(12): 3046-3058.
[9] LIU X Z, YIN Z, ZHAO C, et al. PinSQL: pinpoint root cause SQLs to resolve performance issues in cloud databases[C]// Proceedings of the 2022 IEEE 38th International Conference on Data Engineering. Piscataway: IEEE Press, 2022: 2549-2561.
[10] LI G L, ZHOU X H, SUN J, et al. openGauss: an autonomous database system[J]. Proceedings of the VLDB Endowment, 2021, 14(12): 3028-3041.
[11] ZHOU X H, LI G L, CHAI C L, et al. A learned query rewrite system using Monte Carlo tree search[J]. Proceedings of the VLDB Endowment, 2021, 15(1): 46-58.
[12] WANG J Y, CHAI C L, LIU J B, et al. FACE: a normalizing flow-based cardinality estimator[J]. Proceedings of the VLDB Endowment, 2022, 15(1): 72-84.
[13] DEPOUTOVITCH A, CHEN C, CHEN J, et al. Taurus database: how to be fast, available, and frugal in the cloud[C]// Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. New York: ACM Press, 2020: 1463-1478.
[14] CAO W, LIU Z J, WANG P, et al. PolarFS: an ultra-low latency and failure resilient distributed file system for shared storage cloud database[J]. Proceedings of the VLDB Endowment, 2018, 11(12): 1849-1862.
[15] TAFT R, SHARIF I, MATEI A, et al. CockroachDB: the resilient geo-distributed SQL database[C]// Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. New York: ACM Press, 2020: 1493-1509.
[16] CAO W, LIU Z J, WANG P, et al. PolarFS: an ultra-low latency and failure resilient distributed file system for shared storage cloud database[J]. Proceedings of the VLDB Endowment, 2018, 11(12): 1849-1862.
[17] WANG Y Y, WANG Z K, CHAI Y P, et al. Rethink the linearizability constraints of Raft for distributed key-value stores[C]// Proceedings of the 2021 IEEE 37th International Conference on Data Engineering. Piscataway: IEEE Press, 2021: 1877-1882.
[18] HUANG D X, LIU Q, CUI Q, et al. TiDB[J]. Proceedings of the VLDB Endowment, 2020, 13(12): 3072-3084.
[19] WANG H X, XU C, ZHANG C, et al. A blockchain system ensuring query integrity[C]// Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. New York: ACM Press, 2020: 2693-2696.
[20] DANG H, DINH T T A, LOGHIN D, et al. Towards scaling blockchain systems via sharding[C]// Proceedings of the 2019 International Conference on Management of Data. New York: ACM Press, 2019: 123-140.
[21] ALAKUIJALA J, FARRUGGIA A, FERRAGINA P, et al. Brotli: a general-purpose data compressor[J]. ACM Transactions on Information Systems, 2019, 37(1): 1-30.
[22] CAO W, ZHANG Y Q, YANG X J, et al. PolarDB Serverless: a cloud native database for disaggregated data centers[C]// Proceedings of the 2021 International Conference on Management of Data. New York: ACM Press, 2021: 2477-2489.
[23] ZHANG F, WAN W T, ZHANG C Y, et al. CompressDB: enabling efficient compressed data direct processing for various databases[C]// Proceedings of the 2022 International Conference on Management of Data. [S.l.:s.n.], 2022: 1655-1669.
[24] WOJTOWICZ D T, YIN S Y, MORVAN F, et al. Cost-effective dynamic optimisation for multi-cloud queries[C]// Proceedings of the 2021 IEEE 14th International Conference on Cloud Computing. Piscataway: IEEE Press, 2021: 387-397.
[25] WANG J D, YU S Y, DOU Y. East-west computing transfer: research on the overall framework and implementation path of cross-domain data circulation in China[J]. E-Government, 2020(3): 13-21.
[26] DEAN J, GHEMAWAT S. MapReduce: simplified data processing on large clusters[J]. Communications of the ACM, 2008, 51(1): 107-113.
[27] FEY M, LENSSEN J E. Fast graph representation learning with PyTorch Geometric[J]. arXiv preprint, 2019, arXiv: 1903.02428v2.
[28] RASCHKA S, PATTERSON J, NOLET C. Machine learning in Python: main developments and technology trends in data science, machine learning, and artificial intelligence[J]. Information, 2020, 11(4): 193.
[29] AHN J, YOO S, MUTLU O, et al. PIM-enabled instructions: a low-overhead, locality-aware processing-in-memory architecture[J]. Computer Architecture News, 2015, 43(3): 336-348.
[30] WU M Y, ZHAO Z M, LI H Y, et al. Espresso: brewing Java for more non-volatility with non-volatile memory[C]// Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems. New York: ACM Press, 2018: 70-83.
[31] SHI X H, KE Z X, ZHOU Y L, et al. Deca: a garbage collection optimizer for in-memory data processing[J]. ACM Transactions on Computer Systems, 2018, 36(1): 1-47.
[32] ZEUCH S, MONTE B D, KARIMOV J, et al. Analyzing efficient stream processing on modern hardware[J]. Proceedings of the VLDB Endowment, 2019, 12(5): 516-530.
[33] TOSHNIWAL A, TANEJA S, SHUKLA A, et al. Storm@twitter[C]// Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. New York: ACM Press, 2014.
[34] ZAHARIA M, DAS T, LI H Y, et al. Discretized streams: fault-tolerant streaming computation at scale[C]// Proceedings of the 24th ACM Symposium on Operating Systems Principles. New York: ACM Press, 2013.
[35] NASIR M A U, MORALES G D F, GARCÍA-SORIANO D, et al. The power of both choices: practical load balancing for distributed stream processing engines[C]// Proceedings of the 2015 IEEE 31st International Conference on Data Engineering. Piscataway: IEEE Press, 2015: 137-148.
[36] NASIR M A U, MORALES G D F, KOURTELLIS N, et al. When two choices are not enough: balancing at scale in distributed stream processing[C]// Proceedings of the 2016 IEEE 32nd International Conference on Data Engineering. Piscataway: IEEE Press, 2016: 589-600.
[37] ABDELHAMID A S, MAHMOOD A R, DAGHISTANI A, et al. Prompt: dynamic data-partitioning for distributed micro-batch stream processing systems[C]// Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. New York: ACM Press, 2020: 2455-2469.
[38] CHEN H H, ZHANG F, JIN H. PStream: a popularity-aware differentiated distributed stream processing system[J]. IEEE Transactions on Computers, 2021, 70(10): 1582-1597.
[39] MALEWICZ G, AUSTERN M H, BIK A J C, et al. Pregel: a system for large-scale graph processing[C]// Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. New York: ACM Press, 2010: 135-146.
[40] WANG Y, DAVIDSON A, PAN Y C, et al. Gunrock: a high-performance graph processing library on the GPU[C]// Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. New York: ACM Press, 2016.
[41] ZHOU S J, KANNAN R, PRASANNA V K, et al. HitGraph: high-throughput graph processing framework on FPGA[J]. IEEE Transactions on Parallel and Distributed Systems, 2019, 30(10): 2249-2264.
[42] RAHMAN S, ABU-GHAZALEH N, GUPTA R. GraphPulse: an event-driven hardware accelerator for asynchronous graph processing[C]// Proceedings of the 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture. Piscataway: IEEE Press, 2020: 908-921.
[43] SHUN J L, BLELLOCH G E. Ligra: a lightweight graph processing framework for shared memory[C]// Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. New York: ACM Press, 2013: 135-146.
[44] GONZALEZ J E, LOW Y C, GU H J, et al. PowerGraph: distributed graph-parallel computation on natural graphs[C]// Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation. New York: ACM Press, 2012: 17-30.
[45] KYROLA A, BLELLOCH G, GUESTRIN C. GraphChi: large-scale graph computation on just a PC[C]// Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation. New York: ACM Press, 2012: 31-46.
[46] HAM T J, WU L S, SUNDARAM N, et al. Graphicionado: a high-performance and energy-efficient accelerator for graph analytics[C]// Proceedings of the 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture. Piscataway: IEEE Press, 2016: 1-13.
[47] ZHANG Y, LIAO X F, JIN H, et al. HotGraph: efficient asynchronous processing for real-world graphs[J]. IEEE Transactions on Computers, 2017, 66(5): 799-809.
[48] ZHANG K Y, CHEN R, CHEN H B. NUMA-aware graph-structured analytics[C]// Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. New York: ACM Press, 2015: 183-193.
[49] ZHU X W, CHEN W G, ZHENG W M, et al. Gemini: a computation-centric distributed graph processing system[C]// Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation. New York: ACM Press, 2016: 301-316.
[50] ZHANG Y, LIAO X F, GU L, et al. AsynGraph: maximizing data parallelism for efficient iterative graph processing on GPUs[J]. ACM Transactions on Architecture and Code Optimization, 2020, 17(4): 1-21.
[51] DAI G H, HUANG T H, CHI Y Z, et al. ForeGraph: exploring large-scale graph processing on multi-FPGA architecture[C]// Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. New York: ACM Press, 2017: 217-226.
[52] ZHAO J, YANG Y, ZHANG Y, et al. TDGraph: a topology-driven accelerator for high-performance streaming graph processing[C]// Proceedings of the 49th Annual International Symposium on Computer Architecture. New York: ACM Press, 2022: 116-129.
[53] LIN H, ZHU X W, YU B W, et al. ShenTu: processing multi-trillion edge graphs on millions of cores in seconds[C]// Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. Piscataway: IEEE Press, 2018.
[54] ZHANG Y, LIAO X F, JIN H, et al. DiGraph: an efficient path-based iterative directed graph processing system on multiple GPUs[C]// Proceedings of the 24th International Conference on Architectural Support for Programming Languages and Operating Systems. New York: ACM Press, 2019: 601-614.
[55] PHAM H, LIANG P P, MANZINI T, et al. Found in translation: learning robust joint representations by cyclic translations between modalities[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33(1): 6892-6899.
[56] WANG W H, BAO H B, DONG L, et al. Image as a foreign language: BEiT pretraining for all vision and vision-language tasks[J]. arXiv preprint, 2022, arXiv: 2208.10442.
[57] CHEN X, WANG X, CHANGPINYO S, et al. PaLI: a jointly-scaled multilingual language-image model[J]. arXiv preprint, 2022, arXiv: 2209.06794.
[58] LIU J, ZHU X X, LIU F, et al. OPT: omni-perception pre-trainer for cross-modal understanding and generation[J]. arXiv preprint, 2021, arXiv: 2107.00249.
[59] MAMMEN P M. Federated learning: opportunities and challenges[J]. arXiv preprint, 2021, arXiv: 2101.05428.
[60] ZILLER A, TRASK A, LOPARDO A, et al. PySyft: a library for easy federated learning[M]// Federated learning systems. Cham: Springer, 2021: 111-139.
[61] WELTEN S, MOU Y L, NEUMANN L, et al. A privacy-preserving distributed analytics platform for health care data[J]. Methods of Information in Medicine, 2022, 61(S 01): e1-e11.
[62] LI Q B, WEN Z Y, WU Z M, et al. A survey on federated learning systems: vision, hype and reality for data privacy and protection[J]. IEEE Transactions on Knowledge and Data Engineering, 2021, doi: 10.1109/TKDE.2021.3124599.
[63] RUBIN D B. Estimating causal effects of treatments in randomized and nonrandomized studies[J]. Journal of Educational Psychology, 1974, 66(5): 688-701.
[64] PEARL J. Causality: models, reasoning and inference[M]. Cambridge: Cambridge University Press, 2009.
[65] PEARL J, MACKENZIE D. The book of why: the new science of cause and effect[J]. Journal of MultiDisciplinary Evaluation, 2018, 14(31): 47-54.
[66] SUN X W, WU B T, ZHENG X Y, et al. Recovering latent causal factor for generalization to distributional shifts[C]// Advances in Neural Information Processing Systems. [S.l.:s.n.], 2021: 16846-16859.
[67] CUI P, ATHEY S. Stable learning establishes some common ground between causal inference and machine learning[J]. Nature Machine Intelligence, 2022, 4(2): 110-115.
[68] ZHANG Y, FENG F L, HE X N, et al. Causal intervention for leveraging popularity bias in recommendation[J]. arXiv preprint, 2021, arXiv: 2105.06067.
[69] ZHU Z M, CHEN X H, TIAN H L, et al. Offline reinforcement learning with causal structured world models[J]. arXiv preprint, 2022, arXiv: 2206.01474.
[70] STONEBRAKER M. The solution: data curation at scale[M]// Getting data right. [S.l.]: O'Reilly, 2016.
[71] Data Management Department of Huawei. Enterprise data at Huawei[M]. Beijing: China Machine Press, 2020.
[72] REKATSINAS T, CHU X, ILYAS I F, et al. HoloClean: holistic data repairs with probabilistic inference[J]. arXiv preprint, 2017, arXiv: 1702.00820.
[73] DONG X, GABRILOVICH E, HEITZ G, et al. Knowledge vault: a web-scale approach to probabilistic knowledge fusion[J]. SIGKDD Explorations, 2014(CD/ROM): 597-606.
[74] HAO S, LI G L, FENG J H, et al. Survey of structured data cleaning methods[J]. Journal of Tsinghua University (Science and Technology), 2018, 58(12): 1037-1050.
[75] DING X O, WANG H Z, YU S J. Data quality management of industrial temporal big data[J]. Big Data Research, 2019, 5(6): 1-11.
[76] KAHN R, WILENSKY R. A framework for distributed digital object services[J]. International Journal on Digital Libraries, 2006, 6(2): 115-123.
[77] MEI H, HUANG G, LIU X Z, et al. Research on internetware: review and prospect[J]. Chinese Science Bulletin, 2022, 67(32): 3782-3792.
[78] HUANG G. Internet of Data: infrastructure of digital space[J]. Communications of the CCF, 2021, 17(12): 58-60.