大数据 ›› 2019, Vol. 5 ›› Issue (4): 3-15.doi: 10.11959/j.issn.2096-0271.2019028
严赵峰,张为华
出版日期:
2019-07-15
发布日期:
2019-08-09
作者简介:
严赵峰(1993- ),男,复旦大学并行处理研究所硕士生,主要研究方向为异构计算。|张为华(1974- ),男,博士,复旦大学并行处理研究所教授,主要研究方向为编译器、系统软件、并行处理等。
Zhaofeng YAN,Weihua ZHANG
Online:
2019-07-15
Published:
2019-08-09
摘要:
为了保证各类大数据系统的性能,以并发索引结构为核心的高效检索技术变得越来越重要。然而,数据存储体量的增加和用户对性能要求的提高使得并发索引结构面临诸多挑战,设计高效且易用的并发控制策略和提升索引结构操作性能成为系统领域关心的重要问题。结合现有的研究成果,综合分析了大数据时代并发索引结构的研究进展。首先讨论了关于设计更优化且易用的并发控制策略的研究现状,接着针对各类新型高性能硬件,探讨了基于新硬件结构特点的优化加速研究成果,并对可能的发展方向进行了展望。
中图分类号:
严赵峰, 张为华. 面向大数据的索引结构研究进展[J]. 大数据, 2019, 5(4): 3-15.
Zhaofeng YAN, Weihua ZHANG. A survey of index structure in big data era[J]. Big Data Research, 2019, 5(4): 3-15.
[1] | GRAEFE G , KUNO H . Modern B-tree techniques[J]. Foundations & Trends in Databases, 2011,3(4): 203-402. |
[2] | SHAHVARANI A , JACOBSEN H A . A hybrid B+-tree as solution for in-memory indexing on CPU-GPU heterogeneous computing platforms[C]// The 2016 International Conference on Management of Data,June 26-July 1,2016,San Francisco,USA. New York:ACM Press, 2016: 1523-1538. |
[3] | VAISMAN A , ZIMáNYI E . Data warehouses:next challenges[J]. Lecture Notes in Business Information Processing, 2011,96: 1-26. |
[4] | VASSILIADIS P , SIMITSIS A . Near real time ETL[M]// New trends in data warehousing and data analysis. Boston:Springer, 2009: 1-31. |
[5] | BAYER R , SCHKOLNICK M . Concurrency of operations on B-trees[J]. Acta Informatica, 1977,9(1): 1-21. |
[6] | LEHMAN P L . Efficient locking for concurrent operations on B-trees[J]. ACM Transactions on Database Systems (TODS), 1981,6(4): 650-670. |
[7] | M AO Y , KOHLER E , MORRIS R T . Cache craftiness for fast multicore key-value storage[C]// The 7th ACM European Conference on Computer Systems,April 10-13,2012,Bern,Switzerland. New York:ACM Press, 2012: 183-196. |
[8] | CHA S K , HWANG S , KIM K ,et al. Cache-conscious concurrency control of main-memory indexes on shared-memory multiprocessor systems[C]// The 27th International Conference on Very Large Data Bases,September 11-14,2001,Roma,Italy.San Francisco:Morgan Kaufmann Publishers Inc. , 2001: 181-190. |
[9] | LEVANDOSKI J J , LOMET D B , SENGUPTA S . The Bw-Tree:a B-tree for new hardware platforms[C]// 2013 IEEE 29th International Conference on Data Engineering (ICDE),April 8-12,2013,Brisbane,Australia. Piscataway:IEEE Press, 2013: 302-313. |
[10] | SEWALL J , CHHUGANI J , KIM C ,et al. PALM:parallel architecture-friendly latchfree modifications to B+ trees on manycore processors[J]. Proceedings of the VLDB Endowment, 2011,4(11): 795-806. |
[11] | CHEN Y , WEI X , SHI J ,et al. Fast and general distributed transactions using RDMA and HTM[C]// The 11th European Conference on Computer Systems,April 18-21,2016,London,UK. New York:ACM Press, 2016. |
[12] | WANG X , ZHANG W , WANG Z ,et al. Eunomia:scaling concurrent search trees under contention using HTM[C]// The 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming,February 4-8,2017,Austin,USA. New York:ACM Press, 2017: 385-399. |
[13] | ZHANG W , WANG X , JI S ,et al. Scaling concurrent index structures under contention using HTM[J]. IEEE Transactions on Parallel and Distributed Systems, 2018,29(8): 1837-1850. |
[14] | WANG Z , QIAN H , CHEN H ,et al. Opportunities and pitfalls of multicore scaling using hardware transaction memory[C]// The 4th Asia-Pacific Workshop on Systems,July 29-30,2013,Singapore. New York:ACM Press, 2013. |
[15] | WANG Z , QIAN H , LI J ,et al. Using restricted transactional memory to build a scalable in-memory database[C]// The 9th European Conference on Computer Systems,April 14-16,2014,Amsterdam,the Netherlands. New York:ACM Press, 2014. |
[16] | CHEN S , PENG L . Efficient GPU hardware transactional memory through early conflict resolution[C]// 2016 IEEE International Symposium on High Performance Computer Architecture(HPCA),March 12-16,2016,Barcelona,Spain. Piscataway:IEEE Press, 2016: 274-284. |
[17] | HERLIHY M , MOSS J E B . Transactional memory:architectural support for lockfree data structures[M]. New York: ACM PressPress, 1993. |
[18] | XU Y , WANG R , GOSWAMI N ,et al. Software transactional memory for GPU architectures[C]// Annual IEEE/ACM International Symposium on Code Generation and Optimization,February 15-19,2014,Orlando,USA. New York:ACM Press, 2014. |
[19] | POLLARI-MALMI K , SOISALONSOININEN E , YLONEN T . Concurrency control in B-trees with batch updates[J]. IEEE Transactions on Knowledge and Data Engineering, 1996,8(6): 975-984. |
[20] | SHNEIDERMAN B . Batched searching of sequential and tree structured files[J]. ACM Transactions on Database Systems (TODS), 1976,1(3): 268-275. |
[21] | ARGE L , HINRICHS K H , VAHRENHOLD J ,et al. Efficient bulk operations on dynamic R-trees[J]. Algorithmica, 2002,33(1): 104-128. |
[22] | GRAEFE G . B-tree indexes for high update rates[J]. ACM Sigmod Record, 2006,35(1): 39-44. |
[23] | YAN Z , LIN Y , PENG L ,et al. Harmonia:a high throughput B+ tree for GPUs[C]// The 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming(PPoPP),February 16-20,2019,Washington,DC,USA. New York:ACM Press, 2019: 133-144. |
[24] | AWAD M A , ASHKIANI S , JOHNSON R ,et al. Engineering a high-performance GPU B-Tree[C]// The 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming(PPoPP),February 16-20,2019,Washington,DC,USA. New York:ACM Press, 2019. |
[25] | RAO J , ROSS K A . Cache conscious indexing for decision-support in main memory[C]// The 25th International Conference on Very Large Data Bases,September 7-10,1999,Edinburgh,Scotland.San Francisco:Morgan Kaufmann Publishers Inc. , 1999: 78-89. |
[26] | RAO J , ROSS K A . Making B+-trees cache conscious in main memory[C]// The 2000 ACM SIGMOD International Conference on Management of Data,May 15-18,2000,Dallas,USA. New York:ACM Press, 2000: 475-486. |
[27] | CHEN S , GIBBONS P B , MOWRY T C . Improving index performance through prefetching[M]. New York: ACM PressPress, 2001. |
[28] | CHEN S , GIBBONS P B , MOWRY T C ,et al. Fractal prefetching B+trees:optimizing both cache and disk performance[C]// The 2002 ACM SIGMOD International Conference on Management of Data,June 3-6,2002,Madison,USA. New York:ACM Press, 2002: 157-168. |
[29] | HANKINS R A , PATEL J M . Effect of node size on the performance of cache-conscious B+-trees[C]// The 2003 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems,June 11-14,2003,San Diego,USA. New York:ACM Press, 2003: 283-294. |
[30] | KIM C , CHHUGANI J , SATISH N ,et al. FAST:fast architecture sensitive tree search on modern CPUs and GPUs[C]// The 2010 ACM SIGMOD International Conference on Management of Data,June 6-10,2010,Indianapolis,USA. New York:ACM Press, 2010: 339-350. |
[31] | KACZMARSKI K . Experimental B+-tree for GPU[J]. Advances in Databases and Information Systems, 2011(2):11. |
[32] | KACZMARSKI K , . B+-tree optimized for GPGPU[C]// OTM Confederated International Conferences “On the Move to Meaningful Internet Systems”,September 10-14,2012,Rome,Italy. Heidelberg:Springer, 2012: 843-854. |
[33] | DAGA M , NUTTER M . Exploiting coarsegrained parallelism in b+ tree searches on an apu[C]// 2012 SC Companion:High Performance Computing,Networking,Storage and Analysis(SCC),November 10-16,2012,Salt Lake City,USA. Piscataway:IEEE Press, 2012: 240-247. |
[34] | GRAEFE G . A survey of B-tree locking techniques[J]. ACM Transactions on Database Systems (TODS), 2010,35(3):16. |
[35] | YANG Y H E , PRASANNA V K . High throughput and large capacity pipelined dynamic search tree on FPGA[C]// The 18th Annual ACM/SIGDA International Symposium on Field Programmable Gate Arrays,February 21-23,2010,Monterey,USA. New York:ACM Press, 2010: 83-92. |
[36] | HEINRICH D , WERNER S , STELZNER M ,et al. Hybrid FPGA approach for a B+ tree in a semantic web database system[C]// 2015 10th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC),June 29-July 1,2015,Bremen,Germany. Piscataway:IEEE Press, 2015: 1-8. |
[37] | QU Y , PRASANNA V . Scalable highthroughput architecture for large balancedtree structures on FPGA[C]// The ACM/SIGDA International Symposium on Field Programmable Gate Arrays,February 11-13,2013,Monterey,USA. New York:ACM Press, 2013:278. |
[38] | WEI X , SHI J , CHEN Y ,et al. Fast inmemory transaction processing using RDMA and HTM[C]// The 25th Symposium on Operating Systems Principles,October 4-7,2015,Monterey,USA. New York:ACM Press, 2015: 87-104. |
[39] | MITCHELL C , MONTGOMERY K , NELSON L ,et al. Balancing CPU and network in the cell distributed B-tree store[C]// 2016 USENIX Annual Technical Conference,June 22-24,2016,Denver,USA. Berkeley:USENIX Association, 2016: 451-464. |
[40] | CHEN S M , QIN J . Persistent b+-trees in non-volatile main memory[J]. Proceedings of the VLDB Endowment, 2015,8(7): 786-797. |
[41] | CHEN S M , GIBBONS P B , NATH S . Rethinking database algorithms for phase change memory[C]// The 5th Biennial Conference on Innovative Data Systems Research,January 9-12,2011,Asilomar,USA.[S.l.:s.n]. 2011. |
[42] | YANG J , WEI Q S , CHEN C ,et al. NVTree:reducing consistency cost for NVMbased single level systems[C]// The 13th USENIX Conference on File and Storage Technologies(FAST15),February 16-19,2015,Santa Clara,USA. Berkeley:USENIX Association, 2015: 167-181. |
[43] | OUKID I , LASPERAS J , NICA A ,et al. FPTree:a hybrid SCM-DRAM persistent and concurrent B-tree for storage class memory[C]// The 2016 International Conference on Management of Data,June 26-July 1,2016,San Francisco,USA. New York:ACM Press, 2016. |
[44] | ARULRAJ J , . BzTree:a high-performance latch-free range index for non-volatile memory[C]// The 44th International Conference on Very Large Data Bases,August 27-31,2018,Rio De Janeiro,Brazil. New York:ACM Press, 2018: 553-565. |
[1] | 钱海红, 王茂异, 熊贇. 高等教育数字化转型的现状与发展研究[J]. 大数据, 2023, 9(3): 56-70. |
[2] | 梅宏, 杜小勇, 金海, 程学旗, 柴云鹏, 石宣化, 靳小龙, 王亚沙, 刘驰. 大数据技术前瞻[J]. 大数据, 2023, 9(1): 1-20. |
[3] | 沈阳, 余梦珑. 元宇宙与大数据:时空智能中的数据洞察与价值连接[J]. 大数据, 2023, 9(1): 103-110. |
[4] | 郑童哲恒, 李斌, 冯敏萱, 常博林, 王东波. 历史典籍的结构化探索——《史记·列传》数字人文知识库的构建与可视化研究[J]. 大数据, 2022, 8(6): 40-55. |
[5] | 陈静. 人文大数据及其在数字人文领域中的应用[J]. 大数据, 2022, 8(6): 3-14. |
[6] | 罗煜楚, 吴昊, 郭宇涵, 谭绍聪, 刘灿, 蒋瑞珂, 袁晓如. 数字人文中的可视化[J]. 大数据, 2022, 8(6): 74-93. |
[7] | 李汶龙, 袁媛, 安筱鹏. 刍议大数据治理的三大基础思维[J]. 大数据, 2022, 8(4): 34-45. |
[8] | 汤奇峰, 邵志清, 叶雅珍. 数据交易中的权利确认和授予体系[J]. 大数据, 2022, 8(3): 40-53. |
[9] | 王陈慧子, 蔡玮. 元宇宙数字经济:现状、特征与发展建议[J]. 大数据, 2022, 8(3): 140-150. |
[10] | 杨玫, 李玮, 乔思渊, 刘巍. 中国大数据产业产值测算方法研究[J]. 大数据, 2022, 8(3): 151-160. |
[11] | 李德仁, 张过, 蒋永华, 沈欣, 刘伟玲. 论大数据视角下的地球空间信息学的机遇与挑战[J]. 大数据, 2022, 8(2): 3-14. |
[12] | 仇晓兰, 胡玉新, 上官松涛, 付琨. 遥感卫星大数据高精度一体化处理技术[J]. 大数据, 2022, 8(2): 15-27. |
[13] | 刘伟权, 王程, 臧彧, 胡倩, 于尚书, 赖柏锜. 基于遥感大数据的信息提取技术综述[J]. 大数据, 2022, 8(2): 28-57. |
[14] | 刘建强, 叶小敏, 兰友国. 我国海洋卫星遥感大数据及其应用服务[J]. 大数据, 2022, 8(2): 75-88. |
[15] | 杨何群, 王晓峰, 高彦青, 陆一闻, 麻炳欣, 王昕瑶. 数值天气预报对卫星大数据的需求分析[J]. 大数据, 2022, 8(2): 89-102. |
阅读次数 | ||||||
全文 |
|
|||||
摘要 |
|
|||||
|