大数据 ›› 2019, Vol. 5 ›› Issue (4): 3-15.doi: 10.11959/j.issn.2096-0271.2019028
严赵峰,张为华
出版日期:
2019-07-15
发布日期:
2019-08-09
作者简介:
严赵峰(1993- ),男,复旦大学并行处理研究所硕士生,主要研究方向为异构计算。|张为华(1974- ),男,博士,复旦大学并行处理研究所教授,主要研究方向为编译器、系统软件、并行处理等。
Zhaofeng YAN,Weihua ZHANG
Online:
2019-07-15
Published:
2019-08-09
摘要:
为了保证各类大数据系统的性能,以并发索引结构为核心的高效检索技术变得越来越重要。然而,数据存储体量的增加和用户对性能要求的提高使得并发索引结构面临诸多挑战,设计高效且易用的并发控制策略和提升索引结构操作性能成为系统领域关心的重要问题。结合现有的研究成果,综合分析了大数据时代并发索引结构的研究进展。首先讨论了关于设计更优化且易用的并发控制策略的研究现状,接着针对各类新型高性能硬件,探讨了基于新硬件结构特点的优化加速研究成果,并对可能的发展方向进行了展望。
中图分类号:
严赵峰, 张为华. 面向大数据的索引结构研究进展[J]. 大数据, 2019, 5(4): 3-15.
Zhaofeng YAN, Weihua ZHANG. A survey of index structure in big data era[J]. Big Data Research, 2019, 5(4): 3-15.
[1] | GRAEFE G , KUNO H . Modern B-tree techniques[J]. Foundations & Trends in Databases, 2011,3(4): 203-402. |
[2] | SHAHVARANI A , JACOBSEN H A . A hybrid B+-tree as solution for in-memory indexing on CPU-GPU heterogeneous computing platforms[C]// The 2016 International Conference on Management of Data,June 26-July 1,2016,San Francisco,USA. New York:ACM Press, 2016: 1523-1538. |
[3] | VAISMAN A , ZIMáNYI E . Data warehouses:next challenges[J]. Lecture Notes in Business Information Processing, 2011,96: 1-26. |
[4] | VASSILIADIS P , SIMITSIS A . Near real time ETL[M]// New trends in data warehousing and data analysis. Boston:Springer, 2009: 1-31. |
[5] | BAYER R , SCHKOLNICK M . Concurrency of operations on B-trees[J]. Acta Informatica, 1977,9(1): 1-21. |
[6] | LEHMAN P L . Efficient locking for concurrent operations on B-trees[J]. ACM Transactions on Database Systems (TODS), 1981,6(4): 650-670. |
[7] | M AO Y , KOHLER E , MORRIS R T . Cache craftiness for fast multicore key-value storage[C]// The 7th ACM European Conference on Computer Systems,April 10-13,2012,Bern,Switzerland. New York:ACM Press, 2012: 183-196. |
[8] | CHA S K , HWANG S , KIM K ,et al. Cache-conscious concurrency control of main-memory indexes on shared-memory multiprocessor systems[C]// The 27th International Conference on Very Large Data Bases,September 11-14,2001,Roma,Italy.San Francisco:Morgan Kaufmann Publishers Inc. , 2001: 181-190. |
[9] | LEVANDOSKI J J , LOMET D B , SENGUPTA S . The Bw-Tree:a B-tree for new hardware platforms[C]// 2013 IEEE 29th International Conference on Data Engineering (ICDE),April 8-12,2013,Brisbane,Australia. Piscataway:IEEE Press, 2013: 302-313. |
[10] | SEWALL J , CHHUGANI J , KIM C ,et al. PALM:parallel architecture-friendly latchfree modifications to B+ trees on manycore processors[J]. Proceedings of the VLDB Endowment, 2011,4(11): 795-806. |
[11] | CHEN Y , WEI X , SHI J ,et al. Fast and general distributed transactions using RDMA and HTM[C]// The 11th European Conference on Computer Systems,April 18-21,2016,London,UK. New York:ACM Press, 2016. |
[12] | WANG X , ZHANG W , WANG Z ,et al. Eunomia:scaling concurrent search trees under contention using HTM[C]// The 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming,February 4-8,2017,Austin,USA. New York:ACM Press, 2017: 385-399. |
[13] | ZHANG W , WANG X , JI S ,et al. Scaling concurrent index structures under contention using HTM[J]. IEEE Transactions on Parallel and Distributed Systems, 2018,29(8): 1837-1850. |
[14] | WANG Z , QIAN H , CHEN H ,et al. Opportunities and pitfalls of multicore scaling using hardware transaction memory[C]// The 4th Asia-Pacific Workshop on Systems,July 29-30,2013,Singapore. New York:ACM Press, 2013. |
[15] | WANG Z , QIAN H , LI J ,et al. Using restricted transactional memory to build a scalable in-memory database[C]// The 9th European Conference on Computer Systems,April 14-16,2014,Amsterdam,the Netherlands. New York:ACM Press, 2014. |
[16] | CHEN S , PENG L . Efficient GPU hardware transactional memory through early conflict resolution[C]// 2016 IEEE International Symposium on High Performance Computer Architecture(HPCA),March 12-16,2016,Barcelona,Spain. Piscataway:IEEE Press, 2016: 274-284. |
[17] | HERLIHY M , MOSS J E B . Transactional memory:architectural support for lockfree data structures[M]. New York: ACM PressPress, 1993. |
[18] | XU Y , WANG R , GOSWAMI N ,et al. Software transactional memory for GPU architectures[C]// Annual IEEE/ACM International Symposium on Code Generation and Optimization,February 15-19,2014,Orlando,USA. New York:ACM Press, 2014. |
[19] | POLLARI-MALMI K , SOISALONSOININEN E , YLONEN T . Concurrency control in B-trees with batch updates[J]. IEEE Transactions on Knowledge and Data Engineering, 1996,8(6): 975-984. |
[20] | SHNEIDERMAN B . Batched searching of sequential and tree structured files[J]. ACM Transactions on Database Systems (TODS), 1976,1(3): 268-275. |
[21] | ARGE L , HINRICHS K H , VAHRENHOLD J ,et al. Efficient bulk operations on dynamic R-trees[J]. Algorithmica, 2002,33(1): 104-128. |
[22] | GRAEFE G . B-tree indexes for high update rates[J]. ACM Sigmod Record, 2006,35(1): 39-44. |
[23] | YAN Z , LIN Y , PENG L ,et al. Harmonia:a high throughput B+ tree for GPUs[C]// The 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming(PPoPP),February 16-20,2019,Washington,DC,USA. New York:ACM Press, 2019: 133-144. |
[24] | AWAD M A , ASHKIANI S , JOHNSON R ,et al. Engineering a high-performance GPU B-Tree[C]// The 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming(PPoPP),February 16-20,2019,Washington,DC,USA. New York:ACM Press, 2019. |
[25] | RAO J , ROSS K A . Cache conscious indexing for decision-support in main memory[C]// The 25th International Conference on Very Large Data Bases,September 7-10,1999,Edinburgh,Scotland.San Francisco:Morgan Kaufmann Publishers Inc. , 1999: 78-89. |
[26] | RAO J , ROSS K A . Making B+-trees cache conscious in main memory[C]// The 2000 ACM SIGMOD International Conference on Management of Data,May 15-18,2000,Dallas,USA. New York:ACM Press, 2000: 475-486. |
[27] | CHEN S , GIBBONS P B , MOWRY T C . Improving index performance through prefetching[M]. New York: ACM PressPress, 2001. |
[28] | CHEN S , GIBBONS P B , MOWRY T C ,et al. Fractal prefetching B+trees:optimizing both cache and disk performance[C]// The 2002 ACM SIGMOD International Conference on Management of Data,June 3-6,2002,Madison,USA. New York:ACM Press, 2002: 157-168. |
[29] | HANKINS R A , PATEL J M . Effect of node size on the performance of cache-conscious B+-trees[C]// The 2003 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems,June 11-14,2003,San Diego,USA. New York:ACM Press, 2003: 283-294. |
[30] | KIM C , CHHUGANI J , SATISH N ,et al. FAST:fast architecture sensitive tree search on modern CPUs and GPUs[C]// The 2010 ACM SIGMOD International Conference on Management of Data,June 6-10,2010,Indianapolis,USA. New York:ACM Press, 2010: 339-350. |
[31] | KACZMARSKI K . Experimental B+-tree for GPU[J]. Advances in Databases and Information Systems, 2011(2):11. |
[32] | KACZMARSKI K , . B+-tree optimized for GPGPU[C]// OTM Confederated International Conferences “On the Move to Meaningful Internet Systems”,September 10-14,2012,Rome,Italy. Heidelberg:Springer, 2012: 843-854. |
[33] | DAGA M , NUTTER M . Exploiting coarsegrained parallelism in b+ tree searches on an apu[C]// 2012 SC Companion:High Performance Computing,Networking,Storage and Analysis(SCC),November 10-16,2012,Salt Lake City,USA. Piscataway:IEEE Press, 2012: 240-247. |
[34] | GRAEFE G . A survey of B-tree locking techniques[J]. ACM Transactions on Database Systems (TODS), 2010,35(3):16. |
[35] | YANG Y H E , PRASANNA V K . High throughput and large capacity pipelined dynamic search tree on FPGA[C]// The 18th Annual ACM/SIGDA International Symposium on Field Programmable Gate Arrays,February 21-23,2010,Monterey,USA. New York:ACM Press, 2010: 83-92. |
[36] | HEINRICH D , WERNER S , STELZNER M ,et al. Hybrid FPGA approach for a B+ tree in a semantic web database system[C]// 2015 10th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC),June 29-July 1,2015,Bremen,Germany. Piscataway:IEEE Press, 2015: 1-8. |
[37] | QU Y , PRASANNA V . Scalable highthroughput architecture for large balancedtree structures on FPGA[C]// The ACM/SIGDA International Symposium on Field Programmable Gate Arrays,February 11-13,2013,Monterey,USA. New York:ACM Press, 2013:278. |
[38] | WEI X , SHI J , CHEN Y ,et al. Fast inmemory transaction processing using RDMA and HTM[C]// The 25th Symposium on Operating Systems Principles,October 4-7,2015,Monterey,USA. New York:ACM Press, 2015: 87-104. |
[39] | MITCHELL C , MONTGOMERY K , NELSON L ,et al. Balancing CPU and network in the cell distributed B-tree store[C]// 2016 USENIX Annual Technical Conference,June 22-24,2016,Denver,USA. Berkeley:USENIX Association, 2016: 451-464. |
[40] | CHEN S M , QIN J . Persistent b+-trees in non-volatile main memory[J]. Proceedings of the VLDB Endowment, 2015,8(7): 786-797. |
[41] | CHEN S M , GIBBONS P B , NATH S . Rethinking database algorithms for phase change memory[C]// The 5th Biennial Conference on Innovative Data Systems Research,January 9-12,2011,Asilomar,USA.[S.l.:s.n]. 2011. |
[42] | YANG J , WEI Q S , CHEN C ,et al. NVTree:reducing consistency cost for NVMbased single level systems[C]// The 13th USENIX Conference on File and Storage Technologies(FAST15),February 16-19,2015,Santa Clara,USA. Berkeley:USENIX Association, 2015: 167-181. |
[43] | OUKID I , LASPERAS J , NICA A ,et al. FPTree:a hybrid SCM-DRAM persistent and concurrent B-tree for storage class memory[C]// The 2016 International Conference on Management of Data,June 26-July 1,2016,San Francisco,USA. New York:ACM Press, 2016. |
[44] | ARULRAJ J , . BzTree:a high-performance latch-free range index for non-volatile memory[C]// The 44th International Conference on Very Large Data Bases,August 27-31,2018,Rio De Janeiro,Brazil. New York:ACM Press, 2018: 553-565. |
[1] | 谢冰, 彭鑫, 尹刚, 李宣东, 魏峻, 孙海龙. 基于大数据的软件智能化开发方法与环境[J]. 大数据, 2021, 7(1): 0-. |
[2] | 张建, 孟祥鑫, 孙海龙, 王旭, 刘旭东. 数据驱动的软件开发者智能协作技术[J]. 大数据, 2021, 7(1): 0-. |
[3] | 张洋, 王涛, 尹刚, 余跃, 黄井泉. 面向智能化软件开发的开源生态大数据[J]. 大数据, 2021, 7(1): 0-. |
[4] | 李文明, 刘芳, 吕鹏, 于彦伟. 基于城市交通监控大数据的行程时间估计[J]. 大数据, 2021, 7(1): 0-. |
[5] | 李刚, 郑佳, 尹华山, 黄文超. 大数据技术在疫情精准防控中的应用[J]. 大数据, 2021, 7(1): 0-. |
[6] | 柴唤友,刘三女牙,康令云,张雅娴,李卿,刘智. 教育大数据采集机制与关键技术研究[J]. 大数据, 2020, 6(6): 0-. |
[7] | 乐洁玉,罗超洋,丁静姝,李卿. 教育大数据隐私保护机制与技术研究[J]. 大数据, 2020, 6(6): 0-. |
[8] | 王健宗,孔令炜,黄章成,陈霖捷,刘懿,何安珣,肖京. 联邦学习算法综述[J]. 大数据, 2020, 6(6): 0-. |
[9] | 王晓萍,郭梦洁,岳婧雯. 基于关系图谱的人岗关系研究[J]. 大数据, 2020, 6(6): 0-. |
[10] | 柴扬帆,孔桂兰,张路霞. 医疗大数据在学习型健康医疗系统中的应用[J]. 大数据, 2020, 6(5): 0-. |
[11] | 代红,张群,芦皓麟,宾军志. 银行业金融机构数据治理指引和DCMM的对比分析[J]. 大数据, 2020, 6(5): 0-. |
[12] | 何晓斌,蒋金虎. 面向大数据异构系统的神威并行存储系统[J]. 大数据, 2020, 6(4): 0-. |
[13] | 胡正丁,薛巍. 面向异构众核超级计算机的大规模稀疏计算性能优化研究[J]. 大数据, 2020, 6(4): 0-. |
[14] | 夏大文,王林,张乾,魏嘉银,冯夫健,李华青. 大数据应用技术课程教学改革与实践[J]. 大数据, 2020, 6(4): 0-. |
[15] | 叶雅珍,刘国华,朱扬勇. 数据资产化框架初探[J]. 大数据, 2020, 6(3): 0-. |
阅读次数 | ||||||
全文 |
|
|||||
摘要 |
|
|||||
|