通信学报 ›› 2021, Vol. 42 ›› Issue (12): 1-16.doi: 10.11959/j.issn.1000-436x.2021229
• 学术论文 • 下一篇
崔栋, 温巧燕, 张华, 王华伟
修回日期:
2021-11-30
出版日期:
2022-12-25
发布日期:
2021-12-01
作者简介:
崔栋(1991- ),男,山东潍坊人,北京邮电大学博士生,主要研究方向为学习索引、移动互联网安全基金资助:
Dong CUI, Qiaoyan WEN, Hua ZHANG, Huawei WANG
Revised:
2021-11-30
Online:
2022-12-25
Published:
2021-12-01
Supported by:
摘要:
为了丰富现有学习多维索引的功能并提高索引效率,提出了可以保留数据分布特征的动态数据分段算法DDSA,并结合四叉树和Z顺序曲线构建了混合空间索引(QML),在此基础上分别设计范围查询算法和KNN查询算法。这种保留数据分布特征的索引可以灵活实现快速查询和更新。实验结果表明,QML 索引在实现丰富功能的前提下优化了检索效率,数据更新的时间复杂度为O(1)。与R*-tree相比,QML索引存储减少约33%,更新效率提升40%~80%。查询效率与最优树形索引相近。
中图分类号:
崔栋, 温巧燕, 张华, 王华伟. QML:一种混合空间索引结构[J]. 通信学报, 2021, 42(12): 1-16.
Dong CUI, Qiaoyan WEN, Hua ZHANG, Huawei WANG. QML: a hybrid spatial index structure[J]. Journal on Communications, 2021, 42(12): 1-16.
表2
学习多维索引比较"
索引名称 | 投影策略/数据布局 | 一维空间学习模型 | 范围查询时间复杂度 | KNN查询时间复杂度 | 插入时间复杂度 | 删除时间复杂度 | 存储空间复杂度 |
ZM | Z顺序曲线 | RMI | O(n) | 不支持 | 不支持 | 不支持 | O(n) |
ML-Index | 基于iDistance的改进策略 | RMI | O(logn) | O(logn) | 不支持 | 不支持 | O(n) |
LISA | 网格划分+基于勒贝格测度的投影 | 分段线性回归模型 | O(n2)+O(n) | O(n logn) | O(1) | O(n) | O(logn)+O(n) |
Flood | 基于成本模型的网格划分 | 分段线性回归模型 | O(n) | 支持(未讨论) | 不支持 | 不支持 | O(n) |
QML | 四叉树+Z顺序曲线 | 基于DDSA构建的分段模型 | O(logn)+O(n) | O(n logn) | O(1) | O(1) | O(logn)+O(n) |
[1] | KRASKA T , BEUTEL A , CHI E H ,et al. The case for learned index structures[C]// Proceedings of the 2018 International Conference on Management of Data. New York:ACM Press, 2018: 489-504. |
[2] | ZHANG H C , ANDERSEN D G , PAVLO A ,et al. Reducing the storage overhead of main-memory OLTP databases with hybrid indexes[C]// Proceedings of the 2016 International Conference on Management of Data. New York:ACM Press, 2016: 1567-1581. |
[3] | GALAKATOS A , MARKOVITCH M , BINNIG C ,et al. FITing-tree:a data-aware index structure[C]// Proceedings of the 2019 International Conference on Management of Data. New York:ACM Press, 2019: 1189-1206. |
[4] | DING J L , MINHAS U F , YU J ,et al. ALEX:an updatable adaptive learned index[C]// Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. New York:ACM Press, 2020: 969-984. |
[5] | FERRAGINA P , VINCIGUERRA G . The PGM-index:a fully-dynamic compressed learned index with provable worst-case bounds[C]// Proceedings of the VLDB Endowment. Cairo:Morgan Kaufmann, 2020: 1162-1175. |
[6] | TANG C Z , WANG Y Y , DONG Z Y ,et al. XIndex:a scalable learned index for multicore data storage[C]// Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. New York:ACM Press, 2020: 969-984. |
[7] | 高远宁, 叶金标, 杨念祖 ,等. 基于中间层的可扩展学习索引技术[J]. 软件学报, 2020,31(3): 620-633. |
GAO Y N , YE J B , YANG N Z ,et al. Middle layer based scalable learned index scheme[J]. Journal of Software, 2020,31(3): 620-633. | |
[8] | WANG H X , FU X Y , XU J L ,et al. Learned index for spatial queries[C]// Proceedings of 2019 20th IEEE International Conference on Mobile Data Management (MDM). Piscataway:IEEE Press, 2019: 569-574. |
[9] | DAVITKOVA A , MILCHEVSKI E , MICHEL S . The ML-Index:a multidimensional,learned index for point,range,and nearest-neighbor queries[C]// Proceedings of the 23rd International Conference on Extending Database Technology (EDBT).Copenhagen:OpenProceedings. org, 2020: 407-410. |
[10] | LI P F , LU H , ZHENG Q ,et al. LISA:a learned index structure for spatial data[C]// Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. New York:ACM Press, 2020: 2119-2133. |
[11] | NATHAN V , DING J L , ALIZADEH M ,et al. Learning multi-dimensional indexes[C]// Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. New York:ACM Press, 2020: 985-1000. |
[12] | BECKMANN N , KRIEGEL H P , SCHNEIDER R ,et al. The R*-tree:an efficient and robust access method for points and rectangles[J]. ACM SIGMOD Record, 1990,19(2): 322-331. |
[13] | RIGAUX P , SCHOLL M , VOISARD A . An introduction to spatial databases[C]// Spatial Databases. Amsterdam:Elsevier, 2002: 1-28. |
[14] | NIEVERGELT J , HINTERBERGER H , SEVCIK K C . The grid file:an adaptable,symmetric multikey file structure[C]// ACM Transactions on Database Systems. New York:ACM Press, 1984: 38-71. |
[15] | GUTTMAN A , . R-trees:a dynamic index structure for spatial searching[C]// Proceedings of the 1984 ACM SIGMOD international conference on Management of data - SIGMOD’84. New York:ACM Press, 1984: 47-57. |
[16] | BAYER R , . The universal B-tree for multidimensional indexing:general concepts[C]// Lecture Notes in Computer Science. Berlin:Springer, 1997: 198-209. |
[17] | RAMSAK F , MARKL V , FENK R ,et al. Integrating the UB-tree into a database system kernel[C]// Proceedings of the 26th International Conference on Very Large Data Bases. Cairo:Morgan Kaufmann, 2000: 263-272. |
[18] | 张洲, 金培权, 谢希科 . 学习索引:现状与研究展望[J]. 软件学报, 2021,32(4): 1129-1150. |
ZHANG Z , JIN P Q , XIE X K . Learned indexes:current situations and research Prospects[J]. Journal of Software, 2021,32(4): 1129-1150. | |
[19] | 孟小峰, 马超红, 杨晨 . 机器学习化数据库系统研究综述[J]. 计算机研究与发展, 2019,56(9): 1803-1820. |
MENG X F , MA C H , YANG C . Survey on machine learning for database Systems[J]. Journal of Computer Research and Development, 2019,56(9): 1803-1820. | |
[20] | 李国良, 周煊赫, 孙佶 ,等. 基于机器学习的数据库技术综述[J]. 计算机学报, 2020,43(11): 2019-2049. |
LI G L , ZHOU X H , SUN J ,et al. A survey of machine learning based database Techniques[J]. Chinese Journal of Computers, 2020,43(11): 2019-2049. | |
[21] | 柴茗珂, 范举, 杜小勇 . 学习式数据库系统:挑战与机遇[J]. 软件学报, 2020,31(3): 806-830. |
CHAI M K , FAN J , DU X Y . Learnable database systems:challenges and Opportunities[J]. Journal of Software, 2020,31(3): 806-830. | |
[22] | 孙路明, 张少敏, 姬涛 ,等. 人工智能赋能的数据管理技术研究[J]. 软件学报, 2020,31(3): 600-619. |
SUN L M , ZHANG S M , JI T ,et al. Survey of data management techniques powered by artificial Intelligence[J]. Journal of Software, 2020,31(3): 600-619. | |
[23] | JAGADISH H V , OOI B C , TAN K L ,et al. iDistance[J]. ACM Transactions on Database Systems, 2005,30(2): 364-397. |
[24] | GARCIA E K , GUPTA M R . Lattice regression[C]// Proceedings of the 22nd International Conference on Neural Information Processing Systems (NIPS’09).New York:Curran Associates Inc. , 2009: 594-602. |
[25] | GULLO F , PONTI G , TAGARELLI A ,et al. A time series representation model for accurate and fast similarity detection[J]. Pattern Recognition, 2009,42(11): 2998-3014. |
[26] | LIU X Y , LIN Z J , WANG H Q . Novel online methods for time series segmentation[J]. IEEE Transactions on Knowledge and Data Engineering, 2008,20(12): 1616-1626. |
[27] | XU Z H , ZHANG R , KOTAGIRI R ,et al. An adaptive algorithm for online time series segmentation with error bound guarantee[C]// Proceedings of the 15th International Conference on Extending Database Technology - EDBT’12. New York:ACM Press, 2012: 192-203. |
[28] | HAKLAY M , WEBER P . OpenStreetMap:user-generated street maps[J]. IEEE Pervasive Computing, 2008,7(4): 12-18. |
[1] | 付永贵,朱建明. 基于区块链的数据库访问控制机制设计[J]. 通信学报, 2020, 41(5): 130-140. |
[2] | 钟兆根,吴昭军,张立民,王志青. 基于对数符合度下的RSC码识别[J]. 通信学报, 2018, 39(10): 79-86. |
[3] | 王榕,张敏,冯登国,李昊. 数据库形式化安全策略模型建模及分析方法[J]. 通信学报, 2015, 36(9): 193-203. |
[4] | 许佳捷,郑凯,池明旻,朱扬勇,禹晓辉,周晓方. 轨迹大数据:数据、应用与技术现状[J]. 通信学报, 2015, 36(12): 97-105. |
[5] | 温涛,张玉清,刘奇旭,杨刚. UVDA:自动化融合异构安全漏洞库框架的设计与实现[J]. 通信学报, 2015, 36(10): 235-244. |
[6] | 李晓娜,李庆志,孔兰菊,闫中敏. SaaS多租户数据放置的优化调整策略研究[J]. 通信学报, 2014, 35(Z2): 63-71. |
[7] | 李晓娜,李庆忠,孔兰菊,闫中敏. SaaS多租户数据放置的优化调整策略研究[J]. 通信学报, 2014, 35(Z2): 10-71. |
[8] | 田 园,孙荣辛,朱学勇. GUC安全的关系联结算子保密计算协议[J]. 通信学报, 2014, 35(11): 12-106. |
[9] | 田园,孙荣辛,蔡悟洋. GUC安全的关系联结算子保密计算协议[J]. 通信学报, 2014, 35(11): 107-116. |
[10] | 李晓娜,李庆忠,孔兰菊,庞成. 基于共享模式的SaaS多租户数据划分机制研究[J]. 通信学报, 2012, 33(Z1): 110-120. |
[11] | 崔纪锋,张勇,李超,邢春晓. 面向云存储的空间索引技术研究[J]. 通信学报, 2011, 32(9A): 34-41. |
[12] | 李娇,刘全,傅启明,王庭钢. 分布式数据库中基于局部CO N模型的记录匹配方法[J]. 通信学报, 2011, 32(7): 196-202. |
[13] | 张颖君,冯登国,陈恺. 面向空间索引树的授权机制[J]. 通信学报, 2010, 31(9): 65-74. |
[14] | 戴华,秦小麟,刘亮,柏传杰. 基于OCAR挖掘的数据库异常检测模型[J]. 通信学报, 2009, 30(9): 7-14. |
[15] | 曾海涛,王永吉,阮利,祖伟,蔡嘉勇. 使用容量指标的安全实时数据库信道限制方法[J]. 通信学报, 2008, 29(8): 47-57. |
阅读次数 | ||||||
全文 |
|
|||||
摘要 |
|
|||||
|