Big Data Research ›› 2023, Vol. 9 ›› Issue (1): 126-140. doi: 10.11959/j.issn.2096-0271.2022049

• Research •

A hot-update-aware optimization to the query of LSM-Tree

Qingyin LIN, Zhiguang CHEN

  1. School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou 510006, China
  • Online: 2023-01-15 Published: 2023-01-01
  • About the authors: Qingyin LIN (1999- ), female, is a master's student at the School of Computer Science and Engineering, Sun Yat-Sen University. Her main research interest is storage systems.
    Zhiguang CHEN (1984- ), male, Ph.D., is an associate professor at the School of Computer Science and Engineering, Sun Yat-Sen University. His main research interests include big data storage and processing, parallel and distributed computing, high-performance computing, and supercomputers.
  • Supported by:
    The National Key Research and Development Program of China (2021YFB0300103); The National Natural Science Foundation of China (61872392); The National Natural Science Foundation of China (61832020); The National Natural Science Foundation of China (U1911401); The Natural Science Foundation of Guangdong Province (2018B030312002)

Abstract:

Key-value stores based on the LSM-Tree are widely used. The LSM-Tree achieves very high write performance by buffering updates in memory and then flushing them to disk in batches. However, in LSM-Tree-based key-value stores, the old version of an updated key-value pair is not removed from the storage system immediately, so a large amount of invalid data accumulates across the system and eventually degrades read performance significantly. To address this problem, a more aggressive compaction method is proposed. By recording the update history of key-value pairs, the method identifies hot-updated keys, finds the SSTables across the LSM-Tree in which invalid data is heavily concentrated, and compacts them as early as possible to clear the invalid data and mitigate write amplification, thereby improving read performance. Experiments show that the method reduces the average read latency of LevelDB by 65.2%, the 99th-percentile read tail latency by 69.4%, and write amplification by 71.4%.
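
The idea summarized in the abstract (track updates, then compact the SSTables that hold the most stale entries) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the data structures, scoring rule, or thresholds, so the names `SSTable`, `UpdateTracker`, `flush`, and `pick_compaction_candidates`, the per-key sequence numbers, and the 0.5 invalid-ratio threshold are all assumptions made for illustration.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SSTable:
    # Hypothetical stand-in for an on-disk table:
    # key -> sequence number of the version stored in this table.
    table_id: int
    entries: dict[str, int] = field(default_factory=dict)


class UpdateTracker:
    """Assumed bookkeeping: newest sequence number and overwrite count per key."""

    def __init__(self) -> None:
        self.latest_seq: dict[str, int] = {}
        self.update_count: Counter = Counter()

    def record_write(self, key: str, seq: int) -> None:
        if key in self.latest_seq:
            self.update_count[key] += 1  # overwriting an existing key is an update
        self.latest_seq[key] = seq

    def hot_keys(self, min_updates: int) -> set[str]:
        # Keys overwritten at least `min_updates` times (illustrative definition).
        return {k for k, n in self.update_count.items() if n >= min_updates}

    def invalid_ratio(self, table: SSTable) -> float:
        # Fraction of entries in `table` superseded by a newer write.
        if not table.entries:
            return 0.0
        stale = sum(
            1
            for key, seq in table.entries.items()
            if seq < self.latest_seq.get(key, seq)
        )
        return stale / len(table.entries)


def flush(tracker: UpdateTracker, table_id: int, writes: dict[str, int]) -> SSTable:
    # Simulates flushing one batch of writes into a new SSTable.
    for key, seq in writes.items():
        tracker.record_write(key, seq)
    return SSTable(table_id, dict(writes))


def pick_compaction_candidates(
    tables: list[SSTable], tracker: UpdateTracker, threshold: float = 0.5
) -> list[int]:
    # Tables whose invalid ratio reaches the (assumed) threshold, most stale first.
    scored = sorted(
        ((tracker.invalid_ratio(t), t.table_id) for t in tables), reverse=True
    )
    return [table_id for ratio, table_id in scored if ratio >= threshold]


tracker = UpdateTracker()
tables = [
    flush(tracker, 1, {"a": 1, "b": 2, "c": 3}),
    flush(tracker, 2, {"a": 4, "b": 5, "d": 6}),
    flush(tracker, 3, {"a": 7, "e": 8}),
]
# Table 1 holds stale versions of "a" and "b" (2 of its 3 entries).
print(pick_compaction_candidates(tables, tracker))  # [1]
print(tracker.hot_keys(min_updates=2))  # {'a'}
```

In this toy run only table 1 crosses the threshold, since two of its three entries were later overwritten. A real engine such as LevelDB would also have to respect level ordering and overlapping key ranges when choosing compaction inputs, which this sketch ignores.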

Keywords: key-value stores, log-structured merge tree, read performance optimization, write amplification
