Big Data Research (大数据)


Update-Hotspot-Aware Query Optimization for LSM-Tree

林清音, 陈志广   

  1. School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, Guangdong, China

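  • About the authors: Qingyin Lin (born 1999), female, is a master's student in computer technology at Sun Yat-sen University; her main research interest is storage systems. Zhiguang Chen (born 1984), male, Ph.D., is an associate professor at the School of Computer Science and Engineering, Sun Yat-sen University; his main research interests are big data storage and processing, parallel and distributed computing, and high-performance computing and supercomputers. He has achieved key technical breakthroughs in parallel file systems, large-scale parallel I/O optimization, and big data analysis and processing.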

A Hot-update-aware Optimization to the Query of LSM-Tree

Qingyin LIN, Zhiguang CHEN

  1. School of Computer Science and Engineering, Sun Yat-Sen University, Guangzhou 510006, China

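Abstract:

Key-value stores based on LSM-Tree are now widely used. LSM-Tree achieves very high write performance through an I/O optimization that buffers updated data in memory and later writes it to disk in batches. However, in an LSM-Tree based key-value store, the old data of an updated key-value pair is not removed from the storage system immediately, so a large amount of invalid data accumulates across the system and eventually degrades read performance significantly. To address this problem, a more aggressive compaction method is proposed. It records the update history of key-value pairs, identifies update hotspots, locates the SSTables across the LSM-Tree storage system in which invalid data is heavily concentrated, and compacts them early to clear that invalid data, which mitigates write amplification and thereby improves read performance. Experiments show that the method reduces LevelDB's average read latency by 65.2%, read tail latency by 69.4%, and write amplification by 71.4%.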

Keywords:

key-value store; log-structured merge tree; read performance optimization; write amplification

Abstract:

Key-value stores based on LSM-Tree have been widely used. LSM-Tree gains excellent write performance by collecting updated data in memory and then flushing it to storage in batches. However, in LSM-Tree based key-value stores, old data generated by update operations is not eliminated from the storage system immediately, so a large amount of invalid data accumulates across the storage system and eventually reduces read performance significantly. To address this problem, an active compaction method is proposed. It records the update history of key-value pairs, recognizes hot-updated keys, finds SSTables that contain a large amount of invalid data, and triggers compaction on them as early as possible to clear more invalid data. In this way, the proposed method reduces write amplification and improves the read performance of LSM-Tree based key-value stores. Experiments show that the method reduces the average read latency of LevelDB by 65.2%, the 99th-percentile read tail latency by 69.4%, and write amplification by 71.4%.
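
The selection step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation and does not use any LevelDB API: `SSTable`, `UpdateTracker`, `pick_compaction_candidates`, and the 0.5 threshold are hypothetical names and values chosen only for illustration. The sketch assumes that each write carries a monotonically increasing sequence number, and it treats an entry as invalid when a newer version of the same key has been written.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SSTable:
    """Hypothetical stand-in for an on-disk table: key -> sequence number stored there."""
    name: str
    entries: dict[str, int] = field(default_factory=dict)


class UpdateTracker:
    """Hypothetical update history: newest sequence number and write count per key."""

    def __init__(self) -> None:
        self.latest_seq: dict[str, int] = {}
        self.update_count: dict[str, int] = {}

    def record_write(self, key: str, seq: int) -> None:
        # Remember the newest version of the key and how often it was written.
        self.latest_seq[key] = max(seq, self.latest_seq.get(key, -1))
        self.update_count[key] = self.update_count.get(key, 0) + 1

    def hot_keys(self, min_updates: int) -> set[str]:
        # Keys written at least `min_updates` times are treated as update hotspots.
        return {k for k, n in self.update_count.items() if n >= min_updates}

    def invalid_ratio(self, table: SSTable) -> float:
        # Fraction of entries in the table that a newer write has superseded.
        if not table.entries:
            return 0.0
        stale = sum(
            1
            for key, seq in table.entries.items()
            if seq < self.latest_seq.get(key, seq)
        )
        return stale / len(table.entries)


def pick_compaction_candidates(
    tables: list[SSTable], tracker: UpdateTracker, threshold: float = 0.5
) -> list[str]:
    """Return names of tables whose invalid ratio reaches the threshold, worst first."""
    scored = [(tracker.invalid_ratio(t), t.name) for t in tables]
    return [name for ratio, name in sorted(scored, reverse=True) if ratio >= threshold]


# Small demonstration: keys "a" and "b" are rewritten after the first table is built.
old = SSTable("sst-old", {"a": 1, "b": 2, "c": 3})
new = SSTable("sst-new", {"a": 4, "b": 5, "d": 6})
tracker = UpdateTracker()
for table in (old, new):
    for key, seq in table.entries.items():
        tracker.record_write(key, seq)

candidates = pick_compaction_candidates([old, new], tracker)
print(candidates)  # prints ['sst-old']
```

In this toy run, two of the three entries in `sst-old` have been superseded, so its invalid ratio of 2/3 exceeds the assumed threshold and it is selected for early compaction, while `sst-new` holds only current versions and is left alone.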
