Journal on Communications (通信学报) ›› 2015, Vol. 36 ›› Issue (Z1): 197-202. doi: 10.11959/j.issn.1000-436x.2015300

• Academic paper

Large-scale log compressing system based on differential compression

Qiu TANG1,2,3, Lei JIANG1,2, Qiong DAI1,2

    1 National Engineering Laboratory for Information Security Technologies, Beijing 100093, China
    2 Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
    3 University of Chinese Academy of Sciences, Beijing 100049, China
  • Online: 2015-11-25 Published: 2015-12-29
  • Supported by:
    Strategic Priority Research Program of the Chinese Academy of Sciences

Abstract:

The volume of log data produced by large-scale information systems is growing rapidly, which makes compressing and storing such data at line speed a major challenge in data management. An analysis of a large body of network system logs shows that log data contain redundant structural patterns and that log content exhibits temporal locality: entries close together in time tend to be similar. Based on these observations, a template-based, fine-grained differential log compression architecture is proposed, in which a fine-grained differencing strategy can be configured to suit each specific kind of log data. Experimental results show that, compared with gzip, the proposed log compression system compresses 2 to 10 times faster and achieves a lower (better) compression ratio, reaching 10%.

Key words: log, differential compression, fine-grained, template
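
The abstract describes the approach only at a high level, so the following is a minimal sketch of what field-level differential encoding of log records might look like, not the authors' implementation. It assumes space-delimited records, uses the previous record as the reference, and treats the hypothetical parameter `numeric_fields` as a stand-in for the per-template strategy configuration mentioned above. The marker characters and the `encode`/`decode` names are likewise illustrative assumptions.

```python
from typing import List, Optional, Set

SAME = "="    # field is identical to the same field in the previous record
DELTA = "+"   # numeric field stored as a difference from the previous record
LITERAL = "'" # field stored verbatim


def _is_plain_int(s: str) -> bool:
    # Only delta-encode integers that survive an int round trip (no leading zeros).
    return s.isascii() and s.isdigit() and str(int(s)) == s


def encode(lines: List[str], numeric_fields: Set[int]) -> List[str]:
    """Encode each record as per-field differences against the previous record."""
    out: List[str] = []
    prev: Optional[List[str]] = None
    for line in lines:
        fields = line.split(" ")
        if prev is None or len(prev) != len(fields):
            # No reference record with the same shape: store the line literally.
            out.append("L " + line)
        else:
            tokens = []
            for i, (cur, old) in enumerate(zip(fields, prev)):
                if cur == old:
                    tokens.append(SAME)
                elif i in numeric_fields and _is_plain_int(cur) and _is_plain_int(old):
                    tokens.append(DELTA + str(int(cur) - int(old)))
                else:
                    tokens.append(LITERAL + cur)
            out.append("D " + " ".join(tokens))
        prev = fields
    return out


def decode(records: List[str]) -> List[str]:
    """Invert encode(): rebuild each record from the previously decoded one."""
    lines: List[str] = []
    prev: List[str] = []
    for rec in records:
        kind, body = rec[0], rec[2:]
        if kind == "L":
            fields = body.split(" ")
        else:
            fields = []
            for tok, old in zip(body.split(" "), prev):
                if tok == SAME:
                    fields.append(old)
                elif tok.startswith(DELTA):
                    fields.append(str(int(old) + int(tok[1:])))
                else:
                    fields.append(tok[1:])  # strip the literal marker
        lines.append(" ".join(fields))
        prev = fields
    return lines


if __name__ == "__main__":
    sample = [
        "1448409600 10.0.0.1 GET /index.html 200",
        "1448409601 10.0.0.1 GET /index.html 200",
        "1448409603 10.0.0.2 GET /about.html 404",
    ]
    for record in encode(sample, numeric_fields={0}):
        print(record)
```

In this sketch, a record that repeats most of its predecessor shrinks to a short run of markers, which is where temporal locality would pay off. A real system would presumably follow this stage with a general-purpose compressor and choose strategies per template rather than per field index.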
