Journal on Communications (通信学报)
Zhang Siyu (章思宇), Jiang Kaida (姜开达), Wei Jianwen (韦建文), Luo Xuan (罗萱), Wang Haiyang (王海洋)
Abstract: With the rapid expansion of network bandwidth and of the number of devices and applications, log management faces explosive growth in data volume. A network log analysis platform built on SQL-on-Hadoop stores hundreds of billions of log entries and supports efficient, flexible queries. Several Hadoop columnar storage formats and compression algorithms are benchmarked on a real-world, TB-scale dataset, and the Hive and Impala engines are compared on log scan and statistical query efficiency. Choosing the Gzip-compressed Parquet format reduces log volume by 80% and raises Impala query performance fivefold. Six applications for security incident response, attack detection, and early warning have been developed on the platform and are performing well.
Zhang Siyu, Jiang Kaida, Wei Jianwen, Luo Xuan, Wang Haiyang. Network log analysis based on SQL-on-Hadoop [J]. Journal on Communications.
Link to this article: https://www.infocomm-journal.com/txxb/CN/
https://www.infocomm-journal.com/txxb/CN/Y2014/V35/IZ1/4