Journal on Communications ›› 2014, Vol. 35 ›› Issue (Z1): 14-19. doi: 10.3969/j.issn.1000-436x.2014.z1.004

• Cyberspace security •

Network log analysis with SQL-on-Hadoop

Si-yu ZHANG1, Kai-da JIANG1, Jian-wen WEI1, Xuan LUO1, Hai-yang WANG2

  1. Network and Information Center, Shanghai Jiaotong University, Shanghai 200240, China
  2. School of Electronic Information and Electrical Engineering, Shanghai Jiaotong University, Shanghai 200240, China
  • Online: 2014-10-25 Published: 2017-06-19
  • Supported by:
    The National Natural Science Foundation of China

Abstract:

With the rapid expansion of network bandwidth, devices and applications, log management faces the challenge of explosive growth in data volume. A network log analysis platform built on SQL-on-Hadoop stores hundreds of billions of log entries and supports efficient, flexible querying. Several Hadoop columnar storage formats and compression algorithms are benchmarked on a real-world TB-scale dataset, and the log-scan and statistical query efficiency of the Hive and Impala engines is compared. With Gzip-compressed Parquet, log volume is reduced by 80% and Impala query performance improves fivefold. Six applications for security incident response, attack detection and early warning have been developed on this platform and are performing well.

Key words: log analysis, big data, Hadoop, SQL, network security
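
The two query classes named in the abstract, conditional log scans and statistical queries, can be illustrated with ordinary SQL. The following is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for Hive or Impala, and the table name flow_log, its columns and the sample rows are hypothetical, not taken from the paper.

```python
import sqlite3

# Hypothetical log rows: (timestamp, source IP, destination port, byte count).
ROWS = [
    ("2014-05-01 10:00:00", "10.0.0.1", 80, 1200),
    ("2014-05-01 10:00:01", "10.0.0.2", 53, 90),
    ("2014-05-01 10:00:02", "10.0.0.1", 443, 5300),
    ("2014-05-01 10:00:03", "10.0.0.3", 80, 700),
    ("2014-05-01 10:00:04", "10.0.0.1", 80, 2500),
]


def load_logs(rows):
    """Create an in-memory table (assumed schema) and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flow_log ("
        "ts TEXT, src_ip TEXT, dst_port INTEGER, n_bytes INTEGER)"
    )
    conn.executemany("INSERT INTO flow_log VALUES (?, ?, ?, ?)", rows)
    return conn


def scan_by_source(conn, src_ip):
    """Conditional scan: every entry recorded for one source address."""
    return conn.execute(
        "SELECT ts, dst_port, n_bytes FROM flow_log "
        "WHERE src_ip = ? ORDER BY ts",
        (src_ip,),
    ).fetchall()


def bytes_per_port(conn):
    """Statistical query: entry count and total bytes per destination port."""
    return conn.execute(
        "SELECT dst_port, COUNT(*), SUM(n_bytes) FROM flow_log "
        "GROUP BY dst_port ORDER BY dst_port"
    ).fetchall()


if __name__ == "__main__":
    conn = load_logs(ROWS)
    print(scan_by_source(conn, "10.0.0.1"))
    print(bytes_per_port(conn))
```

A columnar format is well suited to the second kind of query because an aggregation over dst_port and n_bytes only needs to read those two columns; the sketch shows the query shapes only and says nothing about storage layout or performance.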
