Journal on Communications (通信学报), 2018, Vol. 39, Issue (10): 11-21. doi: 10.11959/j.issn.1000-436x.2018222

• Academic Paper

Stochastic algorithm for HDFS data theft detection based on MapReduce

Yuanzhao GAO (1,2), Binglong LI (1), Xingyuan CHEN (1,2)

  1. Third Academy, Information Engineering University, Zhengzhou 450001, China
  2. State Key Laboratory of Cryptology, Beijing 100094, China
  • Revised: 2018-07-03  Published: 2018-10-01  Released online: 2018-11-23
  • About the authors:
    Yuanzhao GAO (高元照, born 1992), male, from Hengshui, Hebei, is a doctoral candidate at Information Engineering University. His research interests include cloud computing forensics and big data security.
    Binglong LI (李炳龙, born 1974), male, from Weihui, Henan, Ph.D., is an associate professor and master's supervisor at Information Engineering University. His research interest is digital forensics.
    Xingyuan CHEN (陈性元, born 1963), male, from Wuwei, Anhui, Ph.D., is a professor and doctoral supervisor at Information Engineering University. His research interests include network and information security.
  • Supported by:
    The National High Technology Research and Development Program of China (863 Program) (2015AA016006); The National Natural Science Foundation of China (61702550)

Abstract:

Data theft detection in distributed cloud storage faces two problems: the volume of data to be analyzed is large, and theft by insiders is hard to detect. To address them, a stochastic data theft detection algorithm based on MapReduce was proposed, taking the Hadoop distributed file system (HDFS) as the detection target. The MAC timestamp characteristics that folder replication leaves in HDFS were analyzed to establish a method for detecting and measuring replication behavior. This ensures that all theft modes, including insider theft, can be detected. An input data set was designed that supports arbitrary MapReduce task partitioning while preserving the HDFS hierarchy, enabling efficient analysis of large volumes of timestamps. Experimental results show two things. First, a segmented detection strategy allows the algorithm to keep both the missed-detection rate and the number of mislabeled folders low. Second, the algorithm achieves high execution efficiency and good scalability under the MapReduce framework.

Key words: stochastic detection algorithm, HDFS, MapReduce, MAC timestamp, cloud computing storage
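The abstract does not give the detection rule itself. As a rough illustration only, the Python sketch below shows one way the ideas it names could fit together.

Each input record carries a file's full HDFS path and its access time, so the records can be split across map tasks at any boundary without losing the folder hierarchy. The map step keys each record by its parent folder. The reduce step then flags a folder when all of its files were accessed within a short window, which is the kind of burst a bulk copy tends to leave.

Several parts of the sketch are hypothetical simplifications, not the authors' algorithm:
- the record format;
- the function names (`map_records`, `is_burst`, `detect_copied_folders`);
- the `window_ms` and `min_files` parameters;
- the burst rule itself.

In addition, the map, shuffle and reduce phases run in memory rather than on Hadoop, and subfolders are ignored.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record format (an assumption, not taken from the paper):
#   "<absolute HDFS path>\t<access time in ms>"
# Because every record carries its full path, the input can be cut into
# map splits at any line boundary and the folder hierarchy is still known.


def map_records(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit (parent folder, access time) for each file record."""
    for line in lines:
        path, atime = line.rstrip("\n").split("\t")
        parent = path.rsplit("/", 1)[0] or "/"  # files at the root map to "/"
        yield parent, int(atime)


def is_burst(atimes: List[int], window_ms: int, min_files: int) -> bool:
    """Reduce step (hypothetical rule): flag a folder holding at least
    min_files files whose access times all lie within window_ms."""
    return len(atimes) >= min_files and max(atimes) - min(atimes) <= window_ms


def detect_copied_folders(lines: Iterable[str], window_ms: int,
                          min_files: int = 3) -> List[str]:
    """Simulate map, shuffle (group by folder) and reduce in memory."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for folder, atime in map_records(lines):
        groups[folder].append(atime)
    return sorted(f for f, ts in groups.items()
                  if is_burst(ts, window_ms, min_files))


if __name__ == "__main__":
    records = [
        "/data/a/f1\t1000", "/data/a/f2\t1500", "/data/a/f3\t1900",
        "/data/b/g1\t1000", "/data/b/g2\t500000", "/data/b/g3\t900000",
    ]
    # /data/a: all three files read within 900 ms; /data/b: spread over ~15 min
    print(detect_copied_folders(records, window_ms=5000))  # ['/data/a']
```

HDFS records access times only at the granularity set by `dfs.namenode.accesstime.precision` (one hour by default; 0 disables access times). In practice, a window like `window_ms` could therefore not usefully be finer than that setting. This coarse granularity is one reason a purely deterministic burst rule such as the one above is only a starting point.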
