Journal on Communications ›› 2018, Vol. 39 ›› Issue (10): 11-21. doi: 10.11959/j.issn.1000-436x.2018222

• Papers •

Stochastic algorithm for HDFS data theft detection based on MapReduce

Yuanzhao GAO 1,2, Binglong LI 1, Xingyuan CHEN 1,2

  1. Third Academy, Information Engineering University, Zhengzhou 450001, China
  2. State Key Laboratory of Cryptology, Beijing 100094, China
  • Revised: 2018-07-03 Online: 2018-10-01 Published: 2018-11-23
  • Supported by:
    The National High Technology Research and Development Program of China (863 Program) (2015AA016006); The National Natural Science Foundation of China (61702550)

Abstract:

To address the problems of efficient big-data analysis and insider theft detection in data theft detection for distributed cloud storage, a stochastic algorithm for HDFS data theft detection based on MapReduce is proposed, taking HDFS (Hadoop Distributed File System) as a case study. By analyzing the MAC timestamp features that folder replication produces in HDFS, a method for detecting and measuring replication behavior is established that covers all data theft modes, including insider theft. A data set that suits MapReduce task partitioning and preserves the HDFS hierarchy is designed to enable efficient analysis of large volumes of timestamps. Experimental results show that a segment detection strategy keeps the miss rate and the number of mislabeled folders low. The algorithm is also shown to be efficient and to scale well under the MapReduce framework.
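The general idea of spotting folder replication from timestamps can be illustrated with a small map/reduce-style sketch in plain Python. This is not the algorithm from the paper, whose details the abstract does not give. It only assumes that copying a folder reads every file in it within a short interval, so the access times cluster together and postdate the modification times. The record layout, the function names, and the `window`, `min_ratio` and `min_files` parameters are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout: (file path, modification time, access time),
# with both times given in seconds.
Record = Tuple[str, int, int]


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[str, Tuple[int, int]]]:
    """Emit (parent folder, (mtime, atime)) for every file record."""
    for path, mtime, atime in records:
        folder = path.rsplit("/", 1)[0] or "/"
        yield folder, (mtime, atime)


def shuffle(pairs: Iterable[Tuple[str, Tuple[int, int]]]) -> Dict[str, List[Tuple[int, int]]]:
    """Group mapped values by folder, as a MapReduce shuffle step would."""
    grouped: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for folder, stamps in pairs:
        grouped[folder].append(stamps)
    return grouped


def reduce_phase(
    grouped: Dict[str, List[Tuple[int, int]]],
    window: int = 60,        # assumed maximum duration of one copy, in seconds
    min_ratio: float = 0.9,  # assumed share of files that must match the pattern
    min_files: int = 3,      # skip folders too small to judge
) -> Dict[str, float]:
    """Return folders whose access times look like one bulk read."""
    suspicious: Dict[str, float] = {}
    for folder, stamps in grouped.items():
        if len(stamps) < min_files:
            continue
        latest = max(atime for _, atime in stamps)
        # A file matches if it was read after its last modification and
        # within `window` seconds of the most recent read in the folder.
        hits = sum(
            1 for mtime, atime in stamps
            if atime > mtime and latest - atime <= window
        )
        ratio = hits / len(stamps)
        if ratio >= min_ratio:
            suspicious[folder] = ratio
    return suspicious
```

Calling `reduce_phase(shuffle(map_phase(records)))` returns a mapping from each flagged folder to the share of its files that match the pattern. Keying the map output by parent folder is one simple way to partition the work while keeping the directory structure recoverable. The thresholds would need tuning against real timestamp data.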

Key words: stochastic detection algorithm, HDFS, MapReduce, MAC timestamp, cloud computing storage

