通信学报 ›› 2015, Vol. 36 ›› Issue (7): 31-39.doi: 10.11959/j.issn.1000-436x.2015148

• 学术论文 • 上一篇    下一篇

HybridFA:一种基于统计的AC自动机空间优化技术

熊刚1,何慧敏2,于静1,刘燕兵1,郭莉1   

  1. 1 中国科学院 信息工程研究所,北京 100093
    2 中国移动(深圳)有限公司,深圳 518031
  • 出版日期:2015-07-25 发布日期:2015-07-25
  • 基金资助:
    中国科学院战略性科技先导专项基金资助项目;国家高技术研究发展计划(“863计划)基金资助项目;国家自然科学基金青年基金资助项目

HybridFA:a memory reduction technique for the AC automata based on statistics

Gang XIONG1,Hui-min HE2,Jing YU1,Yan-bing LIU1,Li GUO1   

  1. 1 Institute of Information Engineering,Chinese Academy of Science,Beijing 100093,China
    2 China Mobile Group,Guangdong Co.Ltd.,Shenzhen Branch,Shenzhen 518031,China
  • Online:2015-07-25 Published:2015-07-25
  • Supported by:
    The Strategic Priority Research Program of the Chinese Academy of Sciences;The National High Technology Research and Development Program of China (863 Program);The National Natural Science Foundation of China

摘要:

针对高级Aho-Corasick (AC)自动机为提高串匹配速度而造成的空间浪费问题,研究发现数据流对自动机节点的访问规律,据此提出基于数据访问特征的混合自动机构建算法HybridFA。分别研究了基于访问频率、访问层次以及结合上述2种特征对AC自动机的部分节点实现完全化的算法。在Snort、ClamAV、URL等真实数据集上的实验结果表明,HybridFA算法的存储空间低于高级AC自动机的5%。此外,结合访问频率和访问层次的改进算法在保证匹配速度的同时具有更强的数据适应性。

关键词: 多模式串匹配, 空间优化, 高级AC自动机, 统计策略, 节点完全化

Abstract:

Despite the fast speed in multiple string matching tasks,the advanced Aho-Corasick(AC) automata wastes storage memory to a great extent.Study indicated that the automata states have specific statistical access characteristics in practice.Accordingly,a series of algorithms based on statistical characteristics for building hybrid finite automata,named HybridFA,are proposed.This work completes partial states of the AC automata according to different features,including access frequency,state hierarchy,and combined characteristics respectively.Experimental results on the real-world datasets like Snort,ClamAV,and URL show that the storage space of HybridAC is reduced to less than 5% of the space cost by the advanced AC automata.Furthermore,HybridFA based on combined characteristics achieves the superior performance on matching speed and robustness comparing to other proposed algorithms.

Key words: multiple string matching, memory reduction, advanced AC automata, statistical strategy, state completing

No Suggested Reading articles found!