通信学报 ›› 2015, Vol. 36 ›› Issue (10): 172-180.doi: 10.11959/j.issn.1000-436x.2015215

• 学术论文 • 上一篇    下一篇

HashTrie:一种空间高效的多模式串匹配算法

张萍1,2,3,刘燕兵1,3,于静1,3,谭建龙1,3   

  1. 1 中国科学院 信息工程研究所,北京 100093
    2 中国科学院大学,北京 100049
    3 信息内容安全技术国家工程实验室,北京 100093
  • 出版日期:2015-10-25 发布日期:2015-10-27
  • 基金资助:
    国家自然科学基金青年基金资助项目;国家高技术研究发展计划(“863”计划)基金资助项目;中国科学院战略性科技先导专项基金资助项目

HashTrie:a space-efficient multiple string matching algorithm

Ping ZHANG1,2,3,Yan-bing LIU1,3,Jing YU1,3,Jian-long TAN1,3   

  1. 1 Institute of Information Engineering,Chinese Academy of Sciences,Beijing 100093,China
    2 University of Chinese Academy of Sciences,Beijing 100049,China
    3 National Engineering Laboratory for Information Security Technologies,Beijing 100093,China
  • Online:2015-10-25 Published:2015-10-27
  • Supported by:
    The National Natural Science Foundation of China;The National High Technology Research and Development of China(863 Program);The Strategic Priority Research Program of the Chinese Academy of Sci-ences

摘要:

经典的多模式串匹配算法AC的内存开销巨大,已经无法满足当前高速网络环境下大规模特征串实时匹配的应用需求。针对这一问题,提出一种空间高效的多模式串匹配算法—HashTrie。该算法运用递归散列函数,将模式串集合的信息存储在位向量中,以取代状态转移表来减少空间消耗,并利用Rank操作进行快速匹配校验。理论分析表明,HashTrie算法的空间复杂度为O(|P|),与模式串集合的规模|P|线性相关,与字符集大小σ无关,优于经典多模式串匹配算法AC的空间复杂度O(|P|σlog|P|)。在随机数据集和真实数据集(Snort、ClamAV和URL)上的测试结果表明,HashTrie算法比AC算法节约高达99.6%的存储空间,匹配速度约为AC算法的一半左右。HashTrie算法适合于模式串集合规模较大、模式串长度较短的多模式串匹配问题,是一种空间高效的多模式串匹配算法。

关键词: 入侵检测, 多模式串匹配, 位向量, 递归散列函数, 空间高效

Abstract:

The famous multiple string matching algorithm AC consumed huge memory when the string signatures were massive,thus unable to process high speed network traffic efficiently.To solve this problem,a space-efficient multiple string matching algorithm-HashTrie was proposed.This algorithm adopted recursive hash function to store the patterns in bit-vectors in place of the state transition table in order to reduce space consumption.Further more it made use of the rank operation for fast verification.Theoretic analysis shows that the space complexity of HashTrie is O(|P|),which is linear with the size of pattern set |P|and is independent of the alphabetsize σ.The space complexity is superior to the complexity O(|P|σlog|P|)of AC.Experiments on synthetic datasets and real-world datasets(such as Snort,ClamAV and URL)show that HashTrie saves up to 99.6% storage cost compared with AC,and in the meanwhile it runs at a matching speed that is about half of AC.HashTrie is a space-efficient multiple string matching algorithm that is appropriate to search large scale pattern strings with short lengths.

Key words: intrusion detection, multiple string matching, bit-vector; recursive hash function, space-efficient

No Suggested Reading articles found!