通信学报 ›› 2013, Vol. 34 ›› Issue (10): 183-190.doi: 10.3969/j.issn.1000-436x.2013.10.021

• 学术通信 • 上一篇    

基于状态约束的大规模正则表达式匹配算法

贺炜,郭云飞,扈红超   

  1. 国家数字交换系统工程技术研究中心,河南 郑州450002
  • 出版日期:2013-10-25 发布日期:2017-08-10
  • 基金资助:
    国家科技支撑计划基金资助项目;国家科技支撑计划基金资助项目;国家高技术研究发展计划(“863”计划)基金资助项目;国家高技术研究发展计划(“863”计划)基金资助项目;国家高技术研究发展计划(“863”计划)基金资助项目

States constrain-based algorithm for large scale regular expression matching

Wei HE,Yun-fei GUO,Hong-chao HU   

  1. National Digital Switching System Engineering Technological R&D Center,Zhengzhou 450002,China
  • Online:2013-10-25 Published:2017-08-10
  • Supported by:
    The National Science & Technology Pillar Program;The National Science & Technology Pillar Program;The National High Technology Research and Development Program of China (863 Program);The National High Technology Research and Development Program of China (863 Program);The National High Technology Research and Development Program of China (863 Program)

摘要:

通过观察不确定有限自动机NFA到确定性有限自动机DFA的转化过程,分析内存增长的原因,提出了一种基于状态间约束关系的正则表达式匹配算法 Group2-DFA。Group2-DFA 通过两级分组,利用状态间的约束关系,将原始NFA转化为NFA和DFA的混合结构。实验表明,在保持一定处理速率的前提下,Group2-DFA能够有效地减少内存占用。在300条规则下,Group2-DFA吞吐率能够达到1Gbps,并且减少约75%的状态数。

关键词: 正则表达式, 状态爆炸, 自动机转化, 状态约束

Abstract:

By analysis of state explosion in deterministic finite automata DFA,a novel algorithm Group2-DFA based on state constrains was proposed to reduce the memory usage.With the state constrains,states in NFA were classified into several groups.Group2-DFA introduces two-level classification and merges NFA and DFA together to a hybrid FA construction.The experiments show that Group2-DFA can reduce memory usage efficiently and keep high throughput with a small increase of memory reading time.With 300 regex rules,Group2-DFA can cut 75% states and achieve 1Gbps throughput.

Key words: regular expression, state explosion, automata convert, state constrains

No Suggested Reading articles found!