Progressive active inference method of protocol state machine

doi:10.11959/j.issn.2096-109x.2023023

Abstract

Active inference of protocol state machines is grounded in active automata learning, and its core problems are the abstraction of the alphabet and the construction of the mapper. Because messages of the same type can take many different values, packets of the same type may elicit different response types, so methods that use message types as the alphabet lose states or state transitions. To address this, message types are refined into subtypes according to their responses, and a progressive active inference method is proposed. The method extracts the protocol state fields from existing protocol data to build the initial alphabet and mapper, and obtains an initial state machine through active inference. It then applies deterministic mutation to the data: if the resulting input/output type sequence is inconsistent with the current state machine, the mutated message is treated as a protocol subtype and added to the alphabet, and a new state machine is inferred from the extended alphabet. In addition, to reduce the number of actual interactions with the protocol implementation, a pre-response query algorithm based on prefix matching is proposed, building on the caching mechanism of the active inference algorithm and exploiting protocol characteristics. The open-source framework ProLearner was implemented and evaluated on SMTP and RTSP. Extending the protocol subtypes yielded more detailed protocol behavior, which verifies the effectiveness of the proposed method. The experimental results also show that the pre-response query algorithm effectively reduces the number of actual interactions, by about 10% on average.

Key words: protocol reverse analysis, active automata learning, protocol state machine inference, Mealy automata, mapper
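
The pre-response query idea can be illustrated with a small sketch. The abstract does not give the algorithm itself, so the following is only one plausible reading, assuming that (a) the target behaves deterministically, so a cached trace answers any query that is a prefix of it, and (b) once a cached trace ends with the connection closed, every further input receives the same closed response. `PrefixCache`, `CLOSED`, and `toy_sul` are hypothetical names introduced for illustration and are not part of ProLearner.

```python
from typing import Callable, Dict, List, Sequence, Tuple

CLOSED = "ConnectionClosed"  # hypothetical abstract output for a closed session


class PrefixCache:
    """Answer queries from cached traces whenever a prefix match allows it."""

    def __init__(self, sul: Callable[[Sequence[str]], List[str]]):
        self.sul = sul  # performs a real interaction with the protocol entity
        self.traces: Dict[Tuple[str, ...], Tuple[str, ...]] = {}
        self.real_interactions = 0

    def query(self, inputs: Sequence[str]) -> List[str]:
        word = tuple(inputs)
        for cached_in, cached_out in self.traces.items():
            n = len(cached_in)
            # Case 1: the query is a prefix of (or equal to) a cached trace.
            if cached_in[:len(word)] == word:
                return list(cached_out[:len(word)])
            # Case 2 (assumed protocol property): a cached trace is a prefix
            # of the query and already ended with the connection closed, so
            # the remaining outputs can be predicted without asking.
            if word[:n] == cached_in and cached_out and cached_out[-1] == CLOSED:
                return list(cached_out) + [CLOSED] * (len(word) - n)
        # No usable prefix match: fall back to a real interaction.
        outputs = tuple(self.sul(word))
        self.real_interactions += 1
        self.traces[word] = outputs
        return list(outputs)


def toy_sul(inputs: Sequence[str]) -> List[str]:
    """Toy target, invented for this example: QUIT closes the session."""
    outputs, closed = [], False
    for symbol in inputs:
        if closed or symbol == "QUIT":
            outputs.append(CLOSED)
            closed = True
        else:
            outputs.append("250")
    return outputs
```

With the toy target, the queries `EHLO QUIT`, `EHLO`, and `EHLO QUIT MAIL` together cost a single real interaction, because only the first one reaches the target.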

CLC number:

TP393

Yan PAN, Wei LIN, Yuefei ZHU. Progressive active inference method of protocol state machine[J]. Chinese Journal of Network and Information Security, 2023, 9(2): 81-93.

Figures and tables (14): Figures 1-10, Tables 1-4


Stage	Alphabet letter	Protocol message
Stage one	Describe01	DESCRIBE XX aacAudioTest
Stage one	Setup01	SETUP XX aacAudioTest/track1
Stage one	Play01	PLAY XX aacAudioTest
Stage one	Teardown01	TEARDOWN XX aacAudioTest
Stage two	Describe02	DESCRIBE XX matroskaFileTest
Stage two	Setup02	SETUP XX matroskaFileTest/track1
Stage two	Setup02	SETUP XX matroskaFileTest/track2 session
Stage two	Play02	PLAY XX matroskaFileTest
Stage two	Teardown02	TEARDOWN XX matroskaFileTest
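
The table above can be read as a mapper that grows from one stage to the next. The sketch below is a minimal illustration of that idea, not the ProLearner implementation: the mapper is a dictionary from abstract letters to concrete messages, and a mutated message becomes a new subtype letter only when the observed outputs contradict the current Mealy hypothesis. The names `predict`, `refine`, `TOY_HYP`, and `toy_run`, the state names, and the outputs `OK`/`ERR` are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothesis as a Mealy machine: (state, letter) -> (next_state, output).
Mealy = Dict[Tuple[str, str], Tuple[str, str]]

# Stage-one mapper: abstract letter -> concrete message (taken from the table).
STAGE_ONE: Dict[str, str] = {
    "Describe01": "DESCRIBE XX aacAudioTest",
    "Setup01": "SETUP XX aacAudioTest/track1",
    "Play01": "PLAY XX aacAudioTest",
    "Teardown01": "TEARDOWN XX aacAudioTest",
}


def predict(hyp: Mealy, start: str, word: List[str]) -> List[str]:
    """Outputs the current hypothesis expects for a word of letters."""
    state, outputs = start, []
    for letter in word:
        state, out = hyp[(state, letter)]
        outputs.append(out)
    return outputs


def refine(mapper: Dict[str, str], hyp: Mealy, start: str, prefix: List[str],
           base: str, mutated_msg: str,
           run: Callable[[List[str]], List[str]]) -> Optional[str]:
    """Add `mutated_msg` as a subtype of `base` if it contradicts `hyp`."""
    expected = predict(hyp, start, prefix + [base])
    observed = run([mapper[letter] for letter in prefix] + [mutated_msg])
    if observed == expected:
        return None  # same behaviour as the base letter: nothing to add
    kind = base.rstrip("0123456789")
    index = sum(1 for l in mapper if l.rstrip("0123456789") == kind) + 1
    new_letter = f"{kind}{index:02d}"
    mapper[new_letter] = mutated_msg
    return new_letter


# Toy hypothesis fragment and toy target, both invented for this example.
TOY_HYP: Mealy = {
    ("s0", "Describe01"): ("s0", "OK"),
    ("s0", "Setup01"): ("s1", "OK"),
}


def toy_run(messages: List[str]) -> List[str]:
    # Stand-in for a real server; it does NOT reflect Live555 behaviour.
    return ["ERR" if "matroska" in m else "OK" for m in messages]
```

In the method described in the abstract, each newly added letter triggers another round of state machine inference over the extended alphabet; that learning step is omitted here.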

Letter	Count	Valid count
Describe	4	1
Setup	8	4
Play	6	1
Teardown	6	1

Letter	Count	Valid count
EHLO	2	1
MAIL	4	2
RCPT	3	1
DATA	1	1
MSG	1	1
RSET	1	1
QUIT	1	1

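Protocol state machine	Algorithm	Learning time/s	Equivalence query time/s	Unique queries	Actual interactions between the learner and the protocol entity (reduction vs. original)
Live555 (20210824)	Original	1,754	3,128	11,512	10,412
Live555 (20210824)	Improved	1,195	3,124	11,512	8,942 (14%)
EXIM4.93	Original	1,226	3,433	4,058	3,006
EXIM4.93	Improved	1,105	3,445	4,058	2,786 (7%)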
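
The percentages in the last column can be reproduced from the interaction counts, assuming they denote the relative reduction with respect to the original algorithm. The short check below uses only the numbers from the table; the function name `reduction` is introduced here for illustration.

```python
def reduction(original: int, improved: int) -> float:
    """Relative reduction in real interactions, as a fraction of the original."""
    return (original - improved) / original


# (original, improved) interaction counts taken from the table above.
rows = {"Live555": (10412, 8942), "EXIM4.93": (3006, 2786)}

rates = {name: reduction(o, i) for name, (o, i) in rows.items()}
average = sum(rates.values()) / len(rates)

for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")   # Live555: 14.1%, EXIM4.93: 7.3%
print(f"average: {average:.1%}")   # average: 10.7%
```

The two rates round to the 14% and 7% shown in the table, and their mean of roughly 10.7% is consistent with the "about 10%" average reported in the abstract.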