Method for determining the lengths of protocol keywords based on maximum likelihood probability

Journal on Communications, 2016, Vol. 37, Issue (6): 119-128. doi: 10.11959/j.issn.1000-436x.2016121

Jian-zhen LUO¹, Shun-zheng YU², Jun CAI¹

¹ School of Electronic and Information, Guangdong Polytechnic Normal University, Guangzhou 510665, China
² School of Information Science and Technology, Sun Yat-Sen University, Guangzhou 510006, China

Online: 2016-06-25 Published: 2017-08-04
Supported by:
The National Natural Science Foundation of China; The National Natural Science Foundation of China; The Natural Science Foundation of Guangdong Province; The Natural Science Foundation of Guangdong Province; Guangdong Provincial Department of Education Innovation Project (Natural Science); The Foundation for Excellent Young Teachers in Universities in Guangdong Province; Guangdong Provincial Application-Oriented Technical Research and Development Special Fund; Science and Technology Planning Project of Guangdong Province; Science and Technology Major Project of Education Department of Guangdong Province; International Scientific and Technological Cooperation Projects of Education Department of Guangdong Province; Science and Technology Project of Guangdong Province

Abstract:

A left-to-right inhomogeneous cascaded hidden Markov model was proposed and applied to model application-layer protocol messages. The proposed model described the transition probabilities between states and the evolution of phases inside each state, capturing the transitions between message fields and the left-to-right Markov characteristics inside the fields. The lengths of protocol keywords were determined by selecting the lengths with maximum likelihood probability; the keywords were then inferred and the message format was automatically recovered. The experimental results demonstrated that the proposed method performs well in protocol keyword extraction and message format recovery.

Key words: hidden Markov model, protocol reverse engineering, network security, message format
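
The length-selection idea in the abstract can be illustrated with a deliberately simplified sketch. It is not the left-to-right inhomogeneous cascaded HMM of the paper: it assumes a keyword sits at the start of every message, scores each candidate length with a fixed toy likelihood (a keyword byte equals the most common byte at its position with probability `1 - eps`, every other byte is uniform over 256 values), and picks the length with the highest log-likelihood. The function names, the `eps` parameter, and the sample messages are all hypothetical.

```python
import math
from collections import Counter


def log_likelihood(messages, length, eps=0.05):
    """Toy log-likelihood of `messages` if the first `length` bytes are a keyword.

    Assumption (not from the paper): keyword bytes match the per-position
    mode with probability 1 - eps; all remaining bytes are uniform over 256.
    """
    total = 0.0
    for pos in range(length):
        column = [m[pos] for m in messages]
        mode, _ = Counter(column).most_common(1)[0]
        for byte in column:
            total += math.log(1 - eps) if byte == mode else math.log(eps / 255)
    # Bytes after the candidate keyword are treated as uniform noise.
    rest = sum(len(m) - length for m in messages)
    total += rest * math.log(1 / 256)
    return total


def best_keyword_length(messages, eps=0.05):
    """Return the candidate length with the maximum toy log-likelihood."""
    max_len = min(len(m) for m in messages)
    return max(range(max_len + 1), key=lambda n: log_likelihood(messages, n, eps))


# Hypothetical sample messages sharing the prefix "GET /".
messages = [
    b"GET /index.html",
    b"GET /about",
    b"GET /login",
    b"GET /static/x",
]
n = best_keyword_length(messages)
print(n, messages[0][:n])  # 5 b'GET /'
```

With a fixed `eps`, extending the keyword over a constant position raises the likelihood, while extending it over a position where the bytes disagree lowers it, so the maximum falls at the end of the shared prefix. A model whose parameters are re-estimated for every candidate length would need a different comparison, since its likelihood could otherwise only grow with length.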

Jian-zhen LUO, Shun-zheng YU, Jun CAI. Method for determining the lengths of protocol keywords based on maximum likelihood probability[J]. Journal on Communications, 2016, 37(6): 119-128.

Figures and tables (10): Fig. 1, Fig. 2, Table 1, Table 2, Table 3, Table 4, Table 5, Table 6, Fig. 3, Fig. 4

Protocol	Number of connections	Number of packets	Data size
HTTP	244 443	4.1×10⁶	9 870.3
FTP	27 292	225 823	19.1
SMTP	18 091	111 736	57.2
POP	5 442	423 885	86.6
SSDP	15 184	315 929	78.9
BitTorrent	389 105	1.7×10⁶	512.4

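Field No.	Field	Attribute
F(1)	K(“GET/”)	Protocol keyword
F(2)	VD	Variable field
F(3)	K(“HTTP/1.1”)	Protocol keyword
F(4)	K(“Host:”)	Protocol keyword
F(5)	VD	Variable field
F(6)	K(“User-Agent:”)	Protocol keyword
F(7)	VD	Variable field
F(8)	K(“Accept:”)	Protocol keyword
F(9)	VD	Variable field
F(10)	K(“Content:”)	Protocol keyword
F(11)	VD	Variable field
F(12)	K(“Connection:”)	Protocol keyword
F(13)	VD	Variable field
F(14)	K(“Referer:”)	Protocol keyword
F(15)	VD	Variable field
F(16)	K(“Cookie:”)	Protocol keyword
F(17)	VD	Variable field
⋮	⋮	⋮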

No.	Token
1	c(t,“GET”)
2	v(t)
3	c(t,“rep...”)
4	v(t)
5	c(t,“int...”)
6	v(t)
7	c(t,“HTTP/1.1”)
8	c(t,“Host:”)
9	v(t)
10	c(t,“.com”)
11	v(t)
12	c(t,“User...”)
13	v(t)
14	c(t,“ocspd”)
15	v(t)
16	c(t,“(unknown”)
17	v(t)
18	c(t,“version)”)
19	v(t)
20	c(t,“CFNetwork”)
21	v(t)
22	c(t,“Darwin”)
23	v(t)
24	c(t,“(x86 64)”)
25	v(t)
26	c(t,“Conne...”)
27	v(t)
28	⋮

Byte No.	Byte content	ASCII value	Byte attribute
1	0x47	‘G’	Constant
2	0x45	‘E’	Constant
3	0x54	‘T’	Constant
4	0x20	‘ ’	Space
5	0x20	‘ ’	Space
6	0x20	‘ ’	Space
7	0x20	‘ ’	Space
8	0x20	‘ ’	Space

System	HTTP	FTP	SMTP	POP	SSDP	BitTorrent
LRIHMM	76.0	97.0	70.0	95.8	81.4	66.7
Discoverer	7.2	23.3	19.2	22.8	33.9	5.3
PI	100	100	20.0	16.7	35.6	33.3

System	HTTP	FTP	SMTP	POP	SSDP	BitTorrent
LRIHMM	87.0	92.9	85.7	84.0	74.1	100
Discoverer	78.3	60.7	64.3	40.0	33.3	100
PI	4.4	3.6	7.1	4.0	18.5	50.0