Research on context-aware Android application vulnerability detection

doi: 10.11959/j.issn.1000-436x.2021198

Journal on Communications, 2021, Vol. 42, Issue (11): 13-27.

Topic: Security Technology for Computer Communication and Network Systems

Jiawei QIN¹,², Hua ZHANG¹, Hanbing YAN², Nengqiang HE², Tengfei TU¹

¹ State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
² The National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China

Revised: 2021-09-27 Online: 2021-11-25 Published: 2021-11-01

About the authors:
Jiawei QIN (born 1993), male, Manchu, from Benxi, Liaoning, is a Ph.D. candidate at Beijing University of Posts and Telecommunications and an engineer at the National Computer Network Emergency Response Technical Team/Coordination Center of China. His research interests include mobile security analysis and Internet of Things security analysis.
Hua ZHANG (born 1978), female, from Siping, Jilin, holds a Ph.D. and is an associate professor at Beijing University of Posts and Telecommunications. Her research interests include network security and privacy protection.
Hanbing YAN (born 1975), male, from Jinxian, Jiangxi, holds a Ph.D. and is a professor-level senior engineer at the National Computer Network Emergency Response Technical Team/Coordination Center of China. His research interests include network security and computer graphics.
Nengqiang HE (born 1985), male, from Yiwu, Zhejiang, holds a Ph.D. and is a senior engineer at the National Computer Network Emergency Response Technical Team/Coordination Center of China. His research interests include mobile malware analysis and application security testing.
Tengfei TU (born 1990), male, from Linyi, Shandong, holds a Ph.D. and is a postdoctoral researcher at Beijing University of Posts and Telecommunications. His research interests include network security and mobile security.

Supported by:
The National Natural Science Foundation of China (62072051); The National Natural Science Foundation of China (61976024); The National Natural Science Foundation of China (61972048); The Fundamental Research Funds for the Central Universities (2019XD-A01); Key Project Plan of Blockchain in Ministry of Education (2020KJ010802)

Abstract

Learning-based vulnerability detection models for Android applications extract features from source programs that lack semantic information and contain noise unrelated to vulnerabilities, which lowers detection accuracy. To address this problem, a program feature extraction method based on code information slices (CIS) was proposed. Compared with abstract syntax tree (AST) features, CIS extracts the variable information directly related to a vulnerability more precisely, avoids introducing excessive noise, and preserves the semantic information of the vulnerability. Using CIS, a context-aware Android application vulnerability detection model, VulDGArcher, was built on Bi-LSTM with an attention mechanism. Because Android vulnerability datasets are hard to obtain, a dataset of 41 812 code fragments was constructed, covering the implicit Intent communication vulnerability and the PendingIntent permission bypass vulnerability; 16 218 of these fragments are vulnerable. On this dataset, VulDGArcher reaches a detection accuracy of 96%, higher than deep learning vulnerability detection models based on AST features or on unprocessed APP source code features.

Key words: Android vulnerability detection, deep learning, code information slice (CIS), semantic features of vulnerabilities
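
The abstract does not describe how a code information slice is built. As a rough illustration of the general idea only, the sketch below performs a simple backward data-dependence slice over straight-line code: starting from a sink statement, it keeps only the earlier statements that define variables the sink depends on. The statement representation (text, defined variables, used variables), the function name `backward_slice`, and the sample snippet are all assumptions made for this example, not the paper's implementation.

```python
from typing import List, Set, Tuple

# Each statement is (text, defined_vars, used_vars). This representation is a
# hypothetical simplification; a real tool would derive it from an AST or bytecode.
Statement = Tuple[str, Set[str], Set[str]]


def backward_slice(stmts: List[Statement], sink_index: int) -> List[str]:
    """Keep the sink statement and the earlier statements it depends on.

    Only data dependences in straight-line code are followed; control flow
    and aliasing are ignored in this sketch.
    """
    sink_text, _, sink_uses = stmts[sink_index]
    relevant = set(sink_uses)          # variables we still need definitions for
    kept = [sink_text]
    for text, defs, uses in reversed(stmts[:sink_index]):
        if defs & relevant:            # statement defines something relevant
            kept.append(text)
            relevant |= uses           # its inputs become relevant too
    kept.reverse()                     # restore source order
    return kept


# Hypothetical snippet that sends a broadcast through an implicit Intent.
snippet: List[Statement] = [
    ('String tag = "demo";', {"tag"}, set()),
    ('Intent intent = new Intent();', {"intent"}, set()),
    ('String action = "com.example.SEND";', {"action"}, set()),
    ('intent.setAction(action);', {"intent"}, {"intent", "action"}),
    ('Log.d(tag, "sending");', set(), {"tag"}),
    ('sendBroadcast(intent);', set(), {"intent"}),
]

slice_lines = backward_slice(snippet, sink_index=5)
```

In this toy snippet the logging statements are dropped, while the statements that construct and configure `intent` are kept, which is the kind of noise reduction the abstract attributes to CIS.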

CLC number: TP18

Cite this article: Jiawei QIN, Hua ZHANG, Hanbing YAN, Nengqiang HE, Tengfei TU. Research on context-aware Android application vulnerability detection[J]. Journal on Communications, 2021, 42(11): 13-27.

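Method	Description
onCreate(Bundle savedInstanceState)	Initializes the activity component
onClick(View v)	Called when the user clicks
onStart()	Called when the user moves the activity to the background
onCreate()	Initializes the service component
onStart(Intent intent)	Called when the service component is started
onBind(Intent intent)	Begins a connection to the service component
onUnbind(Intent intent)	Ends the connection to the service component
onRebind(Intent intent)	Called when the service is bound
onReceive(Context curContext, Intent broadcastMsg)	Called when a broadcast from another APP is received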
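
As a hedged illustration of how the callbacks in the table above might serve as starting points for analysis, the sketch below reports which of these method names are declared in a piece of Java source text. The regular expression, the helper name `find_entry_points`, and the sample class are assumptions made for this example; the excerpt does not say how the authors locate these methods, and a real analysis would parse the code rather than pattern-match it.

```python
import re
from typing import List

# Callback names taken from the table above.
CALLBACKS = ["onCreate", "onClick", "onStart", "onBind",
             "onUnbind", "onRebind", "onReceive"]

# Matches a declaration such as `public void onCreate(Bundle b) {`.
# This is only an approximation of Java syntax.
DECL = re.compile(r"\b(?:public|protected)\s+[\w<>\[\]]+\s+(\w+)\s*\(")


def find_entry_points(java_source: str) -> List[str]:
    """Return the declared callback names in order of first appearance."""
    found: List[str] = []
    for name in DECL.findall(java_source):
        if name in CALLBACKS and name not in found:
            found.append(name)
    return found


# Hypothetical receiver class used only to exercise the helper.
sample = """
public class MailReceiver extends BroadcastReceiver {
    public void onReceive(Context curContext, Intent broadcastMsg) {
        handle(broadcastMsg);
    }
    private void handle(Intent msg) { }
}
"""

entry_points = find_entry_points(sample)
```

Here only `onReceive` is reported: the private helper `handle` is not in the callback list, and the class declaration does not match the method pattern.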

Vulnerability	Vulnerable (count)	Safe (count)	Total (count)
IIS	1 806	1 722	3 528
PLP	95	471	566
Total	1 901	2 193	4 094

Vulnerability	Vulnerable (count)	Safe (count)	Total (count)
IIS	16 076	24 658	40 734
PLP	142	936	1 078
Total	16 218	25 594	41 812

Vulnerability	Vulnerable (count)	Safe (count)	Total (count)
IIS	11 633	17 169	28 802
PLP	133	876	1 009
Total	11 766	18 045	29 811

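Dataset	IIS vulnerable (count)	IIS safe (count)	IIS total (count)	PLP vulnerable (count)	PLP safe (count)	PLP total (count)
Training set	11 253	8 630	19 883	100	180	280
Validation set	1 607	960	2 567	14	20	34
Test set	3 216	2 400	5 616	28	50	78
Total	16 076	11 990	28 066	142	250	392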
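
The split table above can be checked with simple arithmetic. The sketch below copies the counts from the table, confirms that the three subsets add up to the stated totals, and computes the share of samples in each subset. The dictionary layout and the helper names `totals` and `share` are illustrative choices, not part of the paper.

```python
from typing import Dict, Tuple

# (vulnerable, safe) counts per subset, copied from the split table.
splits: Dict[str, Dict[str, Tuple[int, int]]] = {
    "IIS": {"train": (11253, 8630), "val": (1607, 960), "test": (3216, 2400)},
    "PLP": {"train": (100, 180), "val": (14, 20), "test": (28, 50)},
}


def totals(kind: str) -> Tuple[int, int, int]:
    """Sum vulnerable and safe counts across training, validation, and test."""
    vulnerable = sum(v for v, _ in splits[kind].values())
    safe = sum(s for _, s in splits[kind].values())
    return vulnerable, safe, vulnerable + safe


def share(kind: str, subset: str) -> float:
    """Fraction of all samples of `kind` that fall into `subset`."""
    v, s = splits[kind][subset]
    return (v + s) / totals(kind)[2]


iis_totals = totals("IIS")  # (16076, 11990, 28066), matching the Total row
plp_totals = totals("PLP")  # (142, 250, 392), matching the Total row
```

Under these counts, the shares work out to roughly 70% training, 10% validation, and 20% test for both vulnerability types.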

Group	IIS (count)	PLP (count)	Total (count)
1	2 200	34	2 234
2	4 400	68	4 468
3	6 600	102	6 702
4	8 800	136	8 936
5	11 000	170	11 170
6	13 200	204	13 404
7	15 400	238	15 638
8	17 015	275	17 290

Model	F1	FPR	P	TPR	Acc
code block + Bi-LSTM	0.71	0.51	0.74	0.69	0.69
AST + Bi-LSTM	0.84	0.12	0.84	0.84	0.84
CIS + Bi-LSTM (VulDGArcher)	0.98	0.02	0.98	0.98	0.98
code block + CNN	0.70	0.52	0.71	0.71	0.70
AST + CNN	0.85	0.13	0.83	0.83	0.83
CIS + CNN	0.91	0.13	0.92	0.88	0.89
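
The column headers F1, FPR, P, TPR, and Acc are not defined in this excerpt. Assuming they follow the standard binary-classification definitions (P as precision, TPR as recall, FPR as false positive rate, Acc as accuracy), the sketch below shows how each value is derived from a confusion matrix. The function name and the example counts are hypothetical and are not taken from the paper's experiments.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute the five metrics from confusion-matrix counts.

    tp/fn count vulnerable samples flagged/missed; fp/tn count safe samples
    flagged/passed. All denominators are assumed to be non-zero.
    """
    precision = tp / (tp + fp)              # P: flagged samples that are vulnerable
    tpr = tp / (tp + fn)                    # TPR (recall): vulnerable samples found
    fpr = fp / (fp + tn)                    # FPR: safe samples wrongly flagged
    acc = (tp + tn) / (tp + fp + tn + fn)   # Acc: all correct decisions
    f1 = 2 * precision * tpr / (precision + tpr)
    return {"F1": f1, "FPR": fpr, "P": precision, "TPR": tpr, "Acc": acc}


# Hypothetical counts chosen only to exercise the formulas.
m = classification_metrics(tp=90, fp=10, tn=80, fn=20)
```

With these definitions, a low FPR together with a high TPR indicates that a model flags few safe fragments while still finding most vulnerable ones.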

Method	IIS	PLP
Marvin-static-Analyzer	0.42	0.72
AndroBugs	0.53	0.85
AST	0.84	0.83
VulDGArcher	0.98	0.96