DeepRD: LSTM-based Siamese network for Android repackaged applications detection

doi:10.11959/j.issn.1000-436x.2018148

Journal on Communications, 2018, Vol. 39, Issue (8): 69-82.

• Papers I: Artificial Intelligence and Cyber Security •

Run WANG^1,2, Benxiao TANG^1,2, Li’na WANG^1,2

¹ Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, Wuhan University, Wuhan 430072, China
² School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China

Revised: 2018-06-28  Online: 2018-08-01  Published: 2018-09-13
About the authors: Run WANG (1991-), male, born in Anqing, Anhui; Ph.D. candidate at Wuhan University; research interests: Android security and privacy, AI security. | Benxiao TANG (1991-), male, born in Huangshi, Hubei; Ph.D. candidate at Wuhan University; research interests: mobile security and privacy, system security. | Li’na WANG (1964-), female, born in Yingkou, Liaoning; Ph.D., professor and doctoral supervisor at Wuhan University; research interests: network security, information hiding, AI security.
Supported by:
The National Natural Science Foundation of China (U1536204); The Fundamental Research Funds for the Central Universities (2042018kf1028); The National High Technology Research and Development Program of China (863 Program) (2015AA016004)

Abstract:

State-of-the-art techniques for detecting repackaged Android applications rely on features defined by experts. Defining these features is labor-intensive and time-consuming, and the features are easy for attackers to guess. Moreover, existing feature representations of applications perform poorly on common types of repackaging, which leads to a high false negative rate in real-world detection. To address these two issues, a deep learning-based repackaged application detection approach was proposed that learns semantic features of programs automatically. First, control flow and data flow analysis were performed on applications to form a sequence feature representation. Second, the sequence features were transformed into vectors with a word embedding model and fed into a Siamese long short-term memory (LSTM) network, which learns program features automatically. Finally, repackaged applications were detected by measuring the similarity of the learned program features. Experimental results on the public AndroZoo dataset show that the proposed approach achieves a precision of 95.7% with a false negative rate of 6.2%.

Key words: repackaging, deep learning, Siamese network, long short-term memory (LSTM), security and privacy
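The detection pipeline described in the abstract (sequence features, then word-embedding vectors, then a Siamese LSTM, then a similarity score) can be sketched as follows. This is a minimal, untrained illustration in pure Python, not the authors' implementation. The token vocabulary, the random embedding table (a stand-in for a trained word-embedding model), the hidden size, the exp(-L1) similarity (the Manhattan-distance form used by some Siamese LSTM models) and the 0.9 threshold are all assumptions. The one property it does share with any Siamese network is that both inputs pass through the same encoder with the same weights.

```python
import math
import random

# Hypothetical tokens standing in for semantic sequence features extracted
# by control-flow and data-flow analysis (assumption, not the paper's vocabulary).
VOCAB = ["entry", "assign", "if", "goto", "return",
         "invoke:getDeviceId", "invoke:sendTextMessage", "invoke:openConnection"]


def make_embedding(vocab, dim, seed=0):
    """Random token vectors; a placeholder for a trained word-embedding model."""
    rng = random.Random(seed)
    return {tok: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for tok in vocab}


def embed(tokens, table, dim):
    """Map a token sequence to vectors; unknown tokens become zero vectors."""
    return [table.get(tok, [0.0] * dim) for tok in tokens]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMEncoder:
    """Single-layer LSTM; one instance (one weight set) serves both branches."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        cols = input_dim + hidden_dim
        # One weight matrix (hidden_dim x cols) and bias vector per gate:
        # i = input, f = forget, o = output, g = candidate cell state.
        self.W = {gate: [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                         for _ in range(hidden_dim)] for gate in "ifog"}
        self.b = {gate: [0.0] * hidden_dim for gate in "ifog"}

    def _affine(self, gate, z):
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(self.W[gate], self.b[gate])]

    def encode(self, vectors):
        """Run the LSTM over the sequence and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in vectors:
            z = list(x) + h  # current input concatenated with previous hidden state
            i = [sigmoid(v) for v in self._affine("i", z)]
            f = [sigmoid(v) for v in self._affine("f", z)]
            o = [sigmoid(v) for v in self._affine("o", z)]
            g = [math.tanh(v) for v in self._affine("g", z)]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h


def siamese_similarity(encoder, table, tokens_a, tokens_b):
    """Encode both sequences with the SAME encoder; score = exp(-L1 distance)."""
    dim = len(next(iter(table.values())))
    h_a = encoder.encode(embed(tokens_a, table, dim))
    h_b = encoder.encode(embed(tokens_b, table, dim))
    return math.exp(-sum(abs(p - q) for p, q in zip(h_a, h_b)))


def is_repackaged(score, threshold=0.9):
    """Hypothetical decision rule: flag a pair whose similarity reaches the threshold."""
    return score >= threshold


if __name__ == "__main__":
    enc = LSTMEncoder(input_dim=4, hidden_dim=3, seed=1)
    emb = make_embedding(VOCAB, dim=4, seed=2)
    original = ["entry", "invoke:getDeviceId", "assign", "return"]
    modified = ["entry", "invoke:getDeviceId", "invoke:sendTextMessage", "return"]
    print(siamese_similarity(enc, emb, original, original))  # 1.0
    print(siamese_similarity(enc, emb, original, modified))
```

Because the weights here are random, the scores only show the mechanics. In a real system the embedding and LSTM weights would be trained on labeled pairs of original and repackaged apps, and the threshold would be tuned on validation data.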

CLC number: TP309.1

Run WANG, Benxiao TANG, Li’na WANG. DeepRD: LSTM-based Siamese network for Android repackaged applications detection[J]. Journal on Communications, 2018, 39(8): 69-82.

Figures/Tables: 15 (Figures 1-12, Tables 1-3)


Comparison of detection results
Method	FNR	Recall	Precision	F-score
DeepRD	6.2%	93.8%	95.7%	94.7%
SPRD	12.8%	87.2%	93.3%	90.1%

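Effect of the feature representation
Feature representation	FNR	Recall	Precision	F-score
Semantic sequence features	6.2%	93.8%	95.7%	94.7%
Raw feature representation	21.1%	78.9%	98.3%	87.5%
API call features	16.5%	83.5%	95.5%	89.1%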
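The FNR and F-score columns in the two tables above follow directly from recall and precision: FNR = 1 - recall and F-score = 2PR / (P + R). The short check below recomputes them from the reported recall and precision values. The confusion-count helper is a generic textbook definition with the repackaged pair as the positive class, not code from the paper.

```python
def f_score(precision, recall):
    """F-score (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def detection_metrics(tp, fp, fn):
    """Metrics from confusion-matrix counts; a repackaged pair is the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    fnr = fn / (tp + fn)  # false negative rate, equal to 1 - recall
    return {"precision": precision, "recall": recall,
            "fnr": fnr, "f_score": f_score(precision, recall)}


# (name, recall, precision) as reported in the tables above.
REPORTED = [
    ("DeepRD", 0.938, 0.957),
    ("SPRD", 0.872, 0.933),
    ("Raw feature representation", 0.789, 0.983),
    ("API call features", 0.835, 0.955),
]

if __name__ == "__main__":
    for name, recall, precision in REPORTED:
        # First line printed: "DeepRD: FNR=6.2% F-score=94.7%"
        print(f"{name}: FNR={1 - recall:.1%} F-score={f_score(precision, recall):.1%}")
```

Each recomputed value matches the corresponding table entry to one decimal place, which confirms that the F-score column is the standard F1 measure.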

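Proportion of repackaged applications by app market
App market	Repackaged app ratio	Country
Google Play	4.71%	United States
Baidu	3.2%	China
Xiaomi	5.3%	China
Huawei	4.1%	China
Anzhi	12.6%	China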
