Chinese Journal of Network and Information Security ›› 2020, Vol. 6 ›› Issue (6): 1-12. doi: 10.11959/j.issn.2096-109x.2020072
Column: Network Applications and Protection Technology
Revised: 2020-09-24
Online: 2020-12-15
Published: 2020-12-16
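About the authors:
Yingjun ZHANG (1982- ), female, born in Taiyuan, Shanxi, Ph.D., associate research fellow at the Institute of Software, Chinese Academy of Sciences. Her main research interest is network and information security.
Shangqi LIU (1971- ), male, born in Taojiang, Hunan, senior engineer at the Cyber Security Guard Corps of the Beijing Municipal Public Security Bureau. His main research interest is network security.
Mu YANG (1984- ), male, born in Beijing, engineer at the Cyber Security Guard Corps of the Beijing Municipal Public Security Bureau. His main research interest is network security.
Haixia ZHANG (1981- ), female, born in Shijiazhuang, Hebei, Ph.D., senior engineer at the Institute of Software, Chinese Academy of Sciences. Her main research interest is network and information security.
Kezhen HUANG (1988- ), male, born in Dezhou, Shandong, engineer at the Institute of Software, Chinese Academy of Sciences. His main research interest is network and information security.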
Yingjun ZHANG1, Shangqi LIU2, Mu YANG2, Haixia ZHANG1, Kezhen HUANG1
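Abstract:
Logs are an important information resource produced as information systems develop rapidly. Analyzing them supports anomaly detection, fault diagnosis, and performance diagnosis. This survey studies log-based anomaly detection technology. It first introduces the main log-based anomaly detection frameworks in use, then describes in detail the key technologies involved, including log parsing and log anomaly detection. Finally, it summarizes the current techniques and offers suggestions for future research directions.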
Yingjun ZHANG, Shangqi LIU, Mu YANG, Haixia ZHANG, Kezhen HUANG. Survey on anomaly detection technology based on logs[J]. Chinese Journal of Network and Information Security, 2020, 6(6): 1-12.
Table 1 Log template samples
Eid | Name | Message | Frequency | Parameters |
1 | server.ZooKeeperServer | Server environment | 2 | host.name=localhost,user.home=/home/hadoop |
2 | server.NIOServerCnxnFactory | binding to port | 1 | 0.0.0.0/0.0.0.0:2181 |
3 | server.NIOServerCnxnFactory | Accepted socket connection | 1 | 192.168.31.154:38221 |
4 | server.ZooKeeperServer | Client attempting to establish new session | 1 | 192.168.31.154:38221 |
5 | server.ZooKeeperServer | Established session | 1 | 0x1621970549a0000 |
… | … | … | … | … |
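The relationship between raw log lines and the Message, Frequency, and Parameters columns of Table 1 can be illustrated with a small sketch. It is not one of the surveyed parsers: the masking regexes, the `<*>` placeholder, the function names, and the sample log lines are all assumptions made for illustration, loosely modeled on the ZooKeeper rows above.

```python
import re
from collections import Counter, defaultdict

# Hypothetical masking rules, tried in order; each regex marks one kind of variable field.
PARAM_PATTERNS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"),  # IPv4 address, optional :port
    re.compile(r"\b0x[0-9a-fA-F]+\b"),                    # hexadecimal identifier
    re.compile(r"\b\d+\b"),                               # plain integer
]


def extract_template(message):
    """Split one raw message into (template, parameters).

    Parameters are collected pattern by pattern, so their order follows
    PARAM_PATTERNS rather than their position in the message.
    """
    params = []

    def mask(match):
        params.append(match.group(0))
        return "<*>"

    template = message
    for pattern in PARAM_PATTERNS:
        template = pattern.sub(mask, template)
    return template, params


def build_template_table(messages):
    """Group messages by template and count how often each template occurs."""
    counts = Counter()
    params_by_template = defaultdict(list)
    for message in messages:
        template, params = extract_template(message)
        counts[template] += 1
        params_by_template[template].extend(params)
    # Event ids are assigned in order of first appearance.
    return [
        {"eid": eid, "message": template, "frequency": counts[template],
         "parameters": params_by_template[template]}
        for eid, template in enumerate(counts, start=1)
    ]


# Invented sample lines (not taken from a real log file).
SAMPLE_LOGS = [
    "Accepted socket connection from /192.168.31.154:38221",
    "Client attempting to establish new session at /192.168.31.154:38221",
    "Established session 0x1621970549a0000",
    "Accepted socket connection from /192.168.31.155:40001",
]

if __name__ == "__main__":
    for row in build_template_table(SAMPLE_LOGS):
        print(row)
```

Hand-written regexes like these are what automated parsers aim to replace; the sketch only shows what the output of parsing looks like.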
Table 2 Comparison of typical methods for automatic log analysis
Name | Category | Algorithm | Mode | Accuracy | Time complexity | Efficiency |
CFG | Code analysis | AST, CFG | Offline | ++ | O(n^3) | +++ |
PCA | Code analysis | AST | Online | +++ | O(n) | +++ |
CLSTR | Machine learning | IPLoM | Offline | ++ | O(n) | +++ |
LKE | Machine learning | Clustering | Offline | ++ | O(n^2) | + |
Logram | Natural language processing | n-gram | Online | +++ | O(n) | +++ |
NLog | Natural language processing | POS | Offline | ++ | O(n) | +++ |
Spell | Classical algorithm | LCS | Online | +++ | O(n) | +++ |
Drain | Classical algorithm | Parse tree | Online | +++ | O(n) | +++ |
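Table 2 lists the longest common subsequence (LCS) as the algorithm behind Spell. The sketch below shows only the core idea, namely that tokens shared in order by two messages form a candidate template. It is a simplified illustration and not the published Spell algorithm; the function names and the `threshold` parameter are assumptions.

```python
def lcs_tokens(a, b):
    """Longest common subsequence of two token lists, by dynamic programming."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of a[:i] and b[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Walk back through the table to recover the subsequence itself.
    common = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            common.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return common[::-1]


def merge_template(template, message, threshold=0.5):
    """Return the shared tokens as a refined template, or None if too few match.

    `threshold` (hypothetical) is the minimum fraction of the message's
    tokens that must appear in the LCS for the two to share a template.
    """
    template_tokens = template.split()
    message_tokens = message.split()
    common = lcs_tokens(template_tokens, message_tokens)
    if len(common) < threshold * len(message_tokens):
        return None
    return " ".join(common)
```

For example, `merge_template("Established session 0x1", "Established session 0x2")` keeps the two shared tokens and drops the differing identifier, while two unrelated messages return `None` and would start separate templates.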
Table 3 Comparison of log anomaly detection methods
Name | Category | Algorithm | Mode | Precision | Recall |
AClog | Supervised learning | SVM, LCS | Offline | 92.4% | 80% |
IM | Supervised learning | K-prototype, kNN | Offline | 89% | 85% |
LogClass | Supervised learning | PU, SVM | Online | 99.048% | 99.988% |
LogCluster | Unsupervised learning | Clustering | Online | 60% | 36.2% |
MCL | Unsupervised learning | PCA | Online | 99.8% | / |
LACT | Unsupervised learning | TCA, NLP | Offline | 97.08% | 95.45% |
CausalConvLSTM | Deep learning | CNN, LSTM | Offline | 89.59% | 99.72% |
DeepLog | Deep learning | LSTM | Online | 95% | 96% |
LogGAN | Deep learning | LSTM | Offline | 100% | 35.6% |
LogRobust | Deep learning | Bi-LSTM | Online | 98% | 100% |
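Table 3 reports two percentages per method, and a single combined figure can make rows easier to compare. The sketch below computes the F1 score, the harmonic mean of precision and recall. It assumes that the first percentage column of Table 3 is precision; the table itself does not report F1, and the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # (precision, recall) pairs copied from Table 3, under the assumption above.
    table3 = {
        "DeepLog": (0.95, 0.96),
        "LogGAN": (1.00, 0.356),
        "LogCluster": (0.60, 0.362),
    }
    for name, (p, r) in table3.items():
        print(f"{name}: F1 = {f1_score(p, r):.3f}")
```

The harmonic mean penalizes imbalance, so a method with very high precision but low recall, such as the LogGAN row, scores far lower than its precision alone suggests.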
[1] LIAO X K, LI S S. Survey on log research of large scale software system[J]. Journal of Software, 2016, 27(8): 1934-1947.
[2] OLINER A J, GANAPATHI A, XU W. Advances and challenges in log analysis[J]. Communications of the ACM, 2012, 55(2): 55-61.
[3] RAPIDS. cyBERT: neural network, that’s the tech; to free your staff from, bad regex[EB].
[4] MI H, WANG H, ZHOU Y, et al. Toward fine-grained, unsupervised, scalable performance diagnosis for production cloud computing systems[J]. IEEE Trans Parallel Distrib Syst, 2013, 24(6): 1245-1255.
[5] CUI Y, ZHANG Z. Research on template extraction based on large-scale network log[J]. Computer Science, 2017, 44(11A): 448-452.
[6] HE P, ZHU J, HE S, et al. An evaluation study on log parsing and its use in log mining[C]// Proc of the 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks. 2016: 654-661.
[7] HE S L, ZHU J M, HE P J, et al. Experience report: system log analysis for anomaly detection[C]// 27th International Symposium on Software Reliability Engineering. 2016: 207-218.
[8] FU Q, LOU J, et al. Contextual analysis of program logs for understanding system behaviors[C]// MSR ’13. 2013: 397-400.
[9] PECCHIA A, COTRONEO D, KALBARCZYK Z, et al. Improving log-based field failure data analysis of multi-node computing systems[C]// Dependable Systems and Networks. 2011: 97-108.
[10] LU J, LI F, LI L, et al. CloudRaid: hunting concurrency bugs in the cloud via log-mining[C]// Foundations of Software Engineering. 2018: 3-14.
[11] AIT EL HADJ M, KHOUMSI A, BENKAOUZ Y, et al. Efficient security policy management using suspicious rules through access log analysis[J]. Lecture Notes in Computer Science, 2019, 11704: 250-266.
[12] STUDIAWAN H, FERDOUS S, PAYNE C. A survey on forensic investigation of operating system logs[J]. Digital Investigation, 2019, 29: 1-20.
[13] CHOW M, MEISNER D, FLINN J, et al. The mystery machine: end-to-end performance analysis of large-scale internet services[C]// 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’14). 2014: 217-231.
[14] KARTHIK N, CHARLES K, JENNIFER N. Structured comparative analysis of systems logs to diagnose performance problems[C]// Networked Systems Design and Implementation. 2012: 26-26.
[15] CHANDOLA V, BANERJEE A, KUMAR V. Anomaly detection: a survey[J]. ACM Computing Surveys, 2009, 41(3): 30602-30611.
[16] KULKARNI J, JOSHI S, BAPAT S, et al. Analysis of system logs for pattern detection and anomaly prediction[C]// Proceeding of International Conference on Computational Science and Applications. 2020: 427-436.
[17] ARIEL R, RANDY K. Chukwa: a system for reliable large-scale log collection[C]// Usenix Large Installation Systems Administration Conference. 2010: 1-15.
[18] ZHU J M, HE S L, LIU J Y, et al. Tools and benchmarks for automated log parsing[C]// International Conference on Software Engineering. 2019: 121-130.
[19] Splunk[EB].
[20] Logentries[EB].
[21] Logz.io[EB].
[22] DU M, LI F F. Spell: streaming parsing of system event logs[C]// ICDM 2016. 2016: 859-864.
[23] XU W, HUANG L, FOX A, et al. Detecting large-scale system problems by mining console logs[C]// Symposium on Operating Systems Principles. 2009: 117-132.
[24] NAGAPPAN M, WU K, MLADEN A. Efficiently extracting operational profiles from execution logs using suffix arrays[C]// ISSRE. 2009: 41-50.
[25] BAO L, LI Q, LU P Y, et al. Execution anomaly detection in large-scale systems through console log analysis[J]. Journal of Systems and Software, 2018, 143: 172-186.
[26] LONVICK C. The BSD syslog protocol[EB].
[27] VAARANDI R. Mining event logs with SLCT and LogHound[C]// Proceedings of the 2008 IEEE/IFIP Network Operations and Management Symposium. 2008: 1071-1074.
[28] RISTO V, PIHELGAS M. LogCluster - a data clustering and pattern mining algorithm for event logs[C]// Conference on Network and Service Management (CNSM). 2015: 1-7.
[29] MAKANJU A, ZINCIR-HEYWOOD N, MILIOS E E. A lightweight algorithm for message type extraction in system application logs[J]. IEEE Transactions on Knowledge and Data Engineering, 2012, 24(11): 1921-1936.
[30] TATSUAKI K, KEISUKE I, TATSUYA Mori, et al. Spatio-temporal factorization of log data for understanding network events[C]// IEEE INFOCOM 2014. 2014: 610-618.
[31] FU Q, LOU J G, WANG Y, et al. Execution anomaly detection in distributed systems through unstructured log analysis[C]// (ICDM’09) Proc of International Conference on Data Mining. 2009: 149-158.
[32] TANG L, LI T, PERNG C S. LogSig: generating system events from raw textual logs[C]// CIKM’11: Proc. of ACM International Conference on Information and Knowledge Management. 2011: 785-794.
[33] HE P J, ZHU J M, HE S L, et al. Towards automated log parsing for large-scale log data analysis[J]. IEEE Transactions on Dependable and Secure Computing, 2018, 15(6): 931-944.
[34] STUDIAWAN H, SOHEL F, PAYNE C. Automatic event log abstraction to support forensic investigation[C]// ACSW 2020. 2020: 1-9.
[35] STUDIAWAN H, PAYNE C, SOHEL F. Automatic graph-based clustering for security logs[C]// Advanced Information Networking and Applications (AINA). 2019: 914-926.
[36] DAI H, LI H, CHEN C S, et al. Logram: efficient log parsing using n-gram dictionaries[R]. 2020.
[37] NICOLAS A, YOHAN P, SOPHIE C, et al. Improving performances of log mining for anomaly prediction through NLP-based log parsing[C]// Modeling Analysis And Simulation on Computer and Telecommunication Systems. 2018: 237-243.
[38] LI G F, ZHU P J, CAO N, et al. Improving the system log analysis with language model and semi-supervised classifier[J]. Multimedia Tools and Applications, 2019, 78(15): 21521-21535.
[39] PI A D, CHEN W, ZELLER W, et al. It can understand the logs, literally[C]// International Parallel and Distributed Processing Symposium. 2019: 446-451.
[40] LIU W Y, LIU X, DI X Q, et al. FastlogSim: a quick log pattern parser scheme based on text similarity[C]// Knowledge Science Engineering and Management. 2020: 211-219.
[41] DU M, LI F F. Spell: online streaming parsing of large unstructured system logs[J]. IEEE Transactions on Knowledge and Data Engineering, 2019, 31(11): 2213-2227.
[42] MESSAOUDI S, PANICHELLA A, BIANCULLI D, et al. A search-based approach for accurate identification of log message formats[C]// ICPC. 2018: 167-177.
[43] HE P J, ZHU J M, ZHENG Z B, et al. Drain: an online log parsing approach with fixed depth tree[C]// ICWS. 2017: 33-40.
[44] BAO L F, BUSANY N, LO D, et al. Statistical log differencing[C]// Automated Software Engineering. 2019: 851-862.
[45] SIDDHARTHA S, SUPRATIM D, SRIKANT R, et al. Learning latent events from network message logs[J]. IEEE ACM Transactions on Networking, 2019, 27(4): 1728-1741.
[46] XIE X S, WANG Z, XIAO X H, et al. A confidence-guided evaluation for log parsers inner quality[J]. Mobile Networks and Applications, 2020: 1-12.
[47] ZHANG D X, ZHENG Y, WEN Y, et al. Role-based log analysis applying deep learning for insider threat detection[C]// SecArch'18. 2018: 18-20.
[48] EL-MASRIA D, PETRILLOB F, YANN-GAËL G, et al. A systematic literature review on automated log abstraction techniques[J]. Information & Software Technology, 2020, 22: 1-18.
[49] OPREA A, LI Z, YEN T F, et al. Detection of early-stage enterprise infection by mining large-scale log data[C]// Proceedings of the 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). 2015: 45-56.
[50] MARCELLO C, DOMENICO C, ANTONIO P. Event logs for the analysis of software failures: a rule-based approach[J]. IEEE Transactions on Software Engineering, 2013, 39(6): 806-821.
[51] LOU J G, FU Q, YANG S G, et al. Mining program workflow from interleaved traces[C]// Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2010: 613-622.
[52] BEZERRA F, WAINER J. Algorithms for anomaly detection of traces in logs of process aware information systems[J]. Information Systems, 2013, 38(1): 33-44.
[53] JIA T, CHEN G, YANG L, et al. An approach for anomaly diagnosis based on hybrid graph model with logs for distributed services[C]// Proceedings of the IEEE International Conference on Web Services (ICWS). 2017: 25-32.
[54] LI T, MA J F, PEI Q Q, et al. AClog: attack chain construction based on log correlation[C]// Global Communications Conference. 2019: 1-6.
[55] MENG W B, LIU Y, ZHANG S L, et al. Device-agnostic log anomaly classification with partial labels[C]// International Workshop on Quality of Service. 2018: 1-6.
[56] LIU Z L, QIN T, GUAN X H, et al. An integrated method for anomaly detection from massive system logs[J]. IEEE Access, 2018: 30602-30611.
[57] XU W, HUANG L, ATREJA S, et al. Online system problem detection by mining patterns of console logs[C]// ICDM’09. 2009: 588-597.
[58] NANDI A, MANDAL A, ATREJA S, et al. Anomaly detection using program control flow graph mining from execution logs[C]// KDD 2016. 2016: 215-224.
[59] LIN Q W, ZHANG H Y, LOU J G, et al. Log clustering based problem identification for online service systems[C]// ICSE 2016. 2016: 1-10.
[60] LIU F C, WEN Y, ZHANG D X, et al. Log2vec: a heterogeneous graph embedding based approach for detecting cyber threats within enterprise[C]// CCS’19. 2019: 1777-1794.
[61] RISTO V, BERNHARDS B, MARKUS K. An unsupervised framework for detecting anomalous messages from syslog log files[C]// Network Operations and Management Symposium. 2018: 1-6.
[62] LOU J G, FU Q, YANG S Q, et al. Mining invariants from console logs for system problem detection[C]// USENIX Annual Technical Conference. 2010: 1-14.
[63] YUAN Y, ANU H, SHI W C, et al. Learning-based anomaly cause tracing with synthetic analysis of logs from multiple cloud service components[C]// Computer Software and Applications Conference. 2019: 66-71.
[64] BIBLOP D, MOHIUDDIN S, MUHAMMADALI G, et al. LogLens: a real-time log analysis system[C]// International Conference on Distributed Computing Systems. 2018: 1052-1062.
[65] DUNIA R, QIN J S. Multi-dimensional fault diagnosis using a subspace approach[C]// ACC. 1997: 1-5.
[66] PAPINENI K. Why inverse document frequency?[C]// NAACL ’01. 2001: 1-8.
[67] ASTEKIN M, OZCAN S, SOZER H. Incremental analysis of large-scale system logs for anomaly detection[C]// International Conference on Big Data. 2019: 2119-2127.
[68] ASTEKIN M, ZENGIN H, SÖZER H. Evaluation of distributed machine learning algorithms for anomaly detection from large-scale system logs: a case study[C]// 2018 IEEE International Conference on Big Data (Big Data). 2018: 2071-2077.
[69] BROWN A, TUOR A, HUTCHINSON B, et al. Recurrent neural network attention mechanisms for interpretable system log anomaly detection[C]// MLCS 2018. 2018: 1-8.
[70] BERTERO C, ROY M, SAUVANAUD C, et al. Experience report: log mining using natural language processing and application to anomaly detection[C]// International Symposium on Software Reliability Engineering (2017). 2017: 351-360.
[71] AMEY W, TANISHQ G, ROHIT V, et al. Hybrid CAE-VAE for unsupervised anomaly detection in log file systems[C]// International Conference on Computing Communication and Networking Technologies. 2019: 1-7.
[72] YOON-HO C, PENG L, SHANG Z T, et al. Using deep learning to solve computer security challenges: a survey[J]. arXiv: Cryptography and Security. 2019.
[73] STEVEN Y, MELODY M, TENG-SHENG M. CausalConvLSTM: semi-supervised log anomaly detection through sequence modeling[C]// International Conference on Machine Learning and Applications. 2019: 1334-1341.
[74] DU M, LI F F, VIVEK S. DeepLog: anomaly detection and diagnosis from system logs through deep learning[C]// CCS. 2017: 1285-1298.
[75] XIA B, YIN J J, XU J, et al. LogGAN: a sequence-based generative adversarial network for anomaly detection based on system logs[C]// SciSec 2019: Science of Cyber Security, Switzerland. 2019: 61-76.
[76] ZHANG X, XU Y, ZHANG H Y, et al. Robust log-based anomaly detection on unstable log data[C]// ESEC/FSE’19. 2019: 807-817.
[77] WANG X, WANG D, ZHANG Y, et al. Unsupervised learning for log data analysis based on behavior and attribute features[C]// International Conference on Artificial Intelligence. 2020: 510-518.
[78] MEI Y D, CHEN X, SUN Y Z, et al. A method for software system anomaly detection based on log information and CNN-Text[J]. Chinese Journal of Computers, 2020, 43(2): 366-380.
[79] LU S Y, WEI X, LI Y D, et al. Detecting anomaly in big data system logs using convolutional neural network[C]// Dependable Autonomic and Secure Computing. 2018: 151-158.