Chinese Journal of Network and Information Security ›› 2023, Vol. 9 ›› Issue (3): 102-112. doi: 10.11959/j.issn.2096-109x.2023042

• Academic Paper •

Insider threat detection based on operational attention and data augmentation

Guanyun FENG, Cai FU, Jianqiang LYU, Lansheng HAN

  1 Hubei Engineering Research Center on Big Data Security, Hubei Key Laboratory of Distributed System Security, Wuhan 430074, China
  2 School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
  • Revised: 2023-03-15 Online: 2023-06-25 Published: 2023-06-01
  • About the authors:
    Guanyun FENG (1998- ), male, from Wuhan, Hubei, is a master's student at Huazhong University of Science and Technology. His main research interests are network behavior analysis and deep learning.
    Cai FU (1976- ), male, from Tongcheng, Hubei, is a professor and doctoral supervisor at Huazhong University of Science and Technology. His main research interests are mobile network security, system and software security, and network behavior analysis.
    Jianqiang LYU (1981- ), male, from Huanggang, Hubei, is a doctoral student at Huazhong University of Science and Technology. His main research interests are system and software security and network behavior analysis.
    Lansheng HAN (1972- ), male, from Jining, Shandong, is a professor and doctoral supervisor at Huazhong University of Science and Technology. His main research interests are network attack and defense, system and software security, and network behavior analysis.
  • Supported by:
    The National Natural Science Foundation of China (62072200); The National Natural Science Foundation of China (62172176); The National Key R&D Program of China (2022YFB3103400)

Abstract:

Insider threats have received increasing attention in recent years. They are a major cause of serious security breaches in organizations and remain a long-standing challenge. An analysis of existing insider threat data shows that the biggest challenge in insider threat detection is data imbalance and the scarcity of labeled threat samples. CMU CERT R4.2, a classic dataset for insider threat detection, contains about 3.22 million log records, of which only 7,423 are labeled as malicious operations. Moreover, most operation types in the logs are unrelated to malicious behavior: the malicious behavior of leaking company data, for example, is highly correlated with only two types of operations, while the logs of the remaining 40-plus operation types may interfere with detection. To address this challenge, a data processing framework based on operational attention and data augmentation was designed. The framework first performs anomaly evaluation on operations and then masks operations with low anomaly scores, so that the model focuses on operations related to malicious behavior; this can be regarded as a hard attention mechanism over operations. Next, based on an analysis of the characteristics of insider threat datasets, three rules were designed to augment malicious samples, increasing sample diversity and alleviating the severe imbalance between positive and negative samples. Supervised insider threat detection was treated as a time-series classification problem. Residual connections were added to the long short-term memory fully convolutional network (LSTM-FCN) model to achieve multi-granularity detection, and the model was evaluated with metrics such as precision and recall. The results indicate that it outperforms existing baseline models. In addition, the data processing framework was applied to several classic models, such as ITD-Bert and TextCNN, and the results show that the proposed method effectively improves the performance of insider threat detection models.

Key words: Insider threat detection, hard attention, data augmentation, neural network
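The masking step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how anomaly scores are computed, which threshold is used, or what the mask looks like, so the score table, the `threshold` value, the `[MASK]` token, the operation names, and the function name `mask_low_anomaly_ops` below are all assumptions made for illustration and are not the authors' implementation.

```python
from typing import Dict, List

MASK = "[MASK]"  # hypothetical placeholder token, not taken from the paper


def mask_low_anomaly_ops(
    ops: List[str],
    anomaly_score: Dict[str, float],
    threshold: float,
) -> List[str]:
    """Replace operations whose anomaly score is below `threshold` with MASK.

    Operations missing from `anomaly_score` are treated as score 0.0 and are
    therefore masked. Sequence length and order are preserved, so the result
    can still be fed to a sequence classifier.
    """
    return [
        op if anomaly_score.get(op, 0.0) >= threshold else MASK
        for op in ops
    ]


# Toy per-operation scores, invented for illustration only.
scores = {
    "logon": 0.05,
    "http_visit": 0.10,
    "usb_connect": 0.90,
    "file_copy": 0.80,
}
session = ["logon", "http_visit", "usb_connect", "file_copy", "logoff"]

masked = mask_low_anomaly_ops(session, scores, threshold=0.5)
print(masked)

# Label imbalance quoted in the abstract: 7,423 malicious records out of
# about 3.22 million (the total is an approximate figure).
malicious_ratio = 7423 / 3_220_000
print(f"malicious share of log records: {malicious_ratio:.4%}")
```

Because the choice is binary (an operation is either kept or replaced), this behaves like hard attention, not a soft weighting. The ratio printed at the end shows why the abstract stresses imbalance: well under one percent of the records are labeled malicious.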

