Chinese Journal of Network and Information Security ›› 2023, Vol. 9 ›› Issue (4): 40-52. doi: 10.11959/j.issn.2096-109x.2023052

• Academic Papers •

基于词向量和图卷积的攻击模式与技术实体关联方法

裘炜程1,2, 陈秀真1,2, 马颖华1,2, 马进1,2, 周志洪1,2   

  1. 1 上海交通大学网络安全技术研究院,上海 200240
    2 上海市信息安全综合管理技术重点实验室,上海 200240
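  • Revised: 2023-05-06 Online: 2023-08-01 Published: 2023-08-01
  • About the authors: Weicheng QIU (1997- ), male, born in Shanghai, is a master's student at Shanghai Jiao Tong University. His research interests include Internet of Vehicles information security, network security, and artificial intelligence.
    Xiuzhen CHEN (1977- ), female, born in Liaocheng, Shandong, Ph.D., is an associate professor at Shanghai Jiao Tong University. Her research interests include Internet of Vehicles information security, security testing and evaluation of networked information systems, and big data analysis of social networks.
    Yinghua MA (1973- ), female, born in Tai'an, Shandong, Ph.D., is a lecturer at Shanghai Jiao Tong University. Her research interests include network content security and natural language processing.
    Jin MA (1977- ), female, born in Tengzhou, Shandong, Ph.D., is a senior engineer at Shanghai Jiao Tong University. Her research interests include big data and artificial intelligence applications, Internet of Vehicles information security, and new technologies for integrated cyberspace security management.
    Zhihong ZHOU (1979- ), male, born in Jiujiang, Jiangxi, is a lecturer at Shanghai Jiao Tong University. His research interests include cryptographic applications, security evaluation, and Internet of Vehicles security.
  • Supported by:
    The National Natural Science Foundation of China (U2003206); Shanghai Science and Technology Committee Science and Technology Innovation Action Plan (22511101202)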

Predicting correlation relationships of entities between attack patterns and techniques based on word embedding and graph convolutional network

Weicheng QIU1,2, Xiuzhen CHEN1,2, Yinghua MA1,2, Jin MA1,2, Zhihong ZHOU1,2   

  1. 1 Institute of Cyber Science and Technology, Shanghai Jiao Tong University, Shanghai 200240, China
    2 Shanghai Municipal Key Lab of Integrated Management Technology for Information Security, Shanghai 200240, China
  • Revised: 2023-05-06 Online: 2023-08-01 Published: 2023-08-01
  • Supported by:
    The National Natural Science Foundation of China (U2003206); Shanghai Science and Technology Committee Science and Technology Innovation Action Plan (22511101202)

摘要:

安全威胁分析以包含大量安全实体的网络安全知识库为基础,对威胁源、攻击能力、攻击动机、威胁路径等建模,并结合资产的脆弱性及已部署的安全措施,评估威胁的影响范围、程度以及带来的安全风险。但是基础知识库部分实体之间的关联关系缺失,不利于实现安全事件追踪、攻击路径关联。针对 CAPEC 攻击模式和ATT&CK技术的实体关系缺失问题,提出一种基于词向量和图卷积的攻击模式与技术实体关联方法,即通过分析实体描述文本之间的关联度判断实体关系,达到不同知识库间实体关系的补充、丰富威胁路径的目的。提出的方法使用预训练的安全领域Word2Vec模型提取特定领域的语义信息,通过图卷积神经网络模型提取实体描述间的共现关系,进一步将两种特征输入孪生网络预测实体间的关联关系。为解决现有知识库中实体关系较少的小样本学习问题,通过词向量补充外部语义信息、模型训练时使用动态负采样和添加正则项避免过拟合。针对MITRE组织提供的CAPEC和ATT&CK知识库进行实验测试,结果表明,所提算法可以通过分析实体描述的语义信息,将有关系的实体对在样本空间内与无关系实体对分离,从而有效预测新的实体对的关联关系。在小样本条件下,所提方法的预测准确率高于基于Bert 的文本相似度预测方法,同时训练时间更短、所需计算资源更少。实验结果证明,基于词向量和图卷积的攻击模式与技术实体关联方法可以挖掘出新的攻击模式与技术实体关联关系,安全威胁分析时有助于提高从安全漏洞和脆弱点等底层概念抽象出攻击技术和战术的能力。

关键词: 安全实体关联, 自然语言处理, 图卷积神经网络, 小样本学习

Abstract:

Threat analysis relies on knowledge bases that contain a large number of security entities. It models threat sources, attack capabilities, attack motivations, and threat paths, and combines them with the vulnerabilities of assets and the security measures already deployed to evaluate the scope, severity, and risk of security threats. However, some entity relations in these knowledge bases are missing, which hinders security event tracking and attack path correlation. To supplement the entity relations between CAPEC attack patterns and ATT&CK techniques and to enrich threat paths, an entity correlation prediction method called WGS was proposed, which judges whether two entities are related by analyzing their description texts with word embeddings and a graph convolutional network (GCN). In the proposed method, a Word2Vec model pre-trained on security-domain text extracted domain-specific semantic features, and a GCN model captured the co-occurrence between words and sentences in entity descriptions. A Siamese network that combines these two features then predicted the relationship between entities. Because the existing knowledge bases contain few entity relations, prediction is a few-shot learning problem. It was addressed by supplementing external semantic information through the word embeddings and by applying dynamic negative sampling and regularization during training to avoid overfitting. Experiments on the CAPEC and ATT&CK knowledge bases provided by MITRE show that WGS separates related entity pairs from unrelated ones in the sample space and can therefore predict new entity relations effectively. Under few-shot conditions, the proposed method achieves higher prediction accuracy than BERT-based text similarity prediction models while requiring less training time and fewer computing resources. The results show that the word embedding and graph convolutional network based method can discover new correlation relationships between attack patterns and techniques, which helps threat analysis abstract attack techniques and tactics from low-level vulnerabilities and weaknesses.
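
The pairing step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' WGS implementation: the entity IDs, the toy feature vectors, and the function names `encode`, `siamese_score`, and `sample_negatives` are all hypothetical, and the shared encoder is an untrained stand-in (concatenation of a semantic vector and a graph vector) rather than a learned network. It only shows the two ideas named above: both entities pass through the same encoder before being compared, and negative pairs are redrawn from the unlinked pairs each epoch.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def encode(semantic_vec, graph_vec):
    """Hypothetical shared encoder: concatenates the word-embedding view
    and the graph view of one entity description. A trained model would
    apply learned layers here; this stand-in has no parameters."""
    return list(semantic_vec) + list(graph_vec)


def siamese_score(entity_a, entity_b):
    """Score a pair by encoding both entities with the SAME encoder
    and comparing the results with cosine similarity."""
    return cosine(encode(*entity_a), encode(*entity_b))


def sample_negatives(positives, patterns, techniques, k, rng):
    """Draw up to k (pattern, technique) pairs that are not known positives.
    Calling this once per epoch gives a fresh negative set each time."""
    known = set(positives)
    candidates = [(p, t) for p in patterns for t in techniques
                  if (p, t) not in known]
    return rng.sample(candidates, min(k, len(candidates)))


# Toy data: IDs and vectors are invented for illustration only.
features = {
    "PATTERN-A": ([1.0, 0.0], [0.5, 0.5]),
    "PATTERN-B": ([0.0, 1.0], [0.2, 0.8]),
    "TECH-X": ([0.9, 0.1], [0.4, 0.6]),
    "TECH-Y": ([0.1, 0.9], [0.3, 0.7]),
}
patterns = ["PATTERN-A", "PATTERN-B"]
techniques = ["TECH-X", "TECH-Y"]
positives = [("PATTERN-A", "TECH-X")]

if __name__ == "__main__":
    rng = random.Random(0)
    for epoch in range(2):
        negatives = sample_negatives(positives, patterns, techniques, 2, rng)
        for p, t in positives + negatives:
            print(epoch, p, t, round(siamese_score(features[p], features[t]), 3))
```

In a trained model the scores of positive pairs would be pushed above those of the resampled negatives by a loss with a regularization term; the sketch omits training entirely and only shows how pairs are formed and scored.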

Key words: security entity correlation, natural language processing, graph convolution neural network, few-shot learning
