Chinese Journal of Network and Information Security, 2020, Vol. 6, Issue (5): 126-138. doi: 10.11959/j.issn.2096-109x.2020009

• Academic Paper

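Cyber security entity recognition method based on residual dilation convolution neural network

Bo XIE(1,2), Guowei SHEN(1,2), Chun GUO(1,2), Yan ZHOU(1,2), Miao YU(3)

  1 College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
  2 Guizhou Provincial Key Laboratory of Public Big Data, Guiyang 550025, China
  3 Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
  • Revised: 2020-01-07 Online: 2020-10-01 Published: 2020-10-19
  • About the authors:
    Bo XIE (1996- ), male, born in Zhaotong, Yunnan; master's student at Guizhou University; research interests: cybersecurity, knowledge graphs, data mining.
    Guowei SHEN (1986- ), male, born in Shaodong, Hunan; Ph.D., associate professor at Guizhou University; research interests: big data, network and information security, data mining.
    Chun GUO (1986- ), male, born in Guiyang, Guizhou; Ph.D., associate professor at Guizhou University; research interest: cybersecurity.
    Yan ZHOU (1980- ), female, born in Guiyang, Guizhou; lecturer at Guizhou University; research interests: cryptography and cybersecurity.
    Miao YU (1987- ), male, born in Mudanjiang, Heilongjiang; Ph.D., senior engineer at the Institute of Information Engineering, Chinese Academy of Sciences; research interests: network and information security, data mining.
  • Supported by:
    The National Natural Science Foundation of China (61802081); The Natural Science Foundation of Guizhou Province, China (20161052); The Natural Science Foundation of Guizhou Province, China (20167428); The Natural Science Foundation of Guizhou Province, China (20171051); The Major Scientific and Technological Special Project of Guizhou Province, China (20183001)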

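Abstract:

In recent years, cybersecurity threats have been increasing, and data-driven security intelligence analysis has become a research hotspot in the field of cybersecurity. In particular, artificial intelligence techniques represented by knowledge graphs can support the detection of complex and unknown cyberattacks in multi-source heterogeneous threat intelligence data. Cybersecurity entity recognition is the basis for constructing threat intelligence knowledge graphs. Security entities in open network text have very complex compositions, which makes them difficult for traditional deep learning methods to recognize accurately. Building on the BERT (pre-training of deep bidirectional transformers) pre-trained language model, a cybersecurity entity recognition model, BERT-RDCNN-CRF, based on a residual dilated convolutional neural network and a conditional random field (CRF), was proposed. BERT was used to train character-level feature vector representations; a model combining residual convolution with a dilated convolutional neural network then effectively extracted the important features of security entities; finally, the CRF produced a BIO label for each character. Experiments on the large-scale cybersecurity entity annotation dataset constructed in this work show that the proposed method outperforms the LSTM-CRF model, the BiLSTM-CRF model, and traditional entity recognition models.

Key words: cybersecurity, entity recognition, residual connection, dilated convolutional neural network, BERT pre-trained model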
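The abstract says residual connections are combined with dilated convolutions over BERT's character vectors, but it does not give kernel sizes, dilation rates, or layer counts. The NumPy sketch below is a hypothetical illustration of that general technique, not the authors' implementation. It assumes kernel size 3, zero "same" padding, ReLU activations, dilation rates (1, 2, 4), and a single residual addition around the stacked layers; the function names `dilated_conv1d`, `residual_dilated_block`, and `receptive_field` are illustrative only.

```python
import numpy as np


def dilated_conv1d(x, w, b, dilation):
    """1-D dilated convolution with zero 'same' padding.

    x: (seq_len, in_dim) character features, e.g. BERT outputs
    w: (kernel, in_dim, out_dim) weights; kernel size must be odd
    b: (out_dim,) bias
    """
    seq_len = x.shape[0]
    kernel = w.shape[0]
    pad = (kernel - 1) // 2 * dilation
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad along the sequence axis
    out = np.zeros((seq_len, w.shape[2]))
    for j in range(kernel):
        # Tap j reads input position t + (j - (kernel - 1) // 2) * dilation.
        start = j * dilation
        out += xp[start:start + seq_len] @ w[j]
    return out + b


def residual_dilated_block(x, params, dilations=(1, 2, 4)):
    """Stacked dilated convolutions with ReLU, plus a residual (skip) connection.

    params: list of (w, b) pairs, one per layer; out_dim must equal in_dim
    so that the input can be added back to the output.
    """
    h = x
    for (w, b), d in zip(params, dilations):
        h = np.maximum(dilated_conv1d(h, w, b, d), 0.0)  # ReLU
    return x + h  # residual connection


def receptive_field(kernel, dilations):
    """Number of input characters visible to one output position."""
    return 1 + (kernel - 1) * sum(dilations)


rng = np.random.default_rng(0)
seq_len, dim = 20, 8  # hypothetical sizes; BERT-base hidden size would be 768
x = rng.normal(size=(seq_len, dim))
params = [(rng.normal(scale=0.1, size=(3, dim, dim)), np.zeros(dim)) for _ in range(3)]
y = residual_dilated_block(x, params)
print(y.shape)                        # (20, 8)
print(receptive_field(3, (1, 2, 4)))  # 15
```

Under these assumptions, three layers with doubling dilation let each output character see 15 input characters, while the parameter count grows only linearly with depth. The residual path keeps the original character features available to the layer that follows.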
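The abstract states that a CRF layer produces a BIO label for each character. As a hedged illustration, the sketch below shows standard linear-chain CRF Viterbi decoding, followed by grouping BIO tags into entity spans. The emission scores, the transition matrix, and the single entity type `ENT` are hypothetical. In a trained model, the emissions would come from the feature extractor and the transitions would be learned. The abstract does not list the paper's actual entity types.

```python
import numpy as np

TAGS = ["O", "B-ENT", "I-ENT"]  # hypothetical tag set with one entity type


def viterbi_decode(emissions, transitions):
    """Highest-scoring tag sequence of a linear-chain CRF.

    emissions: (seq_len, num_tags) per-character tag scores
    transitions: (num_tags, num_tags); transitions[i, j] scores tag i -> tag j
    """
    seq_len, num_tags = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((seq_len, num_tags), dtype=int)
    for t in range(1, seq_len):
        # cand[i, j]: best path ending in tag i at t-1, then moving to tag j at t
        cand = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):
        path.append(int(backptr[t][path[-1]]))
    return path[::-1]


def bio_to_spans(chars, tags):
    """Group per-character BIO tags into (entity_text, entity_type) spans."""
    spans, start, label = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing entity
        if start is not None and tag != "I-" + label:
            spans.append(("".join(chars[start:i]), label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


text = "by nmap"
emissions = np.array([
    [2.0, 0.0, 0.0],  # b
    [2.0, 0.0, 0.0],  # y
    [2.0, 0.0, 0.0],  # space
    [0.5, 1.0, 1.2],  # n: I-ENT scores highest on its own
    [0.0, 0.0, 2.0],  # m
    [0.0, 0.0, 2.0],  # a
    [0.0, 0.0, 2.0],  # p
])
transitions = np.zeros((3, 3))
transitions[0, 2] = -1e4  # forbid O -> I-ENT, an invalid BIO transition

tags = [TAGS[k] for k in viterbi_decode(emissions, transitions)]
print(tags)                      # ['O', 'O', 'O', 'B-ENT', 'I-ENT', 'I-ENT', 'I-ENT']
print(bio_to_spans(text, tags))  # [('nmap', 'ENT')]
```

A per-character argmax would tag `n` as `I-ENT` with no preceding `B-ENT`. The transition penalty lets the decoder return the well-formed sequence `B-ENT I-ENT I-ENT I-ENT` instead. This is the usual motivation for placing a CRF output layer on top of a feature extractor.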
