Journal on Communications ›› 2020, Vol. 41 ›› Issue (10): 80-91. doi: 10.11959/j.issn.1000-436x.2020174
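Tao LI, Yuanbo GUO, Ankang JU
Revised: 2020-07-23
Online: 2020-10-25
Published: 2020-11-05
About the authors:
Tao LI (1992- ), male, born in Gangu, Gansu; PhD candidate at Information Engineering University; main research interest: semantic modeling of cyber threats.
Yuanbo GUO (1975- ), male, born in Zhouzhi, Shaanxi; PhD, professor and doctoral supervisor at Information Engineering University; main research interests: big data security and situational awareness.
Ankang JU (1995- ), male, born in Huixian, Henan; PhD candidate at Information Engineering University; main research interests: multi-step attack detection and heterogeneous security data fusion.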
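Abstract:
Knowledge acquisition in the cybersecurity domain currently relies on a pipeline approach, which propagates entity recognition errors, ignores the connection between the entity recognition and relation extraction tasks, and suffers from a lack of labeled corpora for model training. To address these problems, an end-to-end method for extracting cybersecurity knowledge triples that incorporates adversarial active learning is proposed. First, entity recognition and relation extraction are modeled as a single sequence labeling task through a joint tagging strategy. Then, a BiLSTM-LSTM model with a dynamic attention mechanism is designed to extract entities and relations jointly and form triples. Finally, a discriminator model is trained on the basis of an adversarial network to incrementally select high-quality samples for annotation, and the performance of the joint extraction model is improved through iterative training. Experiments show that the entity-relation joint extraction model in the proposed scheme outperforms existing cybersecurity knowledge extraction schemes, and they verify the effectiveness of the adversarial active learning method.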
Tao LI, Yuanbo GUO, Ankang JU. Knowledge triple extraction in cybersecurity with adversarial active learning[J]. Journal on Communications, 2020, 41(10): 80-91.
Table 4
Examples of triple extraction results
Model | Extraction result |
Example 1 | Since the revelation of an [Adobe Flash Player]_{e1,hasVulnerability} zero day exploit exposed as part of the leaked Hacking Team arsenal in 2015 designated [CVE-2015-5119]_{e2,hasVulnerability}. |
Att-PCNN_BiLSTM | Since the revelation of an [Adobe Flash Player]_{e1,uses} zero day exploit exposed as part of the leaked Hacking Team arsenal in 2015 designated [CVE-2015-5119]_{e2,uses}. |
BiLSTM-CRF-Multi_head | Since the revelation of an [Adobe Flash Player]_{e1,hasVulnerability} zero day exploit exposed as part of the leaked Hacking Team arsenal in 2015 designated [CVE-2015-5119]_{e2,hasVulnerability}. |
Dynamic-att-BiLSTM-LSTM | Since the revelation of an [Adobe Flash Player]_{e1,hasVulnerability} zero day exploit exposed as part of the leaked Hacking Team arsenal in 2015 designated [CVE-2015-5119]_{e2,hasVulnerability}. |
Example 2 | [Apt 28]_{e1,M} which we suspect is sponsored by [Russian]_{e2,comes-from} government, uses [spear phishing emails]_{e2,uses} to target its victims by specific topics. |
Att-PCNN_BiLSTM | [Apt 28]_{e1,comes-from} which we suspect is sponsored by [Russian]_{e2,comes-from} government, uses [spear phishing emails] to target its victims by specific topics. |
BiLSTM-CRF-Multi_head | [Apt 28]_{e1,comes-from} which we suspect is sponsored by [Russian]_{e2,comes-from} government, uses [spear phishing] emails to target its victims by specific topics. |
Dynamic-att-BiLSTM-LSTM | [Apt 28]_{e1,M} which we suspect is sponsored by [Russian]_{e2,comes-from} government, uses [spear phishing emails]_{e2,uses} to target its victims by specific topics. |
Example 3 | One identified malware sample ([75193fc10145931ec0788d7c88fc8832]_{e1,indicates}, compiled in March 2014) uses a password-protected [.7z]_{e1,located-at} to deliver the [Etumbot installer]_{e2,M}, which is most likely contained within [spear phishing email]_{e2,located-at}. |
Att-PCNN_BiLSTM | One identified malware sample ([75193fc10145931ec0788d7c88fc8832]_{e1,indicates}, compiled in March 2014) uses a password-protected [.7z] to deliver the [Etumbot installer]_{e2,indicates}, which is most likely contained within [spear phishing email]. |
BiLSTM-CRF-Multi_head | One identified malware sample ([75193fc10145931ec0788d7c88fc8832]_{e1,indicates}, compiled in March 2014) uses a password-protected [.7z] to deliver the [Etumbot installer]_{e2,indicates}, which is most likely contained within [spear phishing] email. |
Dynamic-att-BiLSTM-LSTM | One identified malware sample ([75193fc10145931ec0788d7c88fc8832]_{e1,indicates}, compiled in March 2014) uses a password-protected [.7z] to deliver the [Etumbot installer]_{e2,M}, which is most likely contained within [spear phishing email]_{e2,located-at}. |
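The tags in the table above attach a role (e1 or e2) and a relation label to each bracketed entity, and triples are formed by pairing entities whose tags agree. The Python sketch below illustrates one way such tags could be decoded into triples. It is not the authors' implementation: the function name `decode_triples`, the tuple input format, and the rule that an `M`-tagged entity pairs with every opposite-role entity carrying a concrete relation are all assumptions made for illustration.

```python
def decode_triples(entities):
    """Pair tagged entities into (head, relation, tail) triples.

    `entities` is a list of (text, role, relation) tuples in sentence order,
    where role is 1 (e1) or 2 (e2) and relation is a relation name or "M".
    Assumption: "M" marks an entity that takes part in several relations, so
    it inherits the relation of each opposite-role entity it is paired with.
    """
    heads = [(text, rel) for text, role, rel in entities if role == 1]
    tails = [(text, rel) for text, role, rel in entities if role == 2]

    triples = []
    for head, head_rel in heads:
        for tail, tail_rel in tails:
            if head_rel == "M" and tail_rel == "M":
                continue  # no concrete relation available for this pair
            if head_rel == "M":
                triples.append((head, tail_rel, tail))
            elif tail_rel == "M" or head_rel == tail_rel:
                triples.append((head, head_rel, tail))
    return triples


# Tags copied from the Example 1 and Example 2 rows of the table.
example_1 = [
    ("Adobe Flash Player", 1, "hasVulnerability"),
    ("CVE-2015-5119", 2, "hasVulnerability"),
]
example_2 = [
    ("Apt 28", 1, "M"),
    ("Russian", 2, "comes-from"),
    ("spear phishing emails", 2, "uses"),
]

for triple in decode_triples(example_1) + decode_triples(example_2):
    print(triple)
```

Under these assumptions, an entity left untagged by a model (such as the bare `[spear phishing emails]` in the Att-PCNN_BiLSTM row of Example 2) simply contributes no triple, which is how a missed relation shows up in the decoded output.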