Journal on Communications ›› 2020, Vol. 41 ›› Issue (2): 165-175. doi: 10.11959/j.issn.1000-436x.2020033

• Academic paper •

Research on coreference resolution technology of entity in information security

Han ZHANG1,2, Yongjin HU1, Yuanbo GUO1, Jicheng CHEN3

  1 Department of Cryptogram Engineering, Information Engineering University, Zhengzhou 450001, China
  2 Software College, Zhengzhou University, Zhengzhou 450000, China
  3 Institute of Information Technology, Information Engineering University, Zhengzhou 450001, China
  • Revised: 2019-12-27 Online: 2020-02-25 Published: 2020-03-09
  • About the authors: Han ZHANG (1985- ), female, born in Xiangcheng, Henan, is a doctoral student at Information Engineering University; her main research interests are natural language processing and information security | Yongjin HU (1981- ), male, born in Weifang, Shandong, is a lecturer at Information Engineering University; his main research interests are active defense and situational awareness | Yuanbo GUO (1975- ), male, born in Zhouzhi, Shaanxi, holds a doctorate and is a professor and doctoral supervisor at Information Engineering University; his main research interests are big data security and situational awareness | Jicheng CHEN (1984- ), male, born in Lianshui, Jiangsu, is a doctoral student at Information Engineering University; his main research interests are complex networks and information content security
  • Supported by:
    The National Natural Science Foundation of China (61501515); The Henan Provincial Key Science and Technology Research Project (172102210002); The Young Backbone Teachers Fund of Zhengzhou University (2017ZDGGJS048)

Abstract:

A hybrid method is proposed for coreference resolution in the information security domain. Building on the original BiLSTM-attention-CRF model, the method introduces a domain-dictionary matching mechanism and combines it with document-level attention to form a new dictionary-based attention mechanism. This addresses the model's weaker recognition of rare entities and long entities when extracting candidates from text. In addition, by summarizing the features of domain texts, the extracted candidates are resolved either by rules or by machine learning, depending on their part of speech, to improve accuracy. Experiments on a security-domain data set demonstrate the superiority of the method in two respects: coreference resolution, and the extraction and classification of candidates.

Key words: coreference resolution, hybrid method, domain-dictionary matching mechanism, BiLSTM-attention-CRF, information security
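
The two ideas named in the abstract, dictionary matching during candidate extraction and part-of-speech-based routing between rules and a learned resolver, can be illustrated with a small Python sketch. This is not the authors' implementation: the toy dictionary `DOMAIN_DICT`, the function names, the choice to send pronouns to the rule branch, and the token-overlap scorer standing in for a trained model are all assumptions made for illustration. In the paper the dictionary signal feeds an attention mechanism inside a BiLSTM-attention-CRF model, which is not reproduced here.

```python
from typing import List, Optional, Tuple

# Hypothetical toy dictionary of tokenized security terms (an assumption;
# the paper's actual domain dictionary is not shown on this page).
DOMAIN_DICT = {
    ("sql", "injection"),
    ("cross", "site", "scripting"),
    ("wannacry",),
}
MAX_LEN = max(len(entry) for entry in DOMAIN_DICT)


def dictionary_match_flags(tokens: List[str]) -> List[int]:
    """Return 1 for each token inside a longest dictionary match, else 0."""
    lowered = [t.lower() for t in tokens]
    flags = [0] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest span first so long entities win over their prefixes.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in DOMAIN_DICT:
                matched = n
                break
        if matched:
            for j in range(i, i + matched):
                flags[j] = 1
            i += matched
        else:
            i += 1
    return flags


def rule_resolver(antecedents: List[str]) -> Optional[str]:
    """Toy rule: link to the nearest preceding candidate."""
    return antecedents[-1] if antecedents else None


def overlap_resolver(mention: str, antecedents: List[str]) -> Optional[str]:
    """Stand-in for a learned scorer: highest Jaccard token overlap wins."""
    m = set(mention.lower().split())
    best, best_score = None, 0.0
    for cand in antecedents:
        c = set(cand.lower().split())
        union = m | c
        score = len(m & c) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = cand, score
    return best


def resolve(mention: Tuple[str, str], antecedents: List[str]) -> Optional[str]:
    """Route a (text, POS) mention: pronouns to rules, the rest to the scorer.

    Which part of speech goes to which branch is an assumption of this sketch.
    """
    text, pos = mention
    if pos == "PRON":
        return rule_resolver(antecedents)
    return overlap_resolver(text, antecedents)
```

In a fuller pipeline the flags from `dictionary_match_flags` would be one extra per-token feature for the sequence labeller, and `overlap_resolver` would be replaced by a trained classifier over mention pairs.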

