大数据 (Big Data Research) ›› 2022, Vol. 8 ›› Issue (6): 127-142. doi: 10.11959/j.issn.2096-0271.2022052

• Research •

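Automatic key information extraction of police records based on deep learning

Yumeng CUI (崔雨萌), Jingya WANG (王靖亚), Shangyi YAN (闫尚义), Zhizhong TAO (陶知众)

  1. College of Information Network Security, People’s Public Security University of China, Beijing 100038, China
  • Online: 2022-11-15 Published: 2022-11-01
  • About the authors: Yumeng CUI (1998- ), male, master's student at the College of Information Network Security, People’s Public Security University of China. His main research interest is named entity recognition.
    Jingya WANG (1966- ), female, professor at the College of Information Network Security, People’s Public Security University of China. Her main research interests are natural language processing and adversarial examples.
    Shangyi YAN (1998- ), male, master's student at the College of Information Network Security, People’s Public Security University of China. His main research interests are natural language processing and text classification.
    Zhizhong TAO (1997- ), male, master's student at the College of Information Network Security, People’s Public Security University of China. His main research interests are artificial intelligence and image style transfer.
  • Supported by:
    The National Social Science Foundation of China (20AZD114)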

Abstract:

With the rise of intelligent policing, the public has more channels for reporting incidents, the volume of unstructured police records has surged, and recognizing entities in those records has become harder. To address this problem, the BERT model is introduced to obtain word vectors, a self-attention mechanism is integrated to capture long-distance dependencies between words, and a BERT-BiGRU-SelfAtt-CRF model for police entity recognition is constructed. To verify the model's performance and generalization ability, experiments were carried out on public datasets. To verify its feasibility and efficiency in the police domain, experiments were also run on a constructed, annotated police dataset. The results show that the proposed model achieves a precision of 82.45%, a recall of 79.03%, and an F1 value of 80.72% on the police dataset, outperforming the other models compared. The authors conclude that the model is feasible and effective for police entity recognition and can meet the needs of practical police work.

Key words: deep learning, pretrained language model, self-attention mechanism, entity recognition in police records
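
For readers who want to relate the three reported metrics, the following is a minimal sketch that recomputes F1 as the harmonic mean of the precision and recall quoted in the abstract. The function name `f1_score` and the variable names are illustrative and do not come from the paper. The abstract does not say how the scores were averaged across entity types, so the recomputed value (about 80.70%) need not match the reported 80.72% exactly; a small gap of this size could come from rounding or from per-type averaging.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract for the police dataset, in percent.
reported_precision = 82.45
reported_recall = 79.03
reported_f1 = 80.72

# Assumption: a single harmonic mean over the aggregate precision and recall.
recomputed_f1 = f1_score(reported_precision, reported_recall)
print(f"recomputed F1: {recomputed_f1:.2f}%  (reported: {reported_f1:.2f}%)")
```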

