Big Data Research (大数据)


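Automatic Key Information Extraction of Police Records Based on Deep Learning

CUI Yumeng, WANG Jingya, YAN Shangyi, TAO Zhizhong

  1. College of Information Network Security, People's Public Security University of China, Beijing 100038, China

  • About the authors: CUI Yumeng (1998- ), male, master's student at People's Public Security University of China; main research interests: named entity recognition and text classification. WANG Jingya (1966- ), female, master's degree, professor at People's Public Security University of China; main research interests: natural language processing and adversarial examples. YAN Shangyi (1998- ), male, master's student at People's Public Security University of China; main research interests: natural language processing and text classification. TAO Zhizhong (1997- ), male, master's student at People's Public Security University of China; main research interests: artificial intelligence and image style transfer.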

Abstract:

With the rise of intelligent policing, the public has more channels for reporting incidents to the police, the volume of unstructured police records has surged, and recognizing entities in those records has become harder. To address this problem, the BERT model is introduced to obtain word vectors, a self-attention mechanism is integrated to capture long-distance dependencies between words, and a BERT-BiGRU-SelfAtt-CRF model for police entity recognition is constructed. To verify the model's performance and generalization ability, experiments were carried out on public datasets. To verify its efficiency and feasibility in the police domain, experiments were carried out on a purpose-built annotated police dataset. The results show that the BERT-BiGRU-SelfAtt-CRF model reaches a precision of 82.45% and an F1 score of 80.72% on the police dataset, outperforming the other models compared. The authors conclude that the model can meet the needs of practical police work and is effective and feasible for police entity recognition.

Key words: deep learning, pretrained language model, self-attention mechanism, entity recognition in police records

Public Security Behavioral Science Laboratory, People's Public Security University of China (2020sys08).
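The abstract reports precision and F1 on the police dataset but not recall. Because F1 is the harmonic mean of precision and recall, the recall implied by those two figures can be recovered with a little arithmetic. The sketch below is only an illustration of that relationship: the function names are hypothetical, and the resulting recall is a derived value, not a number reported by the authors.

```python
def recall_from_precision_f1(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, with P and F1 given as fractions."""
    denominator = 2 * precision - f1
    if denominator <= 0:
        # F1 is always strictly less than 2 * precision, so this input is invalid.
        raise ValueError("F1 must be smaller than 2 * precision")
    return f1 * precision / denominator


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures taken from the abstract (police dataset).
precision = 0.8245
f1 = 0.8072

# Derived here, not reported in the source.
implied_recall = recall_from_precision_f1(precision, f1)
print(f"implied recall: {implied_recall:.2%}")  # implied recall: 79.06%
```

Plugging the implied recall back into `f1_score` reproduces the reported F1, which is a quick consistency check on the two published figures.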
