Big Data Research, 2022, Vol. 8, Issue (6): 127-142. doi: 10.11959/j.issn.2096-0271.2022052
Yumeng CUI, Jingya WANG, Shangyi YAN, Zhizhong TAO
Online: 2022-11-15
Published: 2022-11-01
About the author:
Yumeng CUI (1998- ), male, master's student at the School of Information Network Security, People's Public Security University of China. His main research interest is named entity recognition.
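Abstract:
With the rise of smart policing, the channels through which the public can report incidents have widened, the volume of unstructured police incident records has surged, and recognizing entities in these records has become harder. To address this operational pain point, the BERT model is introduced to obtain word vectors, a self-attention mechanism is incorporated to capture long-distance dependencies between characters, and a BERT-BiGRU-SelfAtt-CRF model for police incident entity recognition is constructed. To verify the model's performance and generalization ability, experiments were conducted on public datasets. To verify its feasibility and efficiency in the police incident domain, experiments were conducted on a purpose-built police incident dataset. The results show that on the police incident dataset the proposed model reaches a precision of 82.45%, a recall of 79.03%, and an F1 score of 80.72%, outperforming the other models. The proposed model is therefore feasible and effective, and can meet the needs of practical public security work.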
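The self-attention step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain scaled dot-product attention with identity query, key, and value projections (a trained layer would learn these), and the toy hidden states are made-up numbers standing in for BiGRU outputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(states):
    """Scaled dot-product self-attention over a sequence of vectors.

    Assumption of this sketch: the query, key and value projections are
    the identity, so each state is compared directly with every other state.
    Returns (outputs, weights), one row per sequence position.
    """
    dim = len(states[0])
    outputs, weights = [], []
    for query in states:
        # Similarity of this position to every position, near or far.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in states
        ]
        row = softmax(scores)
        weights.append(row)
        # Weighted sum of all states gives the context vector.
        outputs.append(
            [sum(w * value[d] for w, value in zip(row, states)) for d in range(dim)]
        )
    return outputs, weights


# Hypothetical hidden states for a 4-character sequence.
hidden = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 0.1]]
context, attn = self_attention(hidden)
```

In this toy input, position 0 puts more weight on the distant but similar position 3 than on its immediate neighbour at position 1. That is the sense in which attention captures dependencies regardless of distance.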
Yumeng CUI, Jingya WANG, Shangyi YAN, Zhizhong TAO. Automatic key information extraction of police records based on deep learning[J]. Big Data Research, 2022, 8(6): 127-142.
Table 3 Experimental results on the People's Daily and MSRA datasets
Model | Training epochs | Precision | Recall | F1 score | Time consumed/min |
CNN-LSTM | 40 | 72.87% | 74.94% | 73.98% | 91 |
BiLSTM-CRF | 40 | 95.50% | 92.09% | 93.76% | 398 |
BiGRU-CRF | 40 | 96.38% | 93.22% | 93.54% | 245 |
BiGRU-SelfAtt-CRF | 40 | 96.14% | 93.37% | 93.61% | 252 |
BERT-CNN-LSTM | 40 | 90.38% | 93.98% | 93.53% | 309 |
BERT-BiLSTM-CRF | 40 | 92.68% | 90.31% | 91.48% | 443 |
BERT-BiGRU-CRF | 40 | 91.11% | 91.03% | 91.07% | 441 |
BERT-BiGRU-SelfAtt-CRF | 40 | 91.62% | 90.69% | 91.13% | 459 |
Table 4 Experimental results on the PRD-PSB dataset
Model | Training epochs | Precision | Recall | F1 score | Time consumed/min |
CNN-LSTM | 50 | 30.95% | 19.70% | 24.07% | 0.68 |
BiLSTM-CRF | 50 | 68.92% | 71.21% | 69.79% | 7.15 |
BiGRU-CRF | 50 | 61.83% | 68.18% | 64.51% | 2.62 |
BiGRU-SelfAtt-CRF | 50 | 64.90% | 69.27% | 66.74% | 3.27 |
BERT-CNN-LSTM | 10 | 78.57% | 65.67% | 71.54% | 10.28 |
BERT-BiLSTM-CRF | 10 | 78.12% | 74.63% | 76.34% | 17.22 |
BERT-BiGRU-CRF | 10 | 79.69% | 76.12% | 77.86% | 16.10 |
BERT-BiGRU-SelfAtt-CRF | 10 | 82.45% | 79.03% | 80.72% | 17.23 |
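For readers unfamiliar with how precision, recall, and F1 are usually obtained in entity recognition, the following is a minimal sketch of entity-level scoring over BIO-tagged sequences. It assumes exact span-and-type matching on a single sequence. The tag names (LOC, PER, TIME) and the example sequences are hypothetical, and the averaging scheme behind the figures in the tables is not stated in this section, so the sketch should not be read as the authors' evaluation script.

```python
def extract_entities(tags):
    """Collect (start, end, type) spans from a BIO tag sequence; end is exclusive."""
    spans, start, kind = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and kind == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, kind))
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return spans


def precision_recall_f1(gold_tags, pred_tags):
    """Entity-level scores: a prediction counts only if span and type both match."""
    gold = extract_entities(gold_tags)
    pred = extract_entities(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


# Hypothetical gold and predicted tags for one 7-character record.
gold = ["B-LOC", "I-LOC", "O", "B-PER", "I-PER", "O", "B-TIME"]
pred = ["B-LOC", "I-LOC", "O", "B-PER", "O", "O", "B-TIME"]
p, r, f1 = precision_recall_f1(gold, pred)
```

Here the predicted person span is one character too short, so it counts as both a false positive and a missed entity: two of three predictions are correct and two of three gold entities are found. A corpus-level score would sum the counts over all records before dividing.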