Automatic key information extraction of police records based on deep learning

doi:10.11959/j.issn.2096-0271.2022052

Abstract

With the rise of intelligent policing, the public has more channels for reporting incidents to the police, the volume of unstructured police records has grown sharply, and recognizing entities in those records has become harder. To address this problem, a BERT model was introduced to generate word vectors, a self-attention mechanism was integrated to capture long-distance dependencies between words, and a BERT-BiGRU-SelfAtt-CRF police entity recognition model was constructed. To verify the performance and generalization ability of the model, experiments were carried out on public datasets; to demonstrate its feasibility and effectiveness in the police domain, experiments were also run on an annotated police dataset. The results showed that the BERT-BiGRU-SelfAtt-CRF model outperformed the other models on the police dataset, with a precision of 82.45%, a recall of 79.03%, and an F1 value of 80.72%. It is concluded that the model can meet the requirements of actual police work and is feasible and effective for police entity recognition.
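
The self-attention step mentioned in the abstract can be illustrated with a minimal sketch. It assumes single-head scaled dot-product attention with no learned projections, applied to toy vectors standing in for BiGRU hidden states; the function names and the toy input are hypothetical, and the abstract does not specify the attention variant the authors used.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(states):
    """Single-head scaled dot-product self-attention (assumed variant).

    Each state serves as query, key, and value; there are no learned
    projection matrices in this sketch.
    """
    d = len(states[0])
    outputs, weights = [], []
    for q in states:
        # Similarity of this position to every position, near or far.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in states]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of all states, one output vector per position.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, states)) for i in range(d)])
    return outputs, weights


# Toy stand-ins for BiGRU hidden states (hypothetical values).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(states)
```

In this toy input, position 0 attends to position 2 as strongly as to itself even though position 1 lies between them, which is the sense in which attention weights do not depend on distance.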

Key words: deep learning, pretrained language model, self-attention mechanism, entity recognition in police records

CLC Number: TP391.1

Yumeng CUI, Jingya WANG, Shangyi YAN, Zhizhong TAO. Automatic key information extraction of police records based on deep learning[J]. Big Data Research, 2022, 8(6): 127-142.

Parameter type	Parameter name	Value
BERT parameters	layer_nums	4
	head_num	12
	hidden_size	768
BiGRU-CRF hyperparameters	BiGRU units	128
	max_seq_length	100
	dropout_rate	0.4
	SelfAtt_head	12
BiGRU-CRF training parameters	Epochs	5
	batch_size	64
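
As a rough illustration, the parameter table above could be held in a single configuration object. This is a sketch only: the class name `ModelConfig`, the field spellings, and both helper methods are hypothetical, and the BiGRU output width assumes forward and backward states are concatenated, which the table does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    # BERT parameters (values copied from the table)
    layer_nums: int = 4
    head_num: int = 12
    hidden_size: int = 768
    # BiGRU-CRF hyperparameters
    bigru_units: int = 128
    max_seq_length: int = 100
    dropout_rate: float = 0.4
    selfatt_head: int = 12
    # BiGRU-CRF training parameters
    epochs: int = 5
    batch_size: int = 64

    def bert_head_dim(self) -> int:
        """Width of one BERT attention head; hidden_size must divide evenly."""
        if self.hidden_size % self.head_num != 0:
            raise ValueError("hidden_size must be divisible by head_num")
        return self.hidden_size // self.head_num

    def bigru_output_dim(self) -> int:
        """Assumption: forward and backward GRU states are concatenated."""
        return 2 * self.bigru_units


config = ModelConfig()
head_dim = config.bert_head_dim()
bigru_dim = config.bigru_output_dim()
```

Freezing the dataclass keeps an experiment's settings from being changed by accident after the model is built.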

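Text example	Label	Meaning
北	B-LOC	Beginning of an incident-location entity
京	I-LOC	Inside an incident-location entity
的	O	Non-entity
李	B-PER	Beginning of a caller-name entity
先	I-PER	Inside a caller-name entity
生	I-PER	Inside a caller-name entity
在	O	Non-entity
人	B-ORG	Beginning of an involved-organization entity
民	I-ORG	Inside an involved-organization entity
医	I-ORG	Inside an involved-organization entity
院	I-ORG	Inside an involved-organization entity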
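
The labeling scheme in the table above can be turned back into entity spans with a short decoder. The sketch below is an assumed post-processing step, not code from the paper; `decode_bio` is a hypothetical helper, and an `I-` tag that does not continue an open entity of the same type is treated as outside any entity.

```python
def decode_bio(chars, tags):
    """Group character-level BIO tags into (entity_type, text) pairs."""
    entities = []
    cur_type, cur_chars = None, []

    def flush():
        # Close the entity currently being built, if any.
        if cur_type is not None:
            entities.append((cur_type, "".join(cur_chars)))

    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            flush()
            cur_type, cur_chars = tag[2:], [ch]
        elif tag.startswith("I-") and cur_type == tag[2:]:
            cur_chars.append(ch)
        else:
            # "O", or an "I-" tag with no matching open entity.
            flush()
            cur_type, cur_chars = None, []
    flush()
    return entities


# The example sentence and labels from the table.
chars = list("北京的李先生在人民医院")
tags = [
    "B-LOC", "I-LOC", "O",
    "B-PER", "I-PER", "I-PER", "O",
    "B-ORG", "I-ORG", "I-ORG", "I-ORG",
]
entities = decode_bio(chars, tags)
```

For the table's sentence this yields one location (北京), one person (李先生), and one organization (人民医院).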

Model	Epochs	Precision	Recall	F1	Time/min
CNN-LSTM	40	72.87%	74.94%	73.98%	91
BiLSTM-CRF	40	95.50%	92.09%	93.76%	398
BiGRU-CRF	40	96.38%	93.22%	93.54%	245
BiGRU-SelfAtt-CRF	40	96.14%	93.37%	93.61%	252
BERT-CNN-LSTM	40	90.38%	93.98%	93.53%	309
BERT-BiLSTM-CRF	40	92.68%	90.31%	91.48%	443
BERT-BiGRU-CRF	40	91.11%	91.03%	91.07%	441
BERT-BiGRU-SelfAtt-CRF	40	91.62%	90.69%	91.13%	459

Model	Epochs	Precision	Recall	F1	Time/min
CNN-LSTM	50	30.95%	19.70%	24.07%	0.68
BiLSTM-CRF	50	68.92%	71.21%	69.79%	7.15
BiGRU-CRF	50	61.83%	68.18%	64.51%	2.62
BiGRU-SelfAtt-CRF	50	64.90%	69.27%	66.74%	3.27
BERT-CNN-LSTM	10	78.57%	65.67%	71.54%	10.28
BERT-BiLSTM-CRF	10	78.12%	74.63%	76.34%	17.22
BERT-BiGRU-CRF	10	79.69%	76.12%	77.86%	16.10
BERT-BiGRU-SelfAtt-CRF	10	82.45%	79.03%	80.72%	17.23
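
The F1 column is conventionally the harmonic mean of precision and recall. The sketch below recomputes it for three rows of the table above, assuming that standard definition; the helper name `f1_score` and the dictionary layout are hypothetical. For the best model the recomputed value is about 80.70, within 0.02 points of the reported 80.72, a gap that could come from rounding of the published precision and recall.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
reported = {
    "BERT-BiLSTM-CRF": (78.12, 74.63, 76.34),
    "BERT-BiGRU-CRF": (79.69, 76.12, 77.86),
    "BERT-BiGRU-SelfAtt-CRF": (82.45, 79.03, 80.72),
}

# Recompute F1 and record the gap to the reported figure for each model.
recomputed = {name: f1_score(p, r) for name, (p, r, _) in reported.items()}
gaps = {name: abs(recomputed[name] - reported[name][2]) for name in reported}

for name in reported:
    print(f"{name}: recomputed F1 = {recomputed[name]:.2f}, gap = {gaps[name]:.2f}")
```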
