Chinese Journal of Network and Information Security, 2022, Vol. 8, Issue (2): 150-159. doi: 10.11959/j.issn.2096-109x.2022007


Evidence classification method of chat text based on DSR and BGRU model

Yu ZHANG¹, Binglong LI¹, Xuejuan LI², Heyu ZHANG¹

  1 Information Engineering University, Zhengzhou 450001, China
  2 Henan Polytechnic University, Jiaozuo 454003, China
  • Revised: 2021-06-10   Online: 2022-04-15   Published: 2022-04-01
  • Supported by: The National Natural Science Foundation of China (60903220)

Abstract:

Identifying and extracting chat text evidence related to criminal events efficiently is difficult, both because chat content carries complex semantics such as slang and because social software such as instant messaging generates huge volumes of chat text. To address this, a chat text evidence classification model, DSR-BGRU, was proposed, combining a DSR (dynamic semantic representation) model with a BGRU (bidirectional gated recurrent unit) model. The chat text data was first pre-processed in a way that preserves the characteristics of the criminal domain. A multi-layer chat text feature extraction and classification model was then built with the Keras framework. The input layer, based on the DSR model, took as input a text matrix formed from the DSR vector representations of the words, so that the chat text was represented at the semantic level. The BGRU hidden layer then extracted contextual features from this sequence of word vectors, and a softmax classification layer identified and extracted the chat text evidence. Experimental results show that DSR-BGRU identifies and extracts relevant chat records more accurately than the other text classification models and methods compared, and that it effectively extracts criminal text information from chat data, reaching an accuracy of 92.06% and an F1 score of 91.00%.
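The pipeline in the abstract (a text matrix of word vectors, a bidirectional GRU hidden layer, a softmax output layer) can be illustrated with the minimal, dependency-free sketch below. It is not the authors' implementation: the DSR representation is not reproduced, so random stand-in word vectors are used; the weights are random and untrained; the two output classes (evidence vs. non-evidence), the use of each direction's final hidden state as the sentence feature, and all names (`GRUCell`, `BGRUClassifier`, `predict_proba`) are hypothetical. In Keras, the same structure would typically be written with `keras.layers.Bidirectional(keras.layers.GRU(units))` followed by `keras.layers.Dense(num_classes, activation="softmax")`.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class GRUCell:
    """Hypothetical GRU cell: update gate z, reset gate r, candidate state h~."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One (W, U, b) triple per gate; weights are random, i.e. untrained.
        self.params = {
            g: (rand_matrix(hidden_dim, input_dim, rng),
                rand_matrix(hidden_dim, hidden_dim, rng),
                [0.0] * hidden_dim)
            for g in ("z", "r", "h")
        }

    def _affine(self, gate, x, h):
        W, U, b = self.params[gate]
        return [a + c + d for a, c, d in zip(matvec(W, x), matvec(U, h), b)]

    def step(self, x, h_prev):
        z = [sigmoid(v) for v in self._affine("z", x, h_prev)]
        r = [sigmoid(v) for v in self._affine("r", x, h_prev)]
        reset_h = [ri * hi for ri, hi in zip(r, h_prev)]
        h_tilde = [math.tanh(v) for v in self._affine("h", x, reset_h)]
        # h_t = (1 - z) * h_{t-1} + z * h~_t
        return [(1 - zi) * hp + zi * ht for zi, hp, ht in zip(z, h_prev, h_tilde)]

    def run(self, sequence):
        h = [0.0] * self.hidden_dim
        for x in sequence:
            h = self.step(x, h)
        return h  # final hidden state


class BGRUClassifier:
    """Hypothetical BGRU + softmax classifier over a text matrix of word vectors."""

    def __init__(self, embed_dim, hidden_dim, num_classes, seed=0):
        rng = random.Random(seed)
        self.fwd = GRUCell(embed_dim, hidden_dim, rng)  # reads left to right
        self.bwd = GRUCell(embed_dim, hidden_dim, rng)  # reads right to left
        self.W_out = rand_matrix(num_classes, 2 * hidden_dim, rng)
        self.b_out = [0.0] * num_classes

    def predict_proba(self, text_matrix):
        h_f = self.fwd.run(text_matrix)
        h_b = self.bwd.run(list(reversed(text_matrix)))
        features = h_f + h_b  # concatenate both directions
        logits = [v + b for v, b in zip(matvec(self.W_out, features), self.b_out)]
        return softmax(logits)


if __name__ == "__main__":
    rng = random.Random(42)
    # 5 words, each an 8-dim stand-in for a DSR word vector (assumption).
    sentence = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(5)]
    model = BGRUClassifier(embed_dim=8, hidden_dim=4, num_classes=2)
    print(model.predict_proba(sentence))  # two class probabilities summing to 1
```

Reading the sequence in both directions lets each word's context on either side influence the features passed to the softmax layer, which is the motivation for a bidirectional rather than a unidirectional GRU.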

Key words: text semantic representation, polysemy, text classification, digital forensics
