Journal on Communications ›› 2022, Vol. 43 ›› Issue (7): 85-92. doi: 10.11959/j.issn.1000-436x.2022132

• Papers

Cyber threat intelligence entity extraction fused with Focal Loss

Yuanbo GUO1, Yongfei LI1, Qingli CHEN1, Chen FANG1, Yangyang HU2   

  1. Department of Cryptogram Engineering, Information Engineering University, Zhengzhou 450001, China
  2. University of California, Riverside, Riverside CA 92521, USA
  • Revised: 2022-06-02 Online: 2022-07-25 Published: 2022-06-01
  • Supported by:
    The National Natural Science Foundation of China (61501515); The National Natural Science Foundation of China (61601515)

Abstract:

Cyber threat intelligence contains a wealth of knowledge about threat behavior. Timely analysis and processing of threat intelligence can shift defense from passive to active. Today, most threat intelligence exists as natural language text and contains a large amount of unstructured data, which must be converted into structured data by entity extraction methods before subsequent processing. However, because threat intelligence contains numerous specialized terms such as vulnerability names, malware names and APT organizations, and the distribution of entities is extremely unbalanced, the performance of general-domain extraction methods is severely limited when they are applied to threat intelligence. Therefore, an entity extraction model integrating Focal Loss was proposed, which improved the cross-entropy loss function and balanced the sample distribution by introducing a balance factor and a modulation coefficient. In addition, because threat intelligence has a complex structure, comes from a wide range of sources and contains a large number of specialized words, token and character features were added to the model, which effectively alleviated the OOV (out-of-vocabulary) problem in threat intelligence. Experimental results show that, compared with the existing mainstream models BiLSTM and BiLSTM-CRF, the F1 score of the proposed model is increased by 7.07% and 4.79% respectively, which verifies the effectiveness of introducing Focal Loss and character features.
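
The role of the balance factor and the modulation coefficient can be illustrated with a minimal sketch. It assumes the commonly used multi-class form of Focal Loss, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the predicted probability of the gold label. The abstract does not state the exact formulation or the hyperparameter values used in the paper, so the function names, the default γ = 2.0 and the example logits below are illustrative assumptions only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw label scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, alpha=None, gamma=2.0):
    """Focal loss for a single token (assumed standard form, not the paper's code).

    logits: raw scores, one per entity label
    target: index of the gold label
    alpha:  optional list of per-label balance factors (defaults to 1.0 for all)
    gamma:  modulation exponent; gamma = 0 reduces to (weighted) cross-entropy
    """
    p_t = softmax(logits)[target]
    a_t = 1.0 if alpha is None else alpha[target]
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(logits, target):
    """Plain cross-entropy, expressed as focal loss with gamma = 0 and alpha = 1."""
    return focal_loss(logits, target, alpha=None, gamma=0.0)


# Hypothetical tokens: one classified confidently (e.g. a frequent label),
# one classified poorly (e.g. a rare entity label).
easy_token = [4.0, 0.0, 0.0]
hard_token = [0.0, 1.0, 0.0]

for name, logits in (("easy", easy_token), ("hard", hard_token)):
    ce = cross_entropy(logits, 0)
    fl = focal_loss(logits, 0, gamma=2.0)
    print(f"{name}: cross-entropy={ce:.4f}  focal={fl:.6f}  ratio={fl / ce:.4f}")
```

In this sketch the factor (1 - p_t)^γ shrinks the loss of the confidently classified token far more than that of the poorly classified one, so frequent, easy labels contribute less to training, while the per-label α_t can additionally reweight rare entity classes.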

Key words: cyber security, threat intelligence, entity extraction, label imbalance
