Telecommunications Science, 2020, Vol. 36, Issue (1): 119-126. doi: 10.11959/j.issn.1000-0801.2020010

• Research and Development •

A multi-source threat intelligence confidence value evaluation method based on machine learning

Hansheng LIU1,2, Hongyu TANG1, Mingxia BO1, Jianfeng NIU1, Tianbo LI1, Lingxiao LI1

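  1. Shanghai Research Institute of China Telecom Co., Ltd., Shanghai 200122, China
    2. Beijing Research Institute of China Telecom Co., Ltd., Beijing 102209, China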
  • Revised: 2020-01-06 Online: 2020-01-20 Published: 2020-02-13
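  • About the authors:
    Hansheng LIU (1993- ), male, engineer at the Network AI Research Center, Institute of Emerging Information Technology, Beijing Research Institute of China Telecom Co., Ltd.; research interests: artificial intelligence, threat intelligence, intelligent network operation and maintenance.
    Hongyu TANG (1977- ), male, deputy director of the Cloud Security Institute, Shanghai Research Institute of China Telecom Co., Ltd.; research interests: cloud security, situational awareness, threat intelligence.
    Mingxia BO (1978- ), female, Ph.D., senior engineer at the Cloud Security Institute, Shanghai Research Institute of China Telecom Co., Ltd.; research interests: threat intelligence, software-defined security.
    Jianfeng NIU (1993- ), male, development director at the Cloud Security Institute, Shanghai Research Institute of China Telecom Co., Ltd.; research interests: cloud security, threat intelligence.
    Tianbo LI (1988- ), male, product operation and maintenance manager at the Cloud Security Institute, Shanghai Research Institute of China Telecom Co., Ltd.; research interest: Web security.
    Lingxiao LI (1991- ), female, engineer at the Cloud Security Institute, Shanghai Research Institute of China Telecom Co., Ltd.; research interest: website security.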

Abstract:

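During the collection of multi-source threat intelligence, several problems make it difficult for an intelligence center to make scientific decisions about massive volumes of intelligence data. These problems include low data value density, high redundancy among intelligence items, and rapid expiration. To address them, a machine-learning-based quality evaluation method for multi-source threat intelligence is proposed. First, a standardization process for multi-source intelligence data is designed on the basis of a standard intelligence format. Second, features are extracted from four dimensions, chosen according to the characteristics of intelligence data: intelligence source, intelligence content, active period, and degree of matching against blacklist libraries. These features serve as the basis for assessing intelligence quality. Then, for the encoded features, an intelligence quality evaluation model based on a deep neural network and a Softmax classifier is designed, and the error backpropagation algorithm is used to minimize the reconstruction error. Finally, the model is validated by K-fold cross-validation on 2 000 labeled open-source samples. It achieves an average macro-precision of 91.37% and an average macro-recall of 84.89%, providing a reference for the quality evaluation of multi-source threat intelligence.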

Keywords: information security, threat intelligence, quality evaluation, deep neural network
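The abstract names a deep neural network with a Softmax classifier trained by error backpropagation, but gives no architecture. The following is a minimal pure-Python sketch of how a Softmax output layer, a cross-entropy loss, and backpropagated gradients fit together. It assumes a single tanh hidden layer, three hypothetical quality levels, and four synthetic input features (one per evaluation dimension). The names `TinyQualityNet`, `make_toy_data` and `train`, the layer sizes, the learning rate, and the toy data are all illustrative assumptions, not the authors' model. The sketch also minimizes classification cross-entropy, which is not the "reconstruction error" the abstract mentions.

```python
import math
import random


def softmax(z):
    """Numerically stable Softmax: subtract the largest logit before exponentiating."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class TinyQualityNet:
    """Hypothetical network: one tanh hidden layer followed by a Softmax output layer.

    Layer sizes, the tanh activation and plain SGD are illustrative assumptions,
    not the configuration reported in the paper.
    """

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        """Return hidden activations and Softmax class probabilities for one sample."""
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        logits = [sum(w * hj for w, hj in zip(row, h)) + b
                  for row, b in zip(self.w2, self.b2)]
        return h, softmax(logits)

    def predict(self, x):
        """Index of the most probable class."""
        _, p = self.forward(x)
        return max(range(len(p)), key=p.__getitem__)

    def train_step(self, x, y, lr):
        """One backpropagation/SGD step on cross-entropy; returns the pre-update loss."""
        h, p = self.forward(x)
        # For Softmax + cross-entropy, dLoss/dlogit_k = p_k - 1[k == y].
        d_out = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
        # Propagate through the (not yet updated) w2 and the tanh derivative 1 - h^2.
        d_hid = [sum(d_out[k] * self.w2[k][j] for k in range(len(d_out))) * (1.0 - h[j] ** 2)
                 for j in range(len(h))]
        for k, dk in enumerate(d_out):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * dk * hj
            self.b2[k] -= lr * dk
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj
        return -math.log(p[y])


def make_toy_data(n_per_class=10, seed=1):
    """Synthetic 4-feature samples (one value per assumed dimension) for 3 quality levels."""
    rng = random.Random(seed)
    centers = {0: 0.1, 1: 0.5, 2: 0.9}  # low / medium / high quality (hypothetical labels)
    data = []
    for label, c in centers.items():
        for _ in range(n_per_class):
            data.append(([c + rng.uniform(-0.05, 0.05) for _ in range(4)], label))
    return data


def train(net, data, epochs=300, lr=0.1, seed=2):
    """Plain SGD with per-epoch shuffling; returns the mean loss of each epoch."""
    rng = random.Random(seed)
    data = list(data)
    history = []
    for _ in range(epochs):
        rng.shuffle(data)
        history.append(sum(net.train_step(x, y, lr) for x, y in data) / len(data))
    return history


if __name__ == "__main__":
    data = make_toy_data()
    net = TinyQualityNet(n_in=4, n_hidden=8, n_out=3)
    history = train(net, data)
    print(f"mean loss: first epoch {history[0]:.4f}, last epoch {history[-1]:.4f}")
    accuracy = sum(net.predict(x) == y for x, y in data) / len(data)
    print(f"training accuracy: {accuracy:.2f}")
```

Running the script prints the first and last epoch losses and the training accuracy. A real evaluation would score held-out folds rather than the training set. The gradient `p - onehot(y)` is the reason Softmax and cross-entropy are usually paired: the output-layer error needs no separate Softmax Jacobian.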

Abstract:

During the collection of multi-source threat intelligence, an intelligence center finds it difficult to make scientific decisions about massive volumes of intelligence. The data have low value density, high repetition, and short validity periods. To address these problems, a multi-source threat intelligence confidence value evaluation method based on machine learning was proposed. First, a multi-source intelligence data standardization process was designed according to the STIX intelligence standard format. Second, 14 features were extracted from four dimensions, chosen according to the characteristics of the data: publishing time, source, intelligence content, and blacklist matching degree. These features served as the basis for determining intelligence reliability. Then, using the encoded features, an intelligence confidence value evaluation model was designed based on a deep neural network and a Softmax classifier, and the backpropagation algorithm was used to minimize the reconstruction error. Finally, the model was evaluated by k-fold cross-validation on 2 000 labeled open-source samples. It achieved an average macro-precision of 91.37% and an average macro-recall of 84.89%. The method offers a useful reference for multi-source threat intelligence confidence evaluation.

Key words: information security, threat intelligence, confidence evaluation, deep neural network
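The reported figures are macro-averaged precision and recall under k-fold cross-validation. The following is a minimal sketch of that evaluation protocol. It rests on several assumptions the paper does not state:

- folds are unstratified;
- scores are computed per fold and then averaged over folds;
- each fold's classes are those present in its true or predicted labels;
- a class with no predicted (or no true) samples contributes 0 to precision (or recall).

`k_fold_indices`, `cross_validate`, and `nearest_centroid` are hypothetical helpers. `nearest_centroid` stands in for the trained model.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 once, split into k near-equal folds, and return (train, test) pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        ([j for f, fold in enumerate(folds) if f != i for j in fold], folds[i])
        for i in range(k)
    ]


def macro_precision_recall(y_true, y_pred, labels=None):
    """Unweighted mean of per-class precision and recall.

    By default the classes are those appearing in y_true or y_pred; a class with
    no predicted (or no true) samples contributes 0 to precision (or recall).
    """
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / len(labels), sum(recalls) / len(labels)


def cross_validate(X, y, k, train_and_predict, seed=0):
    """Average the per-fold macro-precision and macro-recall over k folds.

    train_and_predict(train_X, train_y, test_X) -> predicted labels for test_X
    is a placeholder for whatever model is being evaluated.
    """
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        preds = train_and_predict([X[i] for i in train], [y[i] for i in train],
                                  [X[i] for i in test])
        scores.append(macro_precision_recall([y[i] for i in test], preds))
    return sum(s[0] for s in scores) / k, sum(s[1] for s in scores) / k


def nearest_centroid(train_X, train_y, test_X):
    """Hypothetical stand-in model: assign each sample to the closest class mean."""
    centroids = {}
    for c in set(train_y):
        rows = [x for x, t in zip(train_X, train_y) if t == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return [min(centroids, key=lambda c: dist2(x, centroids[c])) for x in test_X]


if __name__ == "__main__":
    print(macro_precision_recall([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0]))
```

Take y_true = [0, 0, 1, 1, 2, 2] and y_pred = [0, 1, 1, 1, 2, 0] as a worked example. The per-class precisions are 1/2, 2/3 and 1, and the recalls are 1/2, 1 and 1/2. Averaging gives macro-precision = 13/18 ≈ 0.722 and macro-recall = 2/3 ≈ 0.667. Macro averaging weights every class equally, so a rare quality level counts as much as a common one. This is one reason the macro-recall can trail the macro-precision when minority classes are often missed.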
