Telecommunications Science, 2020, Vol. 36, Issue (11): 121-126. doi: 10.11959/j.issn.1000-0801.2020303

• Column: Information Security

Internet bad information detection based on Bert model

Xin CAI

  1. Research Institute of China Telecom Co., Ltd., Shanghai 200122, China
  • Revised: 2020-11-01 Online: 2020-11-20 Published: 2020-12-09
  • About the author: Xin CAI (born 1975), male, is a senior engineer at the Shanghai Research Institute of China Telecom Co., Ltd. His main research interests are data analysis and mining, artificial intelligence, data planning, and information security.

Abstract:

For the business scenario of detecting bad information on the Internet, a detection method based on the text content of websites was discussed. Classical text analysis techniques were reviewed, and the key technical features of the Bert model and its two different usages were introduced. A concrete implementation scheme that uses one of these usages, feature extraction, to detect bad information on websites was described in detail and compared with the traditional TF-IDF model and a word2vec+LSTM model; the comparison verified the effectiveness of this method.

Key words: bad information, Bert model, text analysis, feature extraction
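The abstract names the traditional TF-IDF model as one of the baselines. The sketch below shows what such a baseline can look like. It is not the author's implementation, and several parts are assumptions that do not come from the paper. The text is assumed to be already segmented into words, since Chinese web text would need a word segmenter first. The idf uses the smoothed form log(N/df) + 1. The classifier is a simple nearest-centroid rule based on cosine similarity. The toy documents and the labels "bad" and "normal" are hypothetical.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Smoothed inverse document frequency (assumed form): idf(t) = log(N / df(t)) + 1."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term at most once per document
    n = len(docs)
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def vectorize(tokens, idf):
    """Sparse TF-IDF vector; terms not seen when fitting idf are dropped."""
    total = len(tokens) or 1
    counts = Counter(t for t in tokens if t in idf)
    return {t: (c / total) * idf[t] for t, c in counts.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(vectors, labels):
    """Average the TF-IDF vectors of each class (assumed nearest-centroid classifier)."""
    sums, counts = {}, Counter(labels)
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, {})
        for t, w in vec.items():
            acc[t] = acc.get(t, 0.0) + w
    return {label: {t: w / counts[label] for t, w in acc.items()}
            for label, acc in sums.items()}


def classify(tokens, idf, centroids):
    """Return the label whose centroid is most cosine-similar to the text."""
    vec = vectorize(tokens, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy corpus; each text is assumed to be already word-segmented.
train_docs = [
    ["online", "casino", "bet", "win", "money"],
    ["casino", "bonus", "bet", "now"],
    ["weather", "forecast", "rain", "tomorrow"],
    ["city", "news", "weather", "report"],
]
train_labels = ["bad", "bad", "normal", "normal"]

idf = fit_idf(train_docs)
centroids = train_centroids([vectorize(d, idf) for d in train_docs], train_labels)

print(classify(["bet", "casino", "bonus"], idf, centroids))     # prints: bad
print(classify(["rain", "weather", "report"], idf, centroids))  # prints: normal
```

In the Bert feature-extraction approach that the abstract describes, the TF-IDF vectors would be replaced by feature vectors produced by the Bert model. The abstract does not say which classifier is trained on those features, so the nearest-centroid step here is only a placeholder.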
