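Research on fusion model based on deep learning for text content security enhancement

Telecommunications Science, 2020, Vol. 36, Issue (5): 25-30. doi: 10.11959/j.issn.1000-0801.2020145

Special topic: Intelligent and highly adversarial development of network security

Shaomin WANG¹,², Zheng WANG¹,², Hua REN¹,²

¹ Mobile Internet System and Application Security National Engineering Laboratory, Shanghai 201315, China
² Research Institute of China Telecom Co., Ltd., Shanghai 200122, China

Revised: 2020-04-22 Online: 2020-05-20 Published: 2020-05-18

About the authors:
Shaomin WANG (born 1983), female, is a senior engineer at the Mobile Internet System and Application Security National Engineering Laboratory and the Research Institute of China Telecom Co., Ltd. Her main research interests are content security recognition technology, artificial intelligence, and natural language processing.
Zheng WANG (born 1973), male, is a senior engineer at the Mobile Internet System and Application Security National Engineering Laboratory and the Research Institute of China Telecom Co., Ltd. His main research interests are information security, artificial intelligence, big data architecture, and data mining and analysis.
Hua REN (born 1977), female, is a senior engineer at the Mobile Internet System and Application Security National Engineering Laboratory and the Research Institute of China Telecom Co., Ltd. Her main research interests are content information security, data analysis, and artificial intelligence.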

Abstract

The rapid expansion of information content on the internet and the mobile internet has left it flooded with illegal, non-compliant, and harmful information, which undermines the content security of cyberspace. Traditional text content security recognition methods based on sensitive-word matching ignore contextual semantics, resulting in a high false positive rate and low accuracy. Based on an analysis of these traditional methods, a fusion recognition model using deep learning and a model fusion algorithm process are proposed. A text content security recognition system built on the fusion recognition model is described in detail and verified experimentally. The results show that the proposed model effectively addresses the high false positive rate caused by the lack of semantic understanding in traditional recognition methods and improves the accuracy of harmful information detection.

Key words: content security, illegal information and unhealthy information, deep learning, text recognition
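The false positive problem described in the abstract can be illustrated with a small sketch. It is not the authors' model or fusion algorithm, which the abstract does not specify. The word list, the `toy_semantic_score` stub (standing in for a trained deep learning classifier), the threshold, and the rule "flag only when a keyword hit is confirmed by the semantic score" are all assumptions made here for illustration.

```python
from typing import Callable, Iterable, List

# Hypothetical sensitive-word list, chosen only for illustration.
SENSITIVE_WORDS = ("gamble", "drugs")


def keyword_hits(text: str, words: Iterable[str] = SENSITIVE_WORDS) -> List[str]:
    """Return the sensitive words that occur in text as plain substrings."""
    lowered = text.lower()
    return [w for w in words if w in lowered]


def toy_semantic_score(text: str) -> float:
    """Stand-in for a trained deep learning classifier (assumption).

    A real system would return the model's probability that the text is
    harmful; this stub only checks for a few benign context cues.
    """
    benign_cues = ("pharmacy", "prescription", "awareness", "police")
    lowered = text.lower()
    return 0.1 if any(cue in lowered for cue in benign_cues) else 0.9


def fused_decision(
    text: str,
    scorer: Callable[[str], float] = toy_semantic_score,
    threshold: float = 0.5,
) -> bool:
    """Flag text only when a keyword hit is confirmed by the semantic score."""
    if not keyword_hits(text):
        return False
    return scorer(text) >= threshold


samples = [
    "Cheap drugs here, gamble online and win big",
    "The pharmacy explained how prescription drugs should be stored",
]
for sample in samples:
    # Columns: keyword-only verdict, fused verdict, text.
    print(bool(keyword_hits(sample)), fused_decision(sample), sample)
```

Keyword matching alone flags both samples because each contains a listed word. The fused rule keeps the first and drops the second, which is the kind of context-dependent false positive the abstract attributes to pure sensitive-word matching.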

CLC number: TP393

Cite this article: Shaomin WANG, Zheng WANG, Hua REN. Research on fusion model based on deep learning for text content security enhancement[J]. Telecommunications Science, 2020, 36(5): 25-30.

