Chinese Journal of Network and Information Security ›› 2022, Vol. 8 ›› Issue (3): 29-40. doi: 10.11959/j.issn.2096-109x.2022035

• Column: Multimedia Content Security

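Tampered text image detection based on spatial and frequency domain relationship modeling

Yuxin WANG (王裕鑫), Boqiang ZHANG (张博强), Hongtao XIE (谢洪涛), Yongdong ZHANG (张勇东)

  1. University of Science and Technology of China, Hefei 230026, Anhui, China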
  • Revised: 2022-04-17 Online: 2022-06-15 Published: 2022-06-01
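  • About the authors: Yuxin WANG (born 1996), male, from Chengdu, Sichuan; Ph.D. candidate at the University of Science and Technology of China; main research interests: scene text detection and recognition, artificial intelligence
    Boqiang ZHANG (born 2000), male, from Xinzhou, Shanxi; master's student at the University of Science and Technology of China; main research interests: scene text detection and recognition, artificial intelligence
    Hongtao XIE (born 1983), male, from Xinyang, Henan; professor and doctoral supervisor at the University of Science and Technology of China; main research interests: multimedia content security, intelligent medical image analysis
    Yongdong ZHANG (born 1973), male, from Yuncheng, Shanxi; professor and doctoral supervisor at the University of Science and Technology of China; main research interests: multimedia content analysis, cyberspace security
  • Supported by:
    National Natural Science Foundation of China (62121002); National Natural Science Foundation of China (62022076); National Natural Science Foundation of China (U1936210)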

Tampered text detection via RGB and frequency relationship modeling

Yuxin WANG, Boqiang ZHANG, Hongtao XIE, Yongdong ZHANG   

  1. University of Science and Technology of China, Hefei 230026, China
  • Revised: 2022-04-17 Online: 2022-06-15 Published: 2022-06-01
  • Supported by:
    The National Natural Science Foundation of China (62121002); The National Natural Science Foundation of China (62022076); The National Natural Science Foundation of China (U1936210)

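Abstract (translated from the Chinese):

In recent years, the wide circulation of tampered text images on the Internet has posed a serious threat to text image security. However, the corresponding tampered text detection (TTD) methods have not been fully explored. The TTD task aims to locate all text regions in an image and to judge, from the authenticity of their texture, whether each region has been tampered with. Unlike general text detection, TTD must additionally perceive the fine-grained information needed to classify real and tampered text. The task has two main challenges. On the one hand, because the textures of real and tampered text are highly similar, TTD methods that learn texture features only in the spatial (RGB) domain cannot distinguish the two classes well. On the other hand, because real and tampered text differ in detection difficulty, a detection model cannot balance the learning of the two classes, which leads to imbalanced detection accuracy between them. Compared with spatial-domain features, the discontinuity of text texture in the frequency domain helps the network judge the authenticity of text instances. On this basis, a TTD method based on spatial (RGB) and frequency domain relationship modeling is proposed. Separate spatial-domain and frequency-domain feature extractors extract the respective features, and the introduced frequency information strengthens the network's ability to discriminate tampered texture. A global spatial-frequency relationship module models the texture-authenticity relationship among different text instances: it refers to the spatial-frequency features of the other text instances in the same image to help judge whether the current instance is genuine, which balances the detection difficulty of real and tampered text and addresses the imbalance in detection accuracy. A tampered text image dataset of receipts (Tampered-SROIE) is proposed to verify the effectiveness of the method; it contains 986 images (626 training images and 360 test images). On Tampered-SROIE, the method reaches F-measures of 95.97% and 96.80% for real and tampered text detection respectively, and reduces the imbalance in detection accuracy by 1.13%. The method offers a new solution for the TTD task from the perspective of network structure and detection strategy, and Tampered-SROIE provides an evaluation benchmark for future TTD methods.

Key words: tampered text detection, spatial-frequency relationship modeling, tampered text detection dataset, evaluation benchmark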

Abstract:

In recent years, the widespread dissemination of tampered text images on the Internet has posed a serious threat to the security of text images. However, the corresponding tampered text detection (TTD) methods have not been sufficiently explored. The TTD task aims to locate all text regions in an image while judging, according to the authenticity of their texture, whether the text regions have been tampered with. Thus, unlike the general text detection task, the TTD task additionally needs to perceive fine-grained information for classifying real-world and tampered texts. The TTD task has two main challenges. On the one hand, due to the high similarity in texture between real-world and tampered texts, TTD methods that learn only from RGB domain features have limited capability to distinguish the two categories of text. On the other hand, because real-world and tampered texts differ in detection difficulty, the network cannot balance the learning process of the two categories well, resulting in imbalanced detection performance between real-world and tampered texts. Compared with RGB domain features, the discontinuity of text texture in the frequency domain can help the network identify the authenticity of text instances. Accordingly, a new TTD method based on RGB and frequency information relationship modeling was proposed. Features in the RGB and frequency domains were extracted by independent feature extractors, so that introducing frequency information during texture perception enhanced the ability to identify tampered texture. Then, a global RGB-frequency relationship module (GRM) was introduced to model the texture authenticity relationship between different text instances. GRM referred to the RGB-frequency features of other text instances in the same image to assist in judging the authenticity of the current text instance, which addressed the problem of imbalanced detection performance. Furthermore, a new TTD dataset (Tampered-SROIE), which contains 986 images (626 training images and 360 test images), was proposed to evaluate the effectiveness of the proposed method. Evaluated on Tampered-SROIE, the proposed method obtains F-measures of 95.97% and 96.80% for real-world and tampered texts respectively, and reduces the imbalance in detection accuracy by 1.13%. The proposed method offers new insights to the TTD community from the perspective of network structure and detection strategy, and Tampered-SROIE provides an evaluation benchmark for future TTD methods.

Key words: tampered text detection, RGB-frequency relationship modeling, tampered text detection dataset, evaluation benchmark
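
The reported figures can be related with a small worked example. The F-measure is the harmonic mean of precision and recall. The abstract does not define how detection imbalance is measured, so the sketch below assumes it is the absolute difference between the two per-class F-measures; `imbalance_gap` is a hypothetical helper name, and the definition is an assumption, not the paper's stated metric.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def imbalance_gap(f_real, f_tampered):
    """Absolute per-class F-measure gap (assumed definition of imbalance)."""
    return abs(f_real - f_tampered)

# Per-class F-measures reported in the abstract, in percent.
gap = imbalance_gap(95.97, 96.80)
print(f"{gap:.2f}")  # prints 0.83
```

Under this assumed definition, the proposed method leaves a gap of 0.83 percentage points between the two classes. The 1.13% reduction would then be measured against a baseline whose per-class scores are not given in the abstract.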

