Tampered text detection via RGB and frequency relationship modeling

doi:10.11959/j.issn.2096-109x.2022035

Abstract

Chinese version:

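In recent years, the widespread dissemination of tampered text images on the Internet has posed a serious threat to text image security. However, corresponding tampered text detection (TTD) methods have not been sufficiently explored. The TTD task aims to locate all text regions in an image and, at the same time, to judge from the authenticity of their texture whether each text region has been tampered with. Unlike general text detection, TTD additionally requires perceiving the fine-grained information that separates real text from tampered text. TTD has two main challenges. On the one hand, because real and tampered text have highly similar textures, TTD methods that learn texture features only in the spatial (RGB) domain cannot distinguish the two classes well. On the other hand, because real and tampered text differ in detection difficulty, the detector cannot balance the learning of the two classes, which causes imbalanced detection accuracy between them. Compared with spatial-domain features, the discontinuity of text texture in the frequency domain can help the network judge whether a text instance is genuine. On this basis, a tampered text detection method based on RGB and frequency relationship modeling is proposed. Separate spatial-domain and frequency-domain feature extractors extract RGB and frequency features, and the added frequency information strengthens the network's ability to discriminate tampered texture. A global RGB-frequency relationship module models the texture-authenticity relationships among different text instances: it consults the RGB-frequency features of the other text instances in the same image to help judge the authenticity of the current instance, thereby balancing the detection difficulty of real and tampered text and addressing the accuracy imbalance. A receipt tampered text image dataset (Tampered-SROIE) is proposed to verify the effectiveness of the method; it contains 986 images (626 training images and 360 test images). On Tampered-SROIE, the method achieves F-measures of 95.97% for real text and 96.80% for tampered text, and it reduces the absolute F-measure gap between the two classes by 1.13 percentage points (from 1.96% to 0.83%). The method offers a new solution for TTD from the perspectives of network structure and detection strategy, and Tampered-SROIE provides an evaluation benchmark for future TTD methods.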

Key words: tampered text detection, RGB-frequency relationship modeling, tampered text detection dataset, evaluation benchmark
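The abstract rests on the premise that tampering leaves texture discontinuities that show up more clearly in the frequency domain than in RGB. It does not say which transform the frequency-domain extractor uses, so the sketch below assumes an orthonormal 8×8 block DCT-II (the transform JPEG uses) and a hypothetical `high_freq_ratio` score. It only illustrates how an abrupt edge pushes energy into high-frequency coefficients; it is not the authors' extractor.

```python
import math
from typing import List

Block = List[List[float]]


def dct2(block: Block) -> Block:
    """Orthonormal 2D DCT-II of an N x N block (direct O(N^4) sum, for illustration)."""
    n = len(block)

    def alpha(k: int) -> float:
        # Normalization factors that make the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def high_freq_ratio(block: Block, cutoff: int = 4) -> float:
    """Hypothetical score: share of DCT energy in coefficients with u + v >= cutoff."""
    coeffs = dct2(block)
    n = len(coeffs)
    total = sum(c * c for row in coeffs for c in row)
    if total == 0.0:
        return 0.0
    high = sum(coeffs[u][v] ** 2
               for u in range(n) for v in range(n) if u + v >= cutoff)
    return high / total


if __name__ == "__main__":
    flat = [[0.5] * 8 for _ in range(8)]              # smooth background patch
    step = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]  # abrupt vertical edge
    print(f"flat: {high_freq_ratio(flat):.4f}")
    print(f"step: {high_freq_ratio(step):.4f}")
```

A smooth patch keeps essentially all of its energy in the DC coefficient, while the step edge leaks a few percent of its energy into coefficients with u + v >= 4. A learned frequency branch would consume such coefficients as input features rather than a single hand-made ratio.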

English version:

In recent years, the widespread dissemination of tampered text images on the Internet has become a serious threat to the security of text images. However, corresponding tampered text detection (TTD) methods have not been sufficiently explored. The TTD task aims to locate all text regions in an image while judging, from the authenticity of their texture, whether each text region has been tampered with. Thus, unlike general text detection, TTD must additionally perceive fine-grained information to classify text as real-world or tampered. TTD faces two main challenges. On the one hand, because real-world and tampered texts have highly similar textures, TTD methods that learn only from RGB-domain features have limited ability to distinguish the two categories. On the other hand, because real-world and tampered texts differ in detection difficulty, the network cannot balance the learning of the two categories, resulting in imbalanced detection performance between them. Compared with RGB-domain features, the discontinuity of text texture in the frequency domain can help the network identify the authenticity of text instances. Accordingly, a new TTD method based on RGB and frequency relationship modeling is proposed. Independent feature extractors extract features in the RGB and frequency domains, so that introducing frequency information during texture perception enhances the network's ability to identify tampered texture. A global RGB-frequency relationship module (GRM) then models the texture-authenticity relationship between different text instances: GRM refers to the RGB-frequency features of the other text instances in the same image to help judge the authenticity of the current instance, which addresses the imbalanced detection performance. Furthermore, a new TTD dataset (Tampered-SROIE), containing 986 images (626 training images and 360 test images), is proposed to evaluate the effectiveness of the method. On Tampered-SROIE, the proposed method achieves F-measures of 95.97% and 96.80% for real-world and tampered texts, respectively, and reduces the F-measure gap between the two categories by 1.13 percentage points (from 1.96% to 0.83%). The method offers new insights to the TTD community from the perspectives of network structure and detection strategy, and Tampered-SROIE provides an evaluation benchmark for future TTD methods.

Key words: tampered text detection, RGB-frequency relationship modeling, tampered text detection dataset, evaluation benchmark
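The abstract describes GRM only as referring to the RGB-frequency features of the other text instances in the same image. One common way to let every instance attend to every other instance is scaled dot-product self-attention, so the sketch below assumes that choice: it concatenates each instance's RGB and frequency vectors, lets each instance aggregate all instances weighted by similarity, and adds the result back through a residual connection. The learned query/key/value projections a real module would have are omitted, the function names are hypothetical, and this is not the published GRM.

```python
import math
from typing import List

Vector = List[float]


def fuse(rgb: List[Vector], freq: List[Vector]) -> List[Vector]:
    """Concatenate each text instance's RGB and frequency feature vectors."""
    return [r + f for r, f in zip(rgb, freq)]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_relation(feats: List[Vector]) -> List[Vector]:
    """Each instance aggregates all instances in the image, weighted by similarity."""
    d = len(feats[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in feats:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in feats])
        context = [sum(w * f[j] for w, f in zip(weights, feats)) for j in range(d)]
        out.append([qj + cj for qj, cj in zip(q, context)])  # residual connection
    return out


if __name__ == "__main__":
    rgb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # three text instances in one image
    freq = [[0.2], [0.3], [0.9]]
    for row in global_relation(fuse(rgb, freq)):
        print([round(x, 3) for x in row])
```

Because every instance's output mixes in the features of the others, an instance whose texture is ambiguous on its own can borrow evidence from similar instances in the same receipt, which is the intuition the abstract gives for balancing the two classes.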

CLC number: TP393

Yuxin WANG, Boqiang ZHANG, Hongtao XIE, Yongdong ZHANG. Tampered text detection via RGB and frequency relationship modeling[J]. Chinese Journal of Network and Information Security, 2022, 8(3): 29-40.

Figures and tables: 9

Figures 1 to 6

Table 1  Ablation study results (GRM: global RGB-frequency relationship module; |Gap-F|: absolute difference between the real-text and tampered-text F-measures)

GRM	Frequency information	Real text recall	Real text precision	Real text F	Tampered text recall	Tampered text precision	Tampered text F	|Gap-F|
×	×	92.00%	96.96%	94.44%	95.51%	97.32%	96.40%	1.96%
√	×	94.58%	97.04%	95.79%	95.59%	97.44%	96.51%	0.72%
√	√	94.71%	97.26%	95.97%	96.07%	97.55%	96.80%	0.83%
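Both tables report the per-class F-measure (the harmonic mean of precision and recall) and |Gap-F|, which, judging from every row, is the absolute difference between the real-text and tampered-text F-measures. The hypothetical helpers below recompute these quantities from the Table 1 entries, given in percent.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def gap_f(f_real: float, f_tampered: float) -> float:
    """Absolute F-measure gap between real and tampered text."""
    return abs(f_real - f_tampered)


if __name__ == "__main__":
    # Full model (GRM + frequency information), values in percent from Table 1.
    f_real = f_measure(97.26, 94.71)
    f_tamp = f_measure(97.55, 96.07)
    print(round(f_real, 2), round(f_tamp, 2))  # 95.97 96.8
    print(round(gap_f(95.97, 96.80), 2))       # 0.83
    # Imbalance reduction relative to the baseline row (|Gap-F| = 1.96%).
    print(round(1.96 - 0.83, 2))               # 1.13
```

Applied to the baseline row, `f_measure(96.96, 92.00)` gives about 94.41% for real text rather than the reported 94.44%, so the published F values may have been computed from raw match counts rather than from the rounded precision and recall shown in the table.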


Table 2  Results on Tampered-SROIE

Method	Real text recall	Real text precision	Real text F	Tampered text recall	Tampered text precision	Tampered text F	|Gap-F|
EAST[32]	91.14%	84.62%	87.76%	91.91%	89.61%	90.75%	2.99%
ATRR[33]	96.70%	95.24%	95.96%	94.71%	92.49%	93.59%	2.37%
Proposed method	94.71%	97.26%	95.97%	96.07%	97.55%	96.80%	0.83%
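The tables give recall and precision separately for real and tampered text, but the section does not state how detections are matched to ground truth. A common text-detection protocol is one-to-one matching at an IoU threshold of 0.5, so the sketch below assumes that protocol, axis-aligned boxes, and a rule that a detection only counts when its predicted class ("real" or "tampered") also matches. The box format, threshold, and function names are assumptions, not the paper's evaluation code.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed axis-aligned
Item = Tuple[Box, str]                   # (box, label), label "real" or "tampered"


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def per_class_pr(preds: List[Item], gts: List[Item], label: str,
                 thr: float = 0.5) -> Tuple[float, float]:
    """Precision and recall for one class, using greedy one-to-one IoU matching."""
    p = [box for box, lab in preds if lab == label]
    g = [box for box, lab in gts if lab == label]
    matched = [False] * len(g)
    tp = 0
    for pb in p:
        best_iou, best_j = thr, -1
        for j, gb in enumerate(g):
            if not matched[j]:
                v = iou(pb, gb)
                if v >= best_iou:
                    best_iou, best_j = v, j
        if best_j >= 0:
            matched[best_j] = True
            tp += 1
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    return precision, recall


if __name__ == "__main__":
    gts = [((0.0, 0.0, 10.0, 10.0), "real"), ((20.0, 0.0, 30.0, 10.0), "tampered")]
    # The second detection is localized correctly but classified as "real".
    preds = [((0.0, 0.0, 10.0, 10.0), "real"), ((20.0, 0.0, 30.0, 10.0), "real")]
    print(per_class_pr(preds, gts, "real"))      # (0.5, 1.0)
    print(per_class_pr(preds, gts, "tampered"))  # (0.0, 0.0)
```

Under this rule a misclassified tampered region costs both classes at once: it is a false positive for real text and a missed detection for tampered text. A full evaluation would also process detections in descending confidence order before matching.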


Figure 7

References (33)

[1]	WANG Y, XIE H, ZHA Z J, et al. ContourNet: taking a further step toward accurate arbitrary-shaped scene text detection[C]// CVPR. 2020: 11753-11762.
[2]	WANG Y, XIE H, FANG S, et al. From two to one: a new scene text recognizer with visual language modeling network[C]// ICCV. 2021: 14194-14203.
[3]	FANG S, XIE H, WANG Y, et al. Read like humans: autonomous, bidirectional and iterative language modeling for scene text recognition[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 7098-7107.
[4]	WU L, ZHANG C, LIU J, et al. Editing text in the wild[C]// ACM MM. 2019: 1500-1508.
[5]	YANG Q, HUANG J, LIN W. SwapText: image based texts transfer in scenes[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 14700-14709.
[6]	ROY P, BHATTACHARYA S, GHOSH S, et al. STEFANN: scene text editor using font adaptive neural network[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 13228-13237.
[7]	WANG Z H, HUANG H F, CHANG Y H. Semi-fragile watermarking technology based on SVD decomposition and Hadamard transform[J]. Chinese Journal of Network and Information Security, 2017, 3(5): 26-31.
[8]	CHANG Y H, HUANG H F, WANG Z H. Fragile watermarking technique for detecting image[J]. Chinese Journal of Network and Information Security, 2017, 3(7): 47-52.
[9]	FRIDRICH J, SOUKAL D, LUKAS J. Detection of copy-move forgery in digital images[C]// Proceedings of Digital Forensic Research Workshop. 2003: 55-61.
[10]	RYU S J, KIRCHNER M, et al. Rotation invariant localization of duplicated image regions based on Zernike moments[J]. IEEE Transactions on Information Forensics and Security, 2013, 8(8): 1355-1370.
[11]	HUANG H, GUO W, ZHANG Y. Detection of copy-move forgery in digital images using SIFT algorithm[C]// IEEE Pacific-Asia Workshop on Computational Intelligence and Industrial Application. 2009: 272-276.
[12]	FARID H. Exposing digital forgeries from JPEG ghosts[J]. IEEE Transactions on Information Forensics and Security, 2009, 4(1): 154-160.
[13]	MIKKILINENI A K, CHIANG P J, ALI G N, et al. Printer identification based on texture features[C]// NIP & Digital Fabrication Conference. Society for Imaging Science and Technology. 2004: 306-311.
[14]	LAMPERT C H, MEI L, BREUEL T M. Printing technique classification for document counterfeit detection[C]// 2006 International Conference on Computational Intelligence and Security. 2006: 639-644.
[15]	SCHULZE C, SCHREYER M, STAHL A, et al. Using DCT features for printing technique and copy detection[C]// IFIP International Conference on Digital Forensics. 2009: 95-106.
[16]	AHMED A G H, SHAFAIT F. Forgery detection based on intrinsic document contents[C]// 2014 11th IAPR International Workshop on Document Analysis Systems. 2014: 252-256.
[17]	BATAINEH B, ABDULLAH S N H S, OMAR K. A statistical global feature extraction method for optical font recognition[C]// Asian Conference on Intelligent Information and Database Systems. 2011: 257-267.
[18]	ZRAMDINI A, INGOLD R. Optical font recognition using typographical features[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20(8): 877-882.
[19]	BERTRAND R, TERRADES O R, GOMEZ-KRÄMER P, et al. A conditional random field model for font forgery detection[C]// 2015 13th International Conference on Document Analysis and Recognition (ICDAR). 2015: 576-580.
[20]	VAN BEUSEKOM J, SHAFAIT F, BREUEL T M. Text-line examination for document forgery detection[J]. International Journal on Document Analysis and Recognition (IJDAR), 2013, 16(2): 189-207.
[21]	CRUZ F, SIDERE N, COUSTATY M, et al. Local binary patterns for document forgery detection[C]// 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). 2017: 1223-1228.
[22]	ABRAMOVA S. Detecting copy-move forgeries in scanned text documents[J]. Electronic Imaging, 2016, 2016(8): 1-9.
[23]	JAMES H, GUPTA O, RAVIV D. Learning document graphs with attention for image manipulation detection[R].
[24]	DANG H, LIU F, STEHOUWER J, et al. On the detection of digital face manipulation[C]// CVPR. 2020: 5780-5789.
[25]	QI H, GUO Q, XU J F, et al. DeepRhythm: exposing deepfakes with attentional visual heartbeat rhythms[C]// ACM Multimedia. 2020: 4318-4327.
[26]	MASI I, KILLEKAR A, MASCARENHAS R M, et al. Two-branch recurrent network for isolating deepfakes in videos[C]// ECCV. 2020: 667-684.
[27]	QIAN Y Y, YIN G J, SHENG L, et al. Thinking in frequency: face forgery detection by mining frequency-aware clues[C]// ECCV. 2020: 86-103.
[28]	LI J M, XIE H T, LI J H, et al. Frequency-aware discriminative feature learning supervised by single-center loss for face forgery detection[C]// CVPR. 2021: 6458-6467.
[29]	HE K, GKIOXARI G, DOLLÁR P. Mask R-CNN[C]// Proceedings of the IEEE International Conference on Computer Vision. 2017: 2961-2969.
[30]	HUANG Z, CHEN K, HE J, et al. ICDAR2019 competition on scanned receipt OCR and information extraction[C]// 2019 International Conference on Document Analysis and Recognition (ICDAR). 2019: 1516-1520.
[31]	WU L, ZHANG C, LIU J, et al. Editing text in the wild[C]// ACM MM. 2019: 1500-1508.
[32]	ZHOU X, YAO C, WEN H, et al. EAST: an efficient and accurate scene text detector[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 5551-5560.
[33]	WANG X, JIANG Y, LUO Z, et al. Arbitrary shape scene text detection with adaptive text region representation[C]// CVPR. 2019: 6449-6458.