Journal on Communications ›› 2024, Vol. 45 ›› Issue (4): 146-159. doi: 10.11959/j.issn.1000-436x.2024079

• Academic Paper •

Image tampering localization model for intensive post-processing scenarios

Shunquan TAN1,2,3, Guiying LIAO1,2,3, Rongxuan PENG2,3,4, Jiwu HUANG5

  1. College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
    2. Shenzhen Key Laboratory of Media Security, Shenzhen 518060, China
    3. Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen 518060, China
    4. College of Electronics and Information Engineering, Shenzhen University, Shenzhen 518060, China
    5. Guangdong Laboratory of Machine Perception and Intelligent Computing, Faculty of Engineering, Shenzhen MSU-BIT University, Shenzhen 518116, China
  • Received: 2023-11-13  Revised: 2024-03-11  Online: 2024-04-30  Published: 2024-05-27
  • Contact: Jiwu HUANG  E-mail: jwhuang@szu.edu.cn
  • About the authors: Shunquan TAN (1980- ) was born in Zhanjiang, Guangdong, China. He holds a Ph.D. and is a professor at Shenzhen University. His research interests include multimedia forensics, steganalysis, and deep learning.
    Guiying LIAO (1999- ) was born in Qinzhou, Guangxi, China. She is a master's student at Shenzhen University. Her research interests include multimedia forensics and deep learning.
    Rongxuan PENG (1998- ) was born in Jieyang, Guangdong, China. He is a Ph.D. candidate at Shenzhen University. His research interests include multimedia forensics, reinforcement learning, and deep learning.
    Jiwu HUANG (1962- ) was born in Jieyang, Guangdong, China. He holds a Ph.D. and is a professor at Shenzhen MSU-BIT University. His research interests include multimedia forensics and security, multimedia signal processing, and information hiding.
  • Supported by:
    The National Natural Science Foundation of China (62272314); The Guangdong Provincial Key Laboratory Project (2023B1212060076)

Abstract:

To address the challenge that lossy operations such as compression and scaling applied by social platforms like WeChat and Weibo blur or destroy tampering traces, an image tampering localization model robust to intensive post-processing was proposed. The pyramid vision transformer, built upon the Transformer architecture, was adopted as the encoder to extract tampering features from images, and an end-to-end UNet-like encoder-decoder architecture was designed. The pyramid structure and attention mechanism inherent in the pyramid vision transformer allow the model to flexibly attend to different regions of an image and, combined with the UNet-like structure, to extract contextual information at multiple scales, yielding strong robustness to heavily post-processed images. Experimental results show that the proposed model significantly outperforms current mainstream tampering localization models under common post-processing operations such as JPEG compression and Gaussian blur, as well as on datasets covering various social media dissemination scenarios, demonstrating excellent robustness.
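As an illustration of the architecture described above, the following is a minimal, hypothetical PyTorch sketch of the general idea: a pyramid encoder producing multi-scale features that a UNet-like decoder fuses into a per-pixel tampering mask. It is not the authors' implementation; the pyramid vision transformer encoder is replaced here by a placeholder stack of strided convolutional stages, and the class and parameter names (TamperLocalizer, PyramidStage, widths) are illustrative assumptions only.

# Hypothetical sketch of a pyramid encoder + UNet-like decoder for tampering localization.
# The real model uses a Pyramid Vision Transformer encoder; strided conv stages stand in here.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PyramidStage(nn.Module):
    """One downsampling stage; a placeholder for a PVT attention stage."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(c_out),
            nn.GELU(),
        )

    def forward(self, x):
        return self.block(x)


class TamperLocalizer(nn.Module):
    """UNet-like encoder-decoder that outputs a 1-channel tampering probability map."""
    def __init__(self, widths=(32, 64, 128, 256)):
        super().__init__()
        chans = [3] + list(widths)
        self.stages = nn.ModuleList(
            PyramidStage(chans[i], chans[i + 1]) for i in range(len(widths))
        )
        # Decoder: upsample, concatenate the skip feature of matching resolution, refine.
        self.fuse = nn.ModuleList(
            nn.Conv2d(widths[i] + widths[i + 1], widths[i], kernel_size=3, padding=1)
            for i in range(len(widths) - 1)
        )
        self.head = nn.Conv2d(widths[0], 1, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)  # multi-scale pyramid features used as skip connections
        y = feats[-1]
        for i in range(len(self.fuse) - 1, -1, -1):
            y = F.interpolate(y, size=feats[i].shape[-2:], mode="bilinear", align_corners=False)
            y = F.gelu(self.fuse[i](torch.cat([feats[i], y], dim=1)))
        y = F.interpolate(self.head(y), size=(h, w), mode="bilinear", align_corners=False)
        return torch.sigmoid(y)  # per-pixel probability of being tampered


if __name__ == "__main__":
    model = TamperLocalizer()
    mask = model(torch.randn(1, 3, 256, 256))
    print(mask.shape)  # torch.Size([1, 1, 256, 256])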

Key words: intensive post-processing scenario, image tampering localization, robustness, pyramid vision transformer

CLC number: 
