Chinese Journal of Network and Information Security ›› 2022, Vol. 8 ›› Issue (6): 146-155. doi: 10.11959/j.issn.2096-109x.2022075

• Papers and Reports •

Lip forgery detection via spatial-frequency domain combination

Jiaying LIN1,2, Wenbo ZHOU1,2, Weiming ZHANG1,2, Nenghai YU1,2   

  1. Key Laboratory of Electromagnetic Space Information, Chinese Academy of Sciences, Hefei 230027, China
  2. School of Cyber Science, University of Science and Technology of China, Hefei 230027, China
  • Revised: 2022-07-09 Online: 2022-12-15 Published: 2023-01-16
  • Supported by:
    The National Natural Science Foundation of China (U20B2047); The National Natural Science Foundation of China (62072421); The National Natural Science Foundation of China (62002334); The National Natural Science Foundation of China (62102386); The National Natural Science Foundation of China (62121002); Exploration Fund Project of University of Science and Technology of China (YD3480002001); Fundamental Research Funds for the Central Universities (WK2100000011)

Abstract:

In recent years, numerous “face-swapping” videos have emerged on social networks, a representative example being lip forgery of speakers. While such videos make life more entertaining for the public, they pose a significant threat to personal privacy and property security in cyberspace. Currently, most lip forgery detection methods perform well under lossless conditions. However, compression is widely used in practice, especially on social media platforms, in face recognition, and in other scenarios. While removing spatial and temporal redundancy, compression degrades video quality and destroys pixel-to-pixel and frame-to-frame coherence in the spatial domain, which degrades detection performance and can even cause real videos to be misjudged. When the spatial domain cannot provide sufficiently effective features, the frequency domain naturally becomes a priority research object because it can resist compression interference. To address this problem, the advantages of frequency information in image structure and gradient feedback were analyzed. A lip forgery detection method based on spatial-frequency domain combination was then proposed, which effectively exploits the respective characteristics of information in the spatial and frequency domains. For lip features in the spatial domain, an adaptive extraction network and a lightweight attention module were designed. For features in the frequency domain, separate extraction and fusion modules for different components were designed. Subsequently, a weighted fusion of the spatial-domain lip features and the frequency-domain features preserved more texture information. In addition, fine-grained constraints were designed for training to enlarge the inter-class distance between real and fake lip features while reducing the intra-class distance. Experimental results show that, benefiting from the frequency information, the proposed method improves detection accuracy under compression and exhibits a certain degree of transferability. Furthermore, the ablation study on the core modules verifies the effectiveness of the frequency components for compression resistance and of the dual loss function constraint in training.
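
The idea of splitting an image into frequency components and fusing them with spatial features by weight can be illustrated with a minimal sketch. It is not the paper's network: the abstract does not name the frequency transform, so a 2D DCT is assumed here, and the function names (`dct2`, `band_energies`, `weighted_fusion`), the band cut-offs, and the fusion weight `w` are all hypothetical choices made for illustration. In the proposed method the extraction and fusion modules are learned, whereas this sketch uses fixed, hand-written operations.

```python
import math

def dct2(block):
    """Naive orthonormal 2D DCT-II of a square block (list of lists).
    Assumption: DCT stands in for the unspecified frequency transform."""
    n = len(block)
    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    # Precompute the cosine basis: cos_tab[u][x] = cos(pi * (2x + 1) * u / (2n))
    cos_tab = [[math.cos(math.pi * (2 * x + 1) * u / (2 * n)) for x in range(n)]
               for u in range(n)]
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[x][y] * cos_tab[u][x] * cos_tab[v][y]
                    for x in range(n) for y in range(n))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

def band_energies(coeffs, low_cut, high_cut):
    """Split coefficient energy into low/mid/high bands by index sum u + v.
    The cut-offs are hypothetical, not taken from the paper."""
    bands = {"low": 0.0, "mid": 0.0, "high": 0.0}
    n = len(coeffs)
    for u in range(n):
        for v in range(n):
            k = u + v
            name = "low" if k < low_cut else ("mid" if k < high_cut else "high")
            bands[name] += coeffs[u][v] ** 2
    return bands

def weighted_fusion(spatial, frequency, w):
    """Convex combination of two equal-length feature vectors.
    A fixed scalar w replaces whatever weighting the paper learns."""
    return [w * s + (1.0 - w) * f for s, f in zip(spatial, frequency)]

if __name__ == "__main__":
    # A 4x4 checkerboard patch: energy should sit mostly in higher bands.
    patch = [[float((x + y) % 2) for y in range(4)] for x in range(4)]
    energies = band_energies(dct2(patch), low_cut=2, high_cut=4)
    freq_feat = [energies["low"], energies["mid"], energies["high"]]
    spatial_feat = [0.5, 0.5, 0.5]  # placeholder spatial features
    print(weighted_fusion(spatial_feat, freq_feat, w=0.6))
```

Because the DCT used here is orthonormal, the total band energy equals the sum of squared pixel values, which makes the band split easy to sanity-check on small patches.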

Key words: DeepFake forgery, DeepFake detection and defense, lip forgery detection, anti-compression, deep learning

