Chinese Journal of Network and Information Security ›› 2023, Vol. 9 ›› Issue (3): 150-160. doi: 10.11959/j.issn.2096-109x.2023046

• Academic Paper •

Gender forgery of faces by fusing wavelet shortcut connection generative adversarial network

Wanze CHEN, Liqing HUANG, Jiazhen CHEN, Feng YE, Tianqiang HUANG, Haifeng LUO

  1. College of Computer and Cyber Security, Fujian Normal University, Fuzhou 350117, China
  • Revised: 2023-03-02 Online: 2023-06-25 Published: 2023-06-01
  • About the authors: Wanze CHEN (1995- ), male, born in Lanzhou, Gansu, is a master's student at Fujian Normal University; his main research interests include deep learning, generative adversarial networks, and image generation
    Liqing HUANG (1991- ), female, born in Putian, Fujian, is a lecturer at Fujian Normal University; her main research interests include video/image super-resolution and multimedia content security
    Jiazhen CHEN (1971- ), female, born in Fuzhou, Fujian, is an associate professor at Fujian Normal University; her main research interest is information security
    Feng YE (1978- ), male, born in Fuzhou, Fujian, is a professor at Fujian Normal University; his main research interests include video communication and video/image processing
    Tianqiang HUANG (1971- ), male, born in Xianyou, Fujian, is a professor and doctoral supervisor at Fujian Normal University; his main research interests include artificial intelligence, big data analytics, and the theory and technology of digital media content security
    Haifeng LUO (1990- ), male, born in Longyan, Fujian, is a lecturer at Fujian Normal University; his main research interests include 2D/3D computer vision
  • Supported by:
    The National Natural Science Foundation of China (62072106); the Fujian Province Natural Science Foundation (2020J01168); the Fujian Provincial Department of Education (Rank A) (JAT210053)

Gender forgery of faces by fusing wavelet shortcut connection generative adversarial network

Wanze CHEN, Liqing HUANG, Jiazhen CHEN, Feng YE, Tianqiang HUANG, Haifeng LUO   

  1. College of Computer and Cyber Security, Fujian Normal University, Fuzhou 350117, China
  • Revised: 2023-03-02 Online: 2023-06-25 Published: 2023-06-01
  • Supported by:
The National Natural Science Foundation of China (62072106); the Fujian Province Natural Science Foundation (2020J01168); the Fujian Provincial Department of Education (Rank A) (JAT210053)

Abstract:

Owing to limitations of data and model architecture, mainstream methods for facial attribute editing suffer from the following two problems. First, the bottleneck structure of the autoencoder loses feature information during encoding and decoding, and the continuous stage-by-stage injection of styles into source-domain features during decoding lets target-domain information take up too large a share of the generated image, so the identity information and some details of the source domain are lost. Second, differences in attributes such as the gender, ethnicity, or age of the person in a face image lead to large differences in the frequency-domain composition of different images, and under unsupervised training the current mainstream network frameworks cannot adjust the ratio of source-domain to target-domain information at the style-injection stage, so the generated images still contain artifacts. To address these problems, a facial gender forgery model based on generative adversarial networks and image-to-image translation, the fused wavelet shortcut connection generative adversarial network (WscGAN), was proposed. Shortcut connections were added to the autoencoder structure, the outputs of different encoding stages were decomposed at the feature level by wavelet transform, and a channel attention mechanism was introduced to process the resulting components one by one, dynamically changing the proportion of source-domain feature information at different frequencies during decoding and ultimately forging the gender attribute of facial images. To verify the effectiveness of the proposed model, WscGAN was evaluated on the CelebA-HQ and FFHQ datasets. The experimental results show that WscGAN outperforms existing state-of-the-art models on both CelebA-HQ and FFHQ, improving the Fréchet inception distance (FID) by 5.4% and 19.8% and the learned perceptual image patch similarity (LPIPS) by 1.8% and 4.1%, respectively. In addition, qualitative visual comparisons fully demonstrate that WscGAN effectively improves gender attribute forgery of facial images.
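The frequency-level decomposition performed by the shortcut connections can be illustrated with a single-level 2D Haar transform. The following is a minimal sketch, assuming PyTorch and an orthonormal Haar filter bank; the abstract does not specify which wavelet basis or decomposition depth the model actually uses.

import torch

def haar_dwt(x: torch.Tensor):
    """Single-level 2D Haar transform of a feature map x of shape (B, C, H, W).
    Returns the low-frequency band LL and three high-frequency detail bands,
    each with spatial size (H/2, W/2). H and W are assumed to be even."""
    a = x[:, :, 0::2, 0::2]  # top-left sample of each 2x2 block
    b = x[:, :, 0::2, 1::2]  # top-right
    c = x[:, :, 1::2, 0::2]  # bottom-left
    d = x[:, :, 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2  # local average: coarse structure
    lh = (a + b - c - d) / 2  # detail sub-band (one orientation)
    hl = (a - b + c - d) / 2  # detail sub-band (other orientation)
    hh = (a - b - c + d) / 2  # diagonal detail sub-band
    return ll, lh, hl, hh

# Example: decompose a hypothetical 64-channel encoder output.
feat = torch.randn(1, 64, 128, 128)
ll, lh, hl, hh = haar_dwt(feat)
print(ll.shape)  # torch.Size([1, 64, 64, 64])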

Key words: image generation, generative adversarial network, image-to-image translation, facial attribute manipulation, wavelet transform

Abstract:

The mainstream methods in the field of facial attribute manipulation have the following two defects due to data and model architecture limitations. First, the bottleneck structure of the autoencoder model results in the loss of feature information, and continuously injecting styles into the source-domain features during the decoding process makes the generated image depend too heavily on the target domain while losing identity information and fine-grained details. Second, differences between images in facial attributes such as gender, ethnicity, or age cause variations in frequency-domain information, and the current unsupervised training methods cannot automatically adjust the proportion of source-domain and target-domain information in the style injection stage, resulting in artifacts in the generated images. A facial gender forgery model based on generative adversarial networks and image-to-image translation, namely the fused wavelet shortcut connection generative adversarial network (WscGAN), was proposed to address these issues. Shortcut connections were added to the autoencoder structure, and the outputs of different encoding stages were decomposed at the feature level by wavelet transform. A channel attention mechanism was then employed to process the resulting sub-bands one by one, dynamically changing the proportion of source-domain features at different frequencies in the decoding process. This model can forge facial images in terms of the gender attribute. To verify its effectiveness, experiments were conducted on the CelebA-HQ and FFHQ datasets. Compared with the existing optimal models, the method improves the FID index by 5.4% and 11.2% and the LPIPS index by 1.8% and 6.7%, respectively. Furthermore, qualitative visual comparisons fully demonstrate the effectiveness of the proposed method in improving the gender attribute conversion of facial images.
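Building on the decomposition sketched above, the following sketch shows one way the channel attention described in the abstract might gate each wavelet sub-band of an encoder feature before it is merged back into the decoding path. The squeeze-and-excitation style gate, the 1×1 fusion convolution, and the additive injection are assumptions made for illustration; the abstract only states that the sub-bands are processed one by one by a channel attention mechanism.

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel gate (an assumed design;
    the abstract only states that channel attention is applied)."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # squeeze: one value per channel
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                   # per-channel weight in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate(x)                             # reweight the channels

class WaveletShortcutFusion(nn.Module):
    """Gate each wavelet sub-band of an encoder feature separately, then
    merge the gated sub-bands into the decoder feature of matching size."""
    def __init__(self, channels: int, num_bands: int = 4):
        super().__init__()
        self.band_gates = nn.ModuleList(
            ChannelAttention(channels) for _ in range(num_bands)
        )
        self.fuse = nn.Conv2d(channels * num_bands, channels, kernel_size=1)

    def forward(self, bands, dec_feat):
        gated = [gate(b) for gate, b in zip(self.band_gates, bands)]
        shortcut = self.fuse(torch.cat(gated, dim=1))       # recombine the gated sub-bands
        return dec_feat + shortcut                          # inject into the decoding path

# Example: four sub-bands (e.g. from the Haar sketch above) at half the encoder
# resolution, fused with a decoder feature of the same spatial size.
bands = [torch.randn(1, 64, 64, 64) for _ in range(4)]
dec_feat = torch.randn(1, 64, 64, 64)
out = WaveletShortcutFusion(64)(bands, dec_feat)
print(out.shape)  # torch.Size([1, 64, 64, 64])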

Key words: image generation, generative adversarial network, image-to-image translation, facial attribute manipulation, wavelet transform

