Data augmentation method based on diffusion model for domain generalization

doi:10.11959/j.issn.2096-6652.202334

Chinese Journal of Intelligent Science and Technology ›› 2023, Vol. 5 ›› Issue (3): 380-388. doi: 10.11959/j.issn.2096-6652.202334

• Special topic: diffusion models and artificial intelligence content generation •

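Yujun TONG¹, Heqing WANG¹, Yueheng LUO¹, Wenxin NING¹, Mandan GUAN¹, Wenqing YU¹, Keyan HUANG², Jiaxun ZHANG², Zhanyu MA¹

¹ School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
² Beijing Institute of Spacecraft System Engineering, Beijing 100094, China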

Revised: 2023-08-10 Online: 2023-09-01 Published: 2023-09-26
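About the authors:
Yujun TONG (1999- ), male, is a Ph.D. candidate at the School of Artificial Intelligence, Beijing University of Posts and Telecommunications. His main research interests are transfer learning and domain generalization.
Heqing WANG (2000- ), female, is a master's student at the School of Artificial Intelligence, Beijing University of Posts and Telecommunications. Her main research interests are computer vision and machine learning.
Yueheng LUO (2001- ) is a master's student at the Pattern Recognition Laboratory, Beijing University of Posts and Telecommunications, with a main research interest in computer vision.
Wenxin NING (2001- ), female, is a master's student at the School of Artificial Intelligence, Beijing University of Posts and Telecommunications. Her main research interests are artificial intelligence and computer vision.
Mandan GUAN (2000- ), female, is a master's student at the School of Artificial Intelligence, Beijing University of Posts and Telecommunications. Her main research interests are artificial intelligence, computer vision, and few-shot learning.
Wenqing YU (1999- ), female, is a master's student at the School of Artificial Intelligence, Beijing University of Posts and Telecommunications. Her main research interests are computer vision and fine-grained image recognition.
Keyan HUANG (1977- ), male, works at the Beijing Institute of Spacecraft System Engineering.
Zhanyu MA (1982- ), male, Ph.D., is a professor and doctoral supervisor at the School of Artificial Intelligence, Beijing University of Posts and Telecommunications. His main research interests are pattern recognition, machine learning, computer vision, non-Gaussian probabilistic models, and Bayesian networks.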
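Supported by:
Beijing Natural Science Foundation Project (Z200002); National Natural Science Foundation of China (U19B2036); National Natural Science Foundation of China (62225601); Youth Innovative Research Team of BUPT (2023QNTD02); BUPT Excellent Ph.D. Students Foundation (CX2023112)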

Abstract

Domain generalization is an important and challenging problem in computer vision that arises from distribution shift in real-world data. In practical applications, training and test data often come from different domains, and this difference in data distribution degrades accuracy at test time. This paper proposes a domain generalization method based on latent-space data augmentation. Unlike traditional image-level data augmentation, the method introduces a diffusion model in the latent space to achieve fine-grained control over features and diverse feature generation, thereby improving the model's generalization ability on the target domain. Specifically, a classifier-based implicit diffusion model, once trained in the latent space, can conditionally generate accurate and rich source-domain features, and an efficient sampling method accelerates the generation of augmented features. Experimental results show that the proposed method achieves significant performance improvements on various domain generalization tasks and is effective and robust in real-world scenarios. The key innovation is shifting data augmentation to the latent-space level and introducing a diffusion model for augmentation, which offers a new approach to the domain generalization problem.

Key words: domain generalization, diffusion model, data augmentation
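
The abstract describes conditional generation of source-domain features with a diffusion model in the latent space, followed by fast sampling. The following is a minimal sketch of those ingredients on a toy feature vector, not the authors' implementation. It assumes a linear noise schedule, the standard classifier-guidance shift of the predicted noise, and a deterministic DDIM update as a stand-in for the efficient sampler; the function names (`linear_alpha_bars`, `add_noise`, `guided_eps`, `ddim_step`) and all numeric values are hypothetical.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Return cumulative products alpha_bar_t for an assumed linear beta schedule."""
    alpha_bars, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(z0, alpha_bar, rng):
    """Forward process: z_t = sqrt(alpha_bar) * z0 + sqrt(1 - alpha_bar) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(alpha_bar) * z + math.sqrt(1.0 - alpha_bar) * e
          for z, e in zip(z0, eps)]
    return zt, eps


def guided_eps(eps_pred, grad_log_prob, alpha_bar, scale=1.0):
    """Classifier guidance: shift predicted noise along grad log p(y | z_t)."""
    shift = scale * math.sqrt(1.0 - alpha_bar)
    return [e - shift * g for e, g in zip(eps_pred, grad_log_prob)]


def ddim_step(zt, eps_hat, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0) from step t to an earlier step."""
    z0_pred = [(z - math.sqrt(1.0 - alpha_bar_t) * e) / math.sqrt(alpha_bar_t)
               for z, e in zip(zt, eps_hat)]
    return [math.sqrt(alpha_bar_prev) * z0 + math.sqrt(1.0 - alpha_bar_prev) * e
            for z0, e in zip(z0_pred, eps_hat)]


# Toy check: with the true noise and guidance switched off, a single step
# back to alpha_bar = 1 recovers the clean feature up to rounding error.
rng = random.Random(0)
feature = [0.5, -1.2, 2.0]  # stand-in for a feature produced by an encoder
alpha_bars = linear_alpha_bars(100)
z_t, eps = add_noise(feature, alpha_bars[-1], rng)
eps_hat = guided_eps(eps, [0.0, 0.0, 0.0], alpha_bars[-1], scale=0.0)
recovered = ddim_step(z_t, eps_hat, alpha_bars[-1], 1.0)
```

In a real pipeline the noise would come from a trained denoising network and the gradient from a classifier evaluated on the noisy feature; both are replaced by fixed values here so the arithmetic can be checked by hand.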

CLC number: TP39

Yujun TONG, Heqing WANG, Yueheng LUO, et al. Data augmentation method based on diffusion model for domain generalization[J]. Chinese Journal of Intelligent Science and Technology, 2023, 5(3): 380-388.

Table 1

Method	A	C	P	S	Average
ERM	78.6	74.4	96.3	72.4	80.4
JiGen	79.4	75.3	96.0	71.6	80.5
SFA-A	81.2	77.8	93.9	73.7	81.7
MMLD	81.3	77.2	96.1	72.3	81.7
DADG	79.9	76.3	94.9	70.5	80.4
BNE	78.8	78.9	94.8	79.7	83.1
Proposed method (50 steps, order 2)	81.9	73.7	96.6	74.8	81.7
Proposed method (100 steps, order 2)	81.8	75.4	96.6	75.1	82.2
Proposed method + MMD	82.5	76.6	96.4	77.4	83.2
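
The last column of the comparison table above appears to be the arithmetic mean of the four per-domain accuracies (A, C, P, S). As a small sanity check under that assumption, the sketch below recomputes the mean for each row from the per-domain values as printed and compares it with the reported figure. The dictionary keys and the helper name `mean_accuracy` are illustrative labels, not names from the paper.

```python
# Per-domain accuracies (%) in the order A, C, P, S, copied from the table.
ACCURACY = {
    "ERM": (78.6, 74.4, 96.3, 72.4),
    "JiGen": (79.4, 75.3, 96.0, 71.6),
    "SFA-A": (81.2, 77.8, 93.9, 73.7),
    "MMLD": (81.3, 77.2, 96.1, 72.3),
    "DADG": (79.9, 76.3, 94.9, 70.5),
    "BNE": (78.8, 78.9, 94.8, 79.7),
    "proposed (50 steps, order 2)": (81.9, 73.7, 96.6, 74.8),
    "proposed (100 steps, order 2)": (81.8, 75.4, 96.6, 75.1),
    "proposed + MMD": (82.5, 76.6, 96.4, 77.4),
}

# Averages as reported in the last column of the same table.
REPORTED_AVERAGE = {
    "ERM": 80.4,
    "JiGen": 80.5,
    "SFA-A": 81.7,
    "MMLD": 81.7,
    "DADG": 80.4,
    "BNE": 83.1,
    "proposed (50 steps, order 2)": 81.7,
    "proposed (100 steps, order 2)": 82.2,
    "proposed + MMD": 83.2,
}


def mean_accuracy(scores):
    """Unweighted mean over the held-out domains."""
    return sum(scores) / len(scores)


# Print recomputed and reported averages side by side for every method.
for name, scores in ACCURACY.items():
    recomputed = mean_accuracy(scores)
    print(f"{name}: recomputed {recomputed:.2f}, reported {REPORTED_AVERAGE[name]}")

# Method with the highest recomputed mean accuracy.
best_method = max(ACCURACY, key=lambda name: mean_accuracy(ACCURACY[name]))
```

Every recomputed mean agrees with the reported one to within 0.1 percentage points, which is consistent with the per-domain values having been rounded before publication.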

Table 2

Method	A	C	P	S	Average
Without classifier guidance	81.0	73.3	96.7	73.3	81.1
Proposed method	81.8	73.7	96.6	74.8	81.7

Table 3

Steps	Order	A	C	P	S	Average
20	2	80.8	73.7	96.7	75.1	81.6
50	2	81.8	73.7	96.6	74.8	81.7
20	1	80.4	73.3	96.5	75.0	81.3
50	1	81.3	73.4	96.4	74.7	81.5
