Dong Miao

CV
h-index: 7
Papers: 4
Citations: 35
Novelty: 53%
AI Score: 39

4 Papers

CV · Nov 20, 2022 · Code
Rethinking the Paradigm of Content Constraints in Unpaired Image-to-Image Translation

Xiuding Cai, Yaoyao Zhu, Dong Miao et al.

In an unpaired setting, image-to-image translation (I2I) tasks lack sufficient content constraints, so GAN-based approaches are usually prone to model collapse. Current solutions fall into two categories: reconstruction-based and Siamese network-based. The former requires that the transformed or transforming image can be perfectly converted back to the original image, which is sometimes too strict and limits generative performance. The latter feeds the original and generated images into a feature extractor and then matches their outputs, which is not efficient enough, and a universal feature extractor is not easily available. In this paper, we propose EnCo, a simple but efficient way to maintain content by constraining the representational similarity in the latent space of patch-level features from the same stage of the Encoder and deCoder of the generator. For the similarity function, we use a simple MSE loss instead of the contrastive loss that is currently widely used in I2I tasks. Benefiting from this design, EnCo training is extremely efficient, while the features from the encoder have a more positive effect on decoding, leading to more satisfying generations. In addition, we rethink the role played by discriminators in sampling patches and propose a discriminative attention-guided (DAG) patch sampling strategy to replace random sampling. DAG is parameter-free and requires only negligible computational overhead, while significantly improving the performance of the model. Extensive experiments on multiple datasets demonstrate the effectiveness and advantages of EnCo, and we achieve multiple state-of-the-art results compared to previous methods. Our code is available at https://github.com/XiudingCai/EnCo-pytorch.
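
The content constraint described in this abstract, an MSE loss between patch-level features taken from the same stage of the encoder and decoder, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `patch_mse`, the list-of-vectors feature layout, and the uniform random sampling are assumptions made for illustration, and the DAG sampling strategy is not reproduced because the abstract does not specify it.

```python
import random


def patch_mse(enc_feats, dec_feats, num_patches, seed=0):
    """MSE between encoder and decoder features at sampled locations.

    enc_feats / dec_feats: one feature vector (list of floats) per spatial
    location, taken from the same generator stage (assumed layout).
    """
    if len(enc_feats) != len(dec_feats):
        raise ValueError("feature maps must cover the same locations")
    k = min(num_patches, len(enc_feats))
    if k < 1:
        raise ValueError("need at least one patch")
    rng = random.Random(seed)
    # Uniform random sampling (assumption); an attention-guided strategy
    # such as DAG would replace this step.
    locations = rng.sample(range(len(enc_feats)), k)
    squared = [
        (e - d) ** 2
        for i in locations
        for e, d in zip(enc_feats[i], dec_feats[i])
    ]
    return sum(squared) / len(squared)


# Hypothetical toy features: three locations, two channels each.
enc = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
dec = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
print(patch_mse(enc, dec, num_patches=2))
```

Unlike a contrastive loss, this term needs no negative samples, which is consistent with the efficiency argument made in the abstract.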

IV · Aug 7, 2023
Energy-Guided Diffusion Model for CBCT-to-CT Synthesis

Linjie Fu, Xia Li, Xiuding Cai et al.

Cone Beam CT (CBCT) plays a crucial role in Adaptive Radiation Therapy (ART) by supporting accurate radiation treatment when organ anatomy changes occur. However, CBCT images suffer from scatter noise and artifacts, which makes it challenging to rely solely on CBCT for precise dose calculation and accurate tissue localization. Therefore, there is a need to improve CBCT image quality and Hounsfield Unit (HU) accuracy while preserving anatomical structures. To enhance the role and application value of CBCT in ART, we propose an energy-guided diffusion model (EGDiff) and conduct experiments on a chest tumor dataset to generate synthetic CT (sCT) from CBCT. The experimental results demonstrate impressive performance, with a mean absolute error of 26.87 ± 6.14 HU, a structural similarity index measurement of 0.850 ± 0.03, a peak signal-to-noise ratio of 19.83 ± 1.39 dB, and a normalized cross-correlation of 0.874 ± 0.04 for the sCT. These results indicate that our method outperforms state-of-the-art unsupervised synthesis methods in accuracy and visual quality, producing superior sCT images.
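
For readers unfamiliar with the metrics reported in this abstract, the sketch below shows common textbook definitions of mean absolute error, peak signal-to-noise ratio, and zero-mean normalized cross-correlation on flattened images. The paper's exact definitions are not given in the abstract, so the `data_range` parameter and the zero-mean form of NCC are assumptions; the structural similarity index is omitted because it needs windowed statistics.

```python
import math


def mae(pred, target):
    """Mean absolute error (in HU if the inputs are HU values)."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def psnr(pred, target, data_range):
    """Peak signal-to-noise ratio in dB; data_range is the assumed
    maximum possible intensity span of the images."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / mse)


def ncc(pred, target):
    """Zero-mean normalized cross-correlation (one common definition).

    Raises ZeroDivisionError if either image is constant.
    """
    mean_p = sum(pred) / len(pred)
    mean_t = sum(target) / len(target)
    num = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    den = math.sqrt(
        sum((p - mean_p) ** 2 for p in pred)
        * sum((t - mean_t) ** 2 for t in target)
    )
    return num / den


# Hypothetical flattened intensity values, for illustration only.
synthetic = [0.0, 10.0, 20.0]
reference = [0.0, 20.0, 40.0]
print(mae(synthetic, reference), ncc(synthetic, reference))
```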

CV · Jul 8, 2024
Invariance Principle Meets Vicinal Risk Minimization

Yaoyao Zhu, Xiuding Cai, Yingkai Wang et al.

Deep learning models excel in computer vision tasks but often fail to generalize to out-of-distribution (OOD) domains. Invariant Risk Minimization (IRM) aims to address OOD generalization by learning domain-invariant features. However, IRM struggles with datasets exhibiting significant diversity shifts. While data augmentation methods like Mixup and Semantic Data Augmentation (SDA) enhance diversity, they risk over-augmentation and label instability. To address these challenges, we propose a domain-shared SDA module, a novel implementation of Vicinal Risk Minimization (VRM) designed to enhance dataset diversity while maintaining label consistency. We further provide a Rademacher complexity analysis, establishing a tighter generalization error bound compared to baseline methods. Extensive evaluations on OOD benchmarks, including PACS, VLCS, OfficeHome, and TerraIncognita, demonstrate consistent performance improvements over state-of-the-art domain generalization methods.
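
As background for the vicinal risk minimization idea mentioned in this abstract, the sketch below implements Mixup, one of the existing augmentations the abstract names. It is not the paper's domain-shared SDA module; the helper names and the Beta parameter `alpha` are illustrative assumptions. Mixup trains on convex combinations of two examples and of their one-hot labels, which is the source of the label instability the abstract refers to.

```python
import random


def mixup(x_a, y_a, x_b, y_b, lam):
    """Convex combination of two inputs and their one-hot labels."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def sample_lambda(alpha, rng):
    """Draw the mixing coefficient from Beta(alpha, alpha)."""
    return rng.betavariate(alpha, alpha)


# Hypothetical two-feature inputs with two-class one-hot labels.
rng = random.Random(0)
lam = sample_lambda(0.2, rng)
x_mix, y_mix = mixup([0.0, 0.0], [1.0, 0.0], [2.0, 4.0], [0.0, 1.0], lam)
print(lam, x_mix, y_mix)
```

The mixed label is soft: its entries sum to one, but it no longer identifies a single class.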

CV · Sep 17, 2025 · Code
Self Identity Mapping

Xiuding Cai, Yaoyao Zhu, Linjie Fu et al.

Regularization is essential in deep learning to enhance generalization and mitigate overfitting. However, conventional techniques often rely on heuristics, making them less reliable or effective across diverse settings. We propose Self Identity Mapping (SIM), a simple yet effective, data-intrinsic regularization framework that leverages an inverse mapping mechanism to enhance representation learning. By reconstructing the input from its transformed output, SIM reduces information loss during forward propagation and facilitates smoother gradient flow. To address computational inefficiencies, we instantiate SIM as ρSIM by incorporating patch-level feature sampling and a projection-based method to reconstruct latent features, effectively lowering complexity. As a model-agnostic, task-agnostic regularizer, SIM can be seamlessly integrated as a plug-and-play module, making it applicable to different network architectures and tasks. We extensively evaluate ρSIM across three tasks: image classification, few-shot prompt learning, and domain generalization. Experimental results show consistent improvements over baseline methods, highlighting ρSIM's ability to enhance representation learning across various tasks. We also demonstrate that ρSIM is orthogonal to existing regularization methods, boosting their effectiveness. Moreover, our results confirm that ρSIM effectively preserves semantic information and enhances performance in dense-to-dense tasks, such as semantic segmentation and image translation, as well as in non-visual domains including audio classification and time series anomaly detection. The code is publicly available at https://github.com/XiudingCai/SIM-pytorch.
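
The regularization idea in this abstract, reconstructing a layer's input from its transformed output and penalizing the reconstruction error, can be sketched as follows. This is a toy illustration under stated assumptions, not the released SIM code: the function names, the `weight` hyperparameter, and the use of plain callables for the layer and its auxiliary inverse map are all hypothetical, and the patch sampling and projection used by ρSIM are omitted.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def identity_mapping_penalty(x, layer, inverse):
    """Error of reconstructing x from layer(x) via an auxiliary inverse map."""
    return mse(inverse(layer(x)), x)


def regularized_loss(task_loss, x, layer, inverse, weight=0.1):
    """Task loss plus a weighted reconstruction penalty (assumed form)."""
    return task_loss + weight * identity_mapping_penalty(x, layer, inverse)


def double(v):
    return [2.0 * a for a in v]


def halve(v):
    return [0.5 * a for a in v]


def relu(v):
    return [max(0.0, a) for a in v]


def identity(v):
    return list(v)


x = [1.0, -2.0]
# An invertible layer loses no information, so the penalty vanishes.
print(identity_mapping_penalty(x, double, halve))
# ReLU discards the negative entry, so the penalty is positive.
print(identity_mapping_penalty(x, relu, identity))
```

In this toy setting the penalty is zero exactly when the input can be recovered from the output, which mirrors the abstract's claim that the regularizer reduces information loss during forward propagation.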