CVDec 11, 2025
Group Diffusion: Enhancing Image Generation by Unlocking Cross-Sample CollaborationSicheng Mo, Thao Nguyen, Richard Zhang et al.
In this work, we explore an untapped signal in diffusion model inference. While all previous methods generate images independently at inference, we instead ask if samples can be generated collaboratively. We propose Group Diffusion, unlocking the attention mechanism to be shared across images, rather than limited to just the patches within an image. This enables images to be jointly denoised at inference time, learning both intra and inter-image correspondence. We observe a clear scaling effect - larger group sizes yield stronger cross-sample attention and better generation quality. Furthermore, we introduce a qualitative measure to capture this behavior and show that its strength closely correlates with FID. Built on standard diffusion transformers, our GroupDiff achieves up to 32.2% FID improvement on ImageNet-256x256. Our work reveals cross-sample inference as an effective, previously unexplored mechanism for generative modeling.
CVApr 29, 2025
X-Fusion: Introducing New Modality to Frozen Large Language ModelsSicheng Mo, Thao Nguyen, Xun Huang et al.
We propose X-Fusion, a framework that extends pretrained Large Language Models (LLMs) for multimodal tasks while preserving their language capabilities. X-Fusion employs a dual-tower design with modality-specific weights, keeping the LLM's parameters frozen while integrating vision-specific information for both understanding and generation. Our experiments demonstrate that X-Fusion consistently outperforms alternative architectures on both image-to-text and text-to-image tasks. We find that incorporating understanding-focused data improves generation quality, reducing image data noise enhances overall performance, and feature alignment accelerates convergence for smaller models but has minimal impact on larger ones. Our findings provide valuable insights into building efficient unified multimodal models.
IVSep 30, 2019
Nonlinear Dipole Inversion (NDI) enables Quantitative Susceptibility Mapping (QSM) without parameter tuningDaniel Polak, Itthi Chatnuntawech, Jaeyeon Yoon et al.
We propose Nonlinear Dipole Inversion (NDI) for high-quality Quantitative Susceptibility Mapping (QSM) without regularization tuning, while matching the image quality of state-of-the-art reconstruction techniques. In addition to avoiding over-smoothing that these techniques often suffer from, we also obviate the need for parameter selection. NDI is flexible enough to allow for reconstruction from an arbitrary number of head orientations, and outperforms COSMOS even when using as few as 1-direction data. This is made possible by a nonlinear forward-model that uses the magnitude as an effective prior, for which we derived a simple gradient descent update rule. We synergistically combine this physics-model with a Variational Network (VN) to leverage the power of deep learning in the VaNDI algorithm. This technique adopts the simple gradient descent rule from NDI and learns the network parameters during training, hence requires no additional parameter tuning. Further, we evaluate NDI at 7T using highly accelerated Wave-CAIPI acquisitions at 0.5 mm isotropic resolution and demonstrate high-quality QSM from as few as 2-direction data.