Frequency-Time Diffusion with Neural Cellular Automata
This work addresses the efficiency and scalability of diffusion models for image synthesis, with applications such as pathology slice generation, super-resolution, and inpainting.
The paper tackles the practical challenges that large Denoising Diffusion Models (DDMs) face on limited hardware and with gigapixel images by introducing two Neural Cellular Automata (NCA)-based DDMs, Diff-NCA and FourierDiff-NCA, which reduce parameter counts. FourierDiff-NCA (1.1M parameters) reaches an FID score of 43.86, roughly three times lower than the 128.2 obtained by a UNet with 3.94M parameters.
Despite considerable success, large Denoising Diffusion Models (DDMs) with a UNet backbone pose practical challenges, particularly on limited hardware and when processing gigapixel images. To address these limitations, we introduce two Neural Cellular Automata (NCA)-based DDMs: Diff-NCA and FourierDiff-NCA. Capitalizing on the local communication capabilities of NCA, Diff-NCA significantly reduces the parameter counts of NCA-based DDMs. FourierDiff-NCA additionally integrates Fourier-based diffusion, which enables global communication early in the diffusion process. This feature is particularly valuable when synthesizing complex images with important global features, such as those in the CelebA dataset. We demonstrate that even a 331k-parameter Diff-NCA can generate 512×512 pathology slices, while FourierDiff-NCA (1.1M parameters) reaches an FID score of 43.86, roughly three times lower than the 128.2 of a UNet that is about four times larger (3.94M parameters). Additionally, FourierDiff-NCA can perform diverse tasks such as super-resolution, out-of-distribution image synthesis, and inpainting without explicit training.
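The contrast between local and global communication can be illustrated with a toy sketch; it is not the authors' implementation. A 3×3 neighborhood update spreads information by only one cell per step, whereas a single Fourier coefficient influences every pixel after the inverse transform. The names `local_step` and `inverse_dft2`, the 8×8 grid, and the max-over-neighbors rule are assumptions chosen for illustration; a real NCA would use a learned update rule in place of the max.

```python
import cmath

def local_step(grid):
    """Toy local update: each cell takes the max over its 3x3 neighborhood.
    This stands in for a learned NCA rule (an assumption for illustration)."""
    n = len(grid)
    out = [[0.0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            out[y][x] = max(
                grid[j][i]
                for j in range(max(0, y - 1), min(n, y + 2))
                for i in range(max(0, x - 1), min(n, x + 2))
            )
    return out

def count_nonzero(grid):
    """Number of cells that have been reached by the signal."""
    return sum(1 for row in grid for v in row if abs(v) > 1e-12)

def inverse_dft2(spec):
    """Naive 2D inverse discrete Fourier transform (fine for tiny grids)."""
    n = len(spec)
    out = [[0j] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            total = 0j
            for v in range(n):
                for u in range(n):
                    phase = 2j * cmath.pi * (u * x + v * y) / n
                    total += spec[v][u] * cmath.exp(phase)
            out[y][x] = total / (n * n)
    return out

N = 8

# Local communication: one active cell, three update steps.
grid = [[0.0] * N for _ in range(N)]
grid[4][4] = 1.0
reached_per_step = []
for _ in range(3):
    grid = local_step(grid)
    reached_per_step.append(count_nonzero(grid))
print(reached_per_step)  # [9, 25, 49]

# Global communication: one non-zero Fourier coefficient.
spec = [[0j] * N for _ in range(N)]
spec[1][2] = 1.0 + 0j
image = inverse_dft2(spec)
pixels_touched = count_nonzero(image)
print(pixels_touched)  # 64
```

In this toy setting the local rule covers a 3×3, then 5×5, then 7×7 patch, so crossing a 512×512 image would take hundreds of steps, while a change to one frequency-domain cell reaches all pixels at once. This suggests why operating in Fourier space early in the diffusion process can supply the global structure that purely local updates build up only slowly.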