CV | AI | Apr 15, 2025

Bringing together invertible UNets with invertible attention modules for memory-efficient diffusion models

arXiv:2504.10883v1 | h-index: 7
Originality: Synthesis-oriented
AI Analysis

This paper addresses memory efficiency in medical image synthesis on 3D datasets such as CT scans and MRIs. It is an incremental improvement with domain-specific impact.

The paper tackles the high computational cost of diffusion models for medical image synthesis by proposing an invertible UNet architecture with invertible attention modules. It reports up to a 15% reduction in peak memory consumption during training on the 3D BraTS2020 dataset while maintaining image quality comparable to the state of the art.

Diffusion models have recently achieved state-of-the-art performance on many image generation tasks. However, most models require significant computational resources to do so. This becomes especially apparent in medical image synthesis because of the 3D nature of medical datasets such as CT scans, MRIs, and electron microscopy. In this paper we propose a novel architecture for memory-efficient, single-GPU training of diffusion models on high-dimensional medical datasets. The proposed model is built from an invertible UNet architecture with invertible attention modules. This leads to two contributions: 1. memory-efficient denoising diffusion models, enabling memory usage to be independent of the dimensionality of the dataset, and 2. reduced energy usage during training. While this new model can be applied to a multitude of image generation tasks, we showcase its memory efficiency on the 3D BraTS2020 dataset, where it yields up to a 15% decrease in peak memory consumption during training while maintaining image quality comparable to SOTA.
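To illustrate why invertibility can save memory, here is a minimal sketch of an additive coupling block, a common way to build invertible layers. The abstract does not specify the paper's block design, so this is an assumption rather than the authors' method. The sub-networks `f` and `g` are hypothetical stand-ins, and plain Python lists replace tensors. The point is that the inputs can be recomputed exactly from the outputs, so intermediate activations need not be stored for the backward pass.

```python
import math

def f(v):
    # Hypothetical sub-network F; it does not need to be invertible itself.
    return [math.tanh(0.5 * a + 0.1) for a in v]

def g(v):
    # Hypothetical sub-network G; also an arbitrary function.
    return [math.sin(a) for a in v]

def forward(x1, x2):
    # Additive coupling: each half is shifted by a function of the other half.
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2

def inverse(y1, y2):
    # Undo the two shifts in reverse order to recover the inputs.
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2

# Toy input split into two halves (in a real network: a channel-wise split).
x1, x2 = [0.3, -1.2, 2.0], [1.5, 0.0, -0.7]
y1, y2 = forward(x1, x2)
r1, r2 = inverse(y1, y2)

# Reconstruction error should be at floating-point rounding level.
max_err = max(abs(a - b) for a, b in zip(x1 + x2, r1 + r2))
print(max_err)
```

In a training framework, the backward pass of such a block would call `inverse` to rebuild its inputs on the fly, trading extra computation for lower peak memory.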

