CV | Jun 5, 2025

FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation

arXiv:2506.04956 | MICCAI | Has Code
AI Analysis

This work addresses the problem of generating realistic medical videos for healthcare applications, representing an incremental improvement with specific gains in efficiency and performance.

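The paper proposes FEAT, a full-dimensional efficient attention Transformer for synthesizing high-quality dynamic medical videos. It targets three limitations of existing Transformer-based methods: insufficient channel interactions, the high computational cost of self-attention, and coarse denoising guidance from timestep embeddings. The smaller variant, FEAT-S, achieves comparable or superior performance to Endora with only 23% of its parameters, and the larger variant, FEAT-L, surpasses all comparison methods across multiple datasets.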

Synthesizing high-quality dynamic medical videos remains a significant challenge due to the need for modeling both spatial consistency and temporal dynamics. Existing Transformer-based approaches face critical limitations, including insufficient channel interactions, high computational complexity from self-attention, and coarse denoising guidance from timestep embeddings when handling varying noise levels. In this work, we propose FEAT, a full-dimensional efficient attention Transformer, which addresses these issues through three key innovations: (1) a unified paradigm with sequential spatial-temporal-channel attention mechanisms to capture global dependencies across all dimensions, (2) a linear-complexity design for attention mechanisms in each dimension, utilizing weighted key-value attention and global channel attention, and (3) a residual value guidance module that provides fine-grained pixel-level guidance to adapt to different noise levels. We evaluate FEAT on standard benchmarks and downstream tasks, demonstrating that FEAT-S, with only 23% of the parameters of the state-of-the-art model Endora, achieves comparable or even superior performance. Furthermore, FEAT-L surpasses all comparison methods across multiple datasets, showcasing both superior effectiveness and scalability. Code is available at https://github.com/Yaziwel/FEAT.
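
To make the "linear-complexity" point in innovation (2) concrete, here is a minimal sketch of one generic way attention can avoid the N x N weight matrix. It is not FEAT's weighted key-value attention, whose details the abstract does not give. It swaps in a standard kernelized linear attention with an assumed elu(x) + 1 feature map, and the names `feature_map`, `linear_attention`, and `quadratic_attention` are hypothetical, chosen for illustration only. The idea is that the key-value summary is accumulated once and reused for every query, so cost grows linearly with the number of tokens.

```python
import math


def feature_map(x):
    # Assumed positive feature map elu(x) + 1 (an illustrative choice, not from the paper).
    return x + 1.0 if x > 0 else math.exp(x)


def linear_attention(queries, keys, values):
    """Kernelized attention: cost is linear in the number of tokens N."""
    d, d_v = len(keys[0]), len(values[0])
    # Accumulate sum_j phi(k_j) v_j^T (d x d_v) and sum_j phi(k_j) (d) in one pass.
    kv = [[0.0] * d_v for _ in range(d)]
    k_sum = [0.0] * d
    for k, v in zip(keys, values):
        phi_k = [feature_map(x) for x in k]
        for a in range(d):
            k_sum[a] += phi_k[a]
            for b in range(d_v):
                kv[a][b] += phi_k[a] * v[b]
    outputs = []
    for q in queries:
        phi_q = [feature_map(x) for x in q]
        norm = sum(phi_q[a] * k_sum[a] for a in range(d))
        outputs.append(
            [sum(phi_q[a] * kv[a][b] for a in range(d)) / norm for b in range(d_v)]
        )
    return outputs


def quadratic_attention(queries, keys, values):
    """Reference with the same kernel, but building all N x N weights explicitly."""
    d_v = len(values[0])
    phi_keys = [[feature_map(x) for x in k] for k in keys]
    outputs = []
    for q in queries:
        phi_q = [feature_map(x) for x in q]
        weights = [sum(a * b for a, b in zip(phi_q, phi_k)) for phi_k in phi_keys]
        total = sum(weights)
        outputs.append(
            [sum(w * v[b] for w, v in zip(weights, values)) / total for b in range(d_v)]
        )
    return outputs


# Three toy tokens with two features each; both forms should agree.
tokens = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
fast = linear_attention(tokens, tokens, tokens)
slow = quadratic_attention(tokens, tokens, tokens)
```

Under these assumptions the two functions compute the same output, and only the order of the multiplications differs. In a design like the one the abstract describes, an operation of this kind would be applied along the spatial, temporal, and channel dimensions in turn, but the exact formulation should be taken from the authors' repository rather than from this sketch.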
