CV | AI | Oct 31, 2024

EDT: An Efficient Diffusion Transformer Framework Inspired by Human-like Sketching

arXiv:2410.23788v1 | 5 citations | h-index: 18 | Has Code | NIPS
Originality: Incremental advance
AI Analysis

This work addresses the efficiency bottlenecks that limit practical use of transformer-based diffusion models. It is an incremental improvement with measurable gains in training and inference speed.

This paper tackles the high computational cost of transformer-based diffusion models by proposing the Efficient Diffusion Transformer (EDT) framework, which reduces training and inference costs while improving image synthesis performance. Compared to MDTv2, EDT achieves speed-ups of up to 3.93x in training and 2.29x in inference.

Transformer-based Diffusion Probabilistic Models (DPMs) have shown more potential than CNN-based DPMs, yet their extensive computational requirements hinder widespread practical applications. To reduce the computation budget of transformer-based DPMs, this work proposes the Efficient Diffusion Transformer (EDT) framework. The framework includes a lightweight-design diffusion model architecture, and a training-free Attention Modulation Matrix and its alternation arrangement in EDT inspired by human-like sketching. Additionally, we propose a token relation-enhanced masking training strategy tailored explicitly for EDT to augment its token relation learning capability. Our extensive experiments demonstrate the efficacy of EDT. The EDT framework reduces training and inference costs and surpasses existing transformer-based diffusion models in image synthesis performance, thereby achieving a significant overall enhancement. With lower FID, EDT-S, EDT-B, and EDT-XL attained speed-ups of 3.93x, 2.84x, and 1.92x respectively in the training phase, and 2.29x, 2.29x, and 2.22x respectively in inference, compared to the corresponding sizes of MDTv2. The source code is released at https://github.com/xinwangChen/EDT.

Code Implementations: 1 repo
