GR · AI · AR · CV · Apr 8, 2025

CDM-QTA: Quantized Training Acceleration for Efficient LoRA Fine-Tuning of Diffusion Model

arXiv:2504.07998v1 · h-index: 10 · ISCAS
Originality: Incremental advance
AI Analysis

This work targets efficient LoRA fine-tuning of diffusion models on mobile devices. It is an incremental advance that combines a fully quantized training scheme with a flexible-dataflow accelerator.

The paper tackles the high computational cost of fine-tuning large diffusion models on mobile devices by developing a quantized training accelerator for LoRA fine-tuning. It reports up to 1.81x training speedup and 5.50x better energy efficiency than the baseline, with minimal impact on image quality.

Fine-tuning large diffusion models for custom applications demands substantial power and time, which poses significant challenges for efficient implementation on mobile devices. In this paper, we develop a novel training accelerator specifically for Low-Rank Adaptation (LoRA) of diffusion models, aiming to streamline the process and reduce computational complexity. By leveraging a fully quantized training scheme for LoRA fine-tuning, we achieve substantial reductions in memory usage and power consumption while maintaining high model fidelity. The proposed accelerator features flexible dataflow, enabling high utilization for irregular and variable tensor shapes during the LoRA process. Experimental results show up to 1.81x training speedup and 5.50x energy efficiency improvements compared to the baseline, with minimal impact on image generation quality.
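To make the idea of quantized LoRA computation concrete, here is a minimal NumPy sketch of a LoRA forward pass in which every matrix multiply runs on int8 operands. The abstract does not state the quantization format, so symmetric per-tensor int8 is an assumption, and the names `quantize_int8`, `int8_matmul`, and `lora_forward` are hypothetical helpers for illustration, not the paper's design. Only the forward pass is shown; a fully quantized training scheme would also quantize the backward pass.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization to int8 (assumed format). Returns (q, scale)."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul(a, b):
    """Quantize both operands, accumulate in int32, then rescale to float32."""
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    acc = qa.astype(np.int32) @ qb.astype(np.int32)
    return acc.astype(np.float32) * (sa * sb)

def lora_forward(x, w_frozen, lora_a, lora_b, alpha=1.0):
    """y = x W + alpha * (x A) B, with each matmul done on int8 operands."""
    base = int8_matmul(x, w_frozen)
    low_rank = int8_matmul(int8_matmul(x, lora_a), lora_b)
    return base + alpha * low_rank

rng = np.random.default_rng(0)
batch, d_in, d_out, rank = 4, 64, 48, 4
x = rng.standard_normal((batch, d_in)).astype(np.float32)
w = rng.standard_normal((d_in, d_out)).astype(np.float32)   # frozen base weight
a = rng.standard_normal((d_in, rank)).astype(np.float32)    # trainable LoRA factor
# B is commonly zero-initialized in LoRA; random here so the low-rank path is exercised.
b = rng.standard_normal((rank, d_out)).astype(np.float32)   # trainable LoRA factor

y_q = lora_forward(x, w, a, b)
y_fp = x @ w + (x @ a) @ b                                   # float32 reference
rel_err = float(np.linalg.norm(y_q - y_fp) / np.linalg.norm(y_fp))

trainable = a.size + b.size   # parameters updated during LoRA fine-tuning
frozen = w.size               # parameters left untouched
print(f"relative error vs float32: {rel_err:.4f}")
print(f"trainable params: {trainable} of {trainable + frozen}")
```

The shapes hint at why the abstract stresses irregular and variable tensor shapes: the low-rank factors are tall-and-thin or short-and-wide (here 64x4 and 4x48), unlike the roughly square base weight, so a dataflow tuned for one shape can leave compute units idle on the other.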

