LG · Nov 3, 2025

Memory-Efficient Training with In-Place FFT Implementation

Xinyu Ding, Bangtian Liu, Siyu Liao, Zhongfeng Wang

arXiv:2511.01385v1 · 4.1 · h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses memory constraints faced by deep learning practitioners who use frequency-domain methods, though it appears incremental because it builds on existing FFT techniques.

The paper tackles the memory inefficiency of Fast Fourier Transform implementations in deep learning by developing the first real-domain, fully in-place FFT framework (rdFFT), which eliminates intermediate cache usage through an implicit complex encoding scheme. Experiments on natural language understanding tasks show reduced training memory costs.

Fast Fourier Transforms (FFT) are widely used to reduce memory and computational costs in deep learning. However, existing implementations, including standard FFT and real FFT (rFFT), cannot achieve true in-place computation. In particular, rFFT maps a real input of size n to a complex output of size n/2+1, causing a dimensional mismatch and requiring additional memory allocation. We propose the first real-domain, fully in-place FFT framework (rdFFT) that preserves input-output memory space consistency. By leveraging butterfly operation symmetry and conjugate properties in the frequency domain, we design an implicit complex encoding scheme that eliminates intermediate cache usage entirely. Experiments on multiple natural language understanding tasks demonstrate the method's effectiveness in reducing training memory cost, offering a promising direction for frequency-domain lightweight adaptation.
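The dimensional mismatch described in the abstract can be illustrated with a short NumPy sketch. This is not the authors' rdFFT: the helper names `pack_rfft` and `unpack_irfft` are hypothetical, and the code still allocates a temporary complex array through `np.fft.rfft`, so it is not in-place. It only shows that, for even n, conjugate symmetry lets the n/2+1 complex outputs be stored in exactly n real values (the imaginary parts of the DC and Nyquist bins are zero), which is the kind of input-output size consistency that a real-domain in-place scheme relies on.

```python
import numpy as np

def pack_rfft(x):
    """Store the rFFT of a real, even-length signal in n real values.

    Assumed layout (illustrative only): [Re X0, Re X1..X(n/2-1), Re X(n/2), Im X1..X(n/2-1)].
    """
    n = x.shape[-1]
    assert n % 2 == 0, "this sketch assumes an even length"
    X = np.fft.rfft(x)                    # n/2 + 1 complex values (temporary buffer)
    out = np.empty(n, dtype=np.float64)
    out[0] = X[0].real                    # DC bin: imaginary part is zero
    out[1:n // 2] = X[1:n // 2].real      # real parts of the interior bins
    out[n // 2] = X[n // 2].real          # Nyquist bin: imaginary part is zero
    out[n // 2 + 1:] = X[1:n // 2].imag   # imaginary parts of the interior bins
    return out

def unpack_irfft(p):
    """Invert pack_rfft: rebuild the half spectrum and apply the inverse rFFT."""
    n = p.shape[-1]
    X = np.zeros(n // 2 + 1, dtype=np.complex128)
    X[0] = p[0]
    X[n // 2] = p[n // 2]
    X[1:n // 2] = p[1:n // 2] + 1j * p[n // 2 + 1:]
    return np.fft.irfft(X, n=n)

x = np.random.default_rng(0).standard_normal(8)
half_spectrum = np.fft.rfft(x)   # 5 complex values for 8 real inputs
packed = pack_rfft(x)            # 8 real values, same size as the input
restored = unpack_irfft(packed)
```

Here `half_spectrum` holds n/2+1 = 5 complex numbers (10 floats) for an 8-float input, while `packed` carries the same information in 8 floats and round-trips back to `x`.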
