LG CL | Mar 4, 2025

FourierNAT: A Fourier-Mixing-Based Non-Autoregressive Transformer for Parallel Sequence Generation

Andrew Kiruluta, Eric Lundy, Andreas Lemos

arXiv:2503.07630v2 | 1 citation | h-index: 3 | Int J Complex Appl Sci Technol

Originality: Incremental advance

AI Analysis

This work addresses slow inference in sequence generation for tasks such as machine translation and summarization and offers computational savings, though the advance is incremental because it builds on existing non-autoregressive methods.

The paper tackles parallel sequence generation with non-autoregressive Transformers, which often struggle to capture global dependencies. It proposes FourierNAT, which uses Fourier-based mixing and learned frequency-domain gating to propagate context efficiently. The model achieves competitive results on benchmarks such as WMT machine translation and CNN/DailyMail summarization, with significant speed advantages over autoregressive Transformers.

We present FourierNAT, a novel non-autoregressive Transformer (NAT) architecture that employs Fourier-based mixing in the decoder to generate output sequences in parallel. While traditional NAT approaches often struggle to capture global dependencies, our method leverages a discrete Fourier transform to mix token embeddings across the entire sequence dimension, coupled with learned frequency-domain gating. This allows the model to efficiently propagate context without explicit autoregressive steps. Empirically, FourierNAT achieves competitive results against leading NAT baselines on standard benchmarks such as WMT machine translation and CNN/DailyMail summarization, providing significant speed advantages over autoregressive Transformers. We further demonstrate that learned frequency-domain parameters allow the model to adaptively focus on long-range or short-range dependencies, partially mitigating the well-known coherence gaps in one-pass NAT generation. Overall, FourierNAT highlights the potential of integrating spectral-domain operations to accelerate and improve parallel text generation. This approach can potentially provide substantial computational and time savings in LLM inference tasks.
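
The mixing step described in the abstract (a discrete Fourier transform along the sequence dimension, followed by frequency-domain gating) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `dft` and `fourier_mix`, the use of one real-valued gate per frequency bin, and taking the real part after the inverse transform are all assumptions made for illustration, since the abstract does not specify how the gates are parameterized.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform of a list of complex numbers."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def fourier_mix(embeddings, gates):
    """Mix token embeddings along the sequence axis with per-frequency gates.

    embeddings: seq_len rows, each a list of d_model floats.
    gates: seq_len values, one per frequency bin. In a trained model these
           would be learned; here they are hypothetical inputs.
    """
    seq_len = len(embeddings)
    d_model = len(embeddings[0])
    mixed = [[0.0] * d_model for _ in range(seq_len)]
    for j in range(d_model):
        # Transform one embedding channel across all sequence positions.
        column = [complex(embeddings[t][j]) for t in range(seq_len)]
        spectrum = dft(column)
        # Scale each frequency bin by its gate (assumed form of the gating).
        gated = [g * c for g, c in zip(gates, spectrum)]
        restored = dft(gated, inverse=True)
        for t in range(seq_len):
            # Keep the real part only (a simplification for this sketch).
            mixed[t][j] = restored[t].real
    return mixed


tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
identity = fourier_mix(tokens, [1.0, 1.0, 1.0, 1.0])   # all frequencies kept
averaged = fourier_mix(tokens, [1.0, 0.0, 0.0, 0.0])   # only the zero frequency kept
```

With all gates set to one, the transform and its inverse cancel and the embeddings come back unchanged up to floating-point error. Keeping only the zero-frequency bin replaces every position with the per-channel mean over the sequence, which is the most global form of mixing. Gates between these extremes weight long-range and short-range structure differently, which is the intuition behind the adaptive focus the abstract describes.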
