AS LG SD
Oct 5, 2025

Drax: Speech Recognition with Discrete Flow Matching

arXiv:2510.04162v1
2 citations
h-index: 27
Originality: Incremental advance
AI Analysis

This work addresses the challenge of efficient parallel decoding in speech recognition, representing an incremental advance in non-autoregressive ASR methods.

The paper applies non-autoregressive flow matching to automatic speech recognition (ASR) by proposing Drax, a discrete flow matching framework that uses an audio-conditioned probability path to align training with inference. The resulting model attains recognition accuracy on par with state-of-the-art speech models while offering improved accuracy-efficiency trade-offs.

Diffusion and flow-based non-autoregressive (NAR) models have shown strong promise in large language modeling; however, their potential for automatic speech recognition (ASR) remains largely unexplored. We propose Drax, a discrete flow matching framework for ASR that enables efficient parallel decoding. To better align training with inference, we construct an audio-conditioned probability path that guides the model through trajectories resembling likely intermediate inference errors, rather than through direct transitions from random noise to the target. Our theoretical analysis links the generalization gap to divergences between training and inference occupancies, controlled by cumulative velocity errors, thereby motivating our design choice. Empirical evaluation demonstrates that our approach attains recognition accuracy on par with state-of-the-art speech models while offering improved accuracy-efficiency trade-offs, highlighting discrete flow matching as a promising direction for advancing NAR ASR.
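
To make the idea of a probability path with an intermediate stage more concrete, here is a minimal sketch of sampling a noisy token sequence from a three-way mixture of source, middle, and target tokens. It is an illustration under stated assumptions, not the paper's implementation: the quadratic weights in `path_weights`, the `<mask>` source token, and the fixed `middle_guess` list (standing in for a draw from an audio-conditioned distribution) are all hypothetical choices made for this example.

```python
import random

MASK = "<mask>"  # hypothetical source (noise) token, assumed for illustration


def path_weights(t):
    """Illustrative mixture weights (source, middle, target); they sum to 1.

    These quadratic schedules are an assumption, not taken from the paper.
    """
    return (1.0 - t) ** 2, 2.0 * t * (1.0 - t), t ** 2


def sample_intermediate(target, middle_guess, t, rng):
    """Sample x_t token by token from the three-way mixture at time t."""
    w_src, w_mid, _ = path_weights(t)
    x_t = []
    for tgt_tok, mid_tok in zip(target, middle_guess):
        u = rng.random()  # uniform in [0, 1)
        if u < w_src:
            x_t.append(MASK)      # still at the noise source
        elif u < w_src + w_mid:
            x_t.append(mid_tok)   # plausible but possibly wrong token
        else:
            x_t.append(tgt_tok)   # ground-truth token
    return x_t


rng = random.Random(0)
target = ["the", "cat", "sat"]
# Stand-in for a sample from an audio-conditioned middle distribution.
middle_guess = ["the", "cap", "sat"]
for t in (0.0, 0.5, 1.0):
    print(t, sample_intermediate(target, middle_guess, t, rng))
```

With these weights the sample is all source tokens at t = 0 and exactly the target at t = 1; in between, some positions hold acoustically plausible mistakes, which is the kind of intermediate state the abstract describes the model being trained to correct.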

