SD, AI, LG, AS (Feb 21, 2025)

KAD: No More FAD! An Effective and Efficient Evaluation Metric for Audio Generation

arXiv:2502.15602v2 | 24 citations | h-index: 5 | Has Code
Originality: Highly original
AI Analysis

KAD gives researchers and practitioners in audio generation a more efficient and reliable evaluation metric, addressing three bottlenecks of FAD: its Gaussian assumption, its sensitivity to sample size, and its computational cost.

The paper tackles the limitations of the Fréchet Audio Distance (FAD) for evaluating generated audio by introducing the Kernel Audio Distance (KAD), a distribution-free and computationally efficient metric based on Maximum Mean Discrepancy (MMD). Its reported advantages are faster convergence with smaller sample sizes, lower computational cost, and stronger alignment with human perceptual judgments.

Although widely adopted for evaluating generated audio signals, the Fréchet Audio Distance (FAD) suffers from significant limitations, including reliance on Gaussian assumptions, sensitivity to sample size, and high computational complexity. As an alternative, we introduce the Kernel Audio Distance (KAD), a novel, distribution-free, unbiased, and computationally efficient metric based on Maximum Mean Discrepancy (MMD). Through analysis and empirical validation, we demonstrate KAD's advantages: (1) faster convergence with smaller sample sizes, enabling reliable evaluation with limited data; (2) lower computational cost, with scalable GPU acceleration; and (3) stronger alignment with human perceptual judgments. By leveraging advanced embeddings and characteristic kernels, KAD captures nuanced differences between real and generated audio. Open-sourced in the kadtk toolkit, KAD provides an efficient, reliable, and perceptually aligned benchmark for evaluating generative audio models.
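To make the MMD idea concrete, here is a minimal sketch of an unbiased squared-MMD estimate between two sets of embedding vectors. It is not the kadtk implementation: the function names, the choice of a Gaussian kernel (one example of a characteristic kernel), the fixed bandwidth, and the synthetic "embeddings" are all assumptions made for illustration. It is written in plain Python for readability; a practical version would vectorize the kernel sums.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel: exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs, ys, bandwidth=1.0):
    """Unbiased estimate of squared MMD between two sets of vectors.

    xs, ys: lists of equal-length float lists (hypothetical embeddings).
    The bandwidth default is an illustrative assumption, not a recommendation.
    """
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("need at least two samples in each set")
    # Within-set sums skip the i == j terms; that is what removes the bias.
    k_xx = sum(gaussian_kernel(xs[i], xs[j], bandwidth)
               for i in range(m) for j in range(m) if i != j)
    k_yy = sum(gaussian_kernel(ys[i], ys[j], bandwidth)
               for i in range(n) for j in range(n) if i != j)
    # The cross-set sum uses every (x, y) pair.
    k_xy = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return k_xx / (m * (m - 1)) + k_yy / (n * (n - 1)) - 2.0 * k_xy / (m * n)


def sample(rng, count, dim, mean):
    """Draw synthetic stand-in embeddings from an isotropic Gaussian."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(count)]


if __name__ == "__main__":
    rng = random.Random(0)
    reference = sample(rng, 100, 4, 0.0)
    similar = sample(rng, 100, 4, 0.0)   # same distribution as reference
    shifted = sample(rng, 100, 4, 2.0)   # mean shifted in every dimension
    print("same distribution:", mmd2_unbiased(reference, similar, bandwidth=2.0))
    print("shifted distribution:", mmd2_unbiased(reference, shifted, bandwidth=2.0))
```

No mean or covariance is estimated anywhere, which is the sense in which an MMD-based score is distribution-free. Because the estimate is unbiased, it can come out slightly negative when the two sets are drawn from the same distribution.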

Code Implementations: 1 repo
