SD | LG | AS | Feb 7, 2025

Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound

arXiv:2502.05139v1 | 147 citations | h-index: 25 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for automated systems to predict audio aesthetics for applications like data filtering and generative model evaluation, representing an incremental advance in the field.

The paper tackles automated audio aesthetic assessment by introducing new annotation guidelines and no-reference, per-item prediction models. Evaluated against human mean opinion scores (MOS), the models perform comparably to or better than existing methods.

The quantification of audio aesthetics remains a complex challenge in audio processing, primarily due to its subjective nature, which is influenced by human perception and cultural context. Traditional methods often depend on human listeners for evaluation, leading to inconsistencies and high resource demands. This paper addresses the growing need for automated systems capable of predicting audio aesthetics without human intervention. Such systems are crucial for applications like data filtering, pseudo-labeling large datasets, and evaluating generative audio models, especially as these models become more sophisticated. In this work, we introduce a novel approach to audio aesthetic evaluation by proposing new annotation guidelines that decompose human listening perspectives into four distinct axes. We develop and train no-reference, per-item prediction models that offer a more nuanced assessment of audio quality. Our models are evaluated against human mean opinion scores (MOS) and existing methods, demonstrating comparable or superior performance. This research not only advances the field of audio aesthetics but also provides open-source models and datasets to facilitate future work and benchmarking. We release our code and pre-trained model at: https://github.com/facebookresearch/audiobox-aesthetics
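As an illustration of the data-filtering use case mentioned in the abstract, the following is a minimal sketch that keeps only clips whose predicted scores clear per-axis thresholds. It does not call the released model: the axis keys (PQ, PC, CE, CU), the score values, the file names, the thresholds, and the function `filter_by_aesthetics` are all assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical per-item predictions, one score per axis.
# Axis keys, file names, and values are illustrative assumptions,
# not output from the released model.
predictions: List[dict] = [
    {"path": "clip_a.wav", "scores": {"PQ": 7.8, "PC": 3.1, "CE": 6.9, "CU": 7.2}},
    {"path": "clip_b.wav", "scores": {"PQ": 4.2, "PC": 5.5, "CE": 3.8, "CU": 4.0}},
    {"path": "clip_c.wav", "scores": {"PQ": 8.1, "PC": 2.0, "CE": 5.1, "CU": 6.5}},
]


def filter_by_aesthetics(items: List[dict], min_scores: Dict[str, float]) -> List[str]:
    """Return paths of items whose score meets every given per-axis minimum."""
    kept = []
    for item in items:
        scores = item["scores"]
        # An item passes only if it clears the threshold on every listed axis.
        if all(scores[axis] >= threshold for axis, threshold in min_scores.items()):
            kept.append(item["path"])
    return kept


# Hypothetical thresholds: axes not listed here are left unconstrained.
thresholds = {"PQ": 7.0, "CE": 6.0}
print(filter_by_aesthetics(predictions, thresholds))  # ['clip_a.wav']
```

Because the predictions are per-item and need no reference signal, the same filter could in principle be applied to an unlabeled corpus; suitable thresholds would have to be chosen empirically for each dataset.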

Code Implementations: 1 repo
