SD · AI · Aug 29, 2025

DRASP: A Dual-Resolution Attentive Statistics Pooling Framework for Automatic MOS Prediction

arXiv:2508.21407v1 · 1 citation · h-index: 3 · APSIPA
Originality: Incremental advance
AI Analysis

This work addresses the need for more accurate MOS prediction in audio quality assessment. The contribution is incremental, since it builds on existing pooling methods.

The paper tackles the problem of predicting mean opinion scores (MOS) for audio quality by introducing the DRASP framework, which integrates global and local pooling to capture complementary perceptual insights. It reports a 10.39% relative improvement in system-level Spearman's rank correlation coefficient (SRCC) over average pooling.
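To make the headline metric concrete, the sketch below shows one common way to compute a system-level SRCC and a relative improvement. It assumes that "system-level" means averaging true and predicted scores per generation system before ranking; the paper's exact evaluation protocol is not given here. The function names (`average_ranks`, `system_level_srcc`, `relative_improvement`) are hypothetical, and any inputs passed to them are illustrative only, not results from the paper.

```python
import numpy as np

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        mask = values == v
        ranks[mask] = ranks[mask].mean()
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

def system_level_srcc(system_ids, true_mos, pred_mos):
    """Average scores per system (assumed protocol), then correlate ranks."""
    ids = np.asarray(system_ids)
    true_mos = np.asarray(true_mos, dtype=float)
    pred_mos = np.asarray(pred_mos, dtype=float)
    systems = sorted(set(system_ids))
    true_means = [true_mos[ids == s].mean() for s in systems]
    pred_means = [pred_mos[ids == s].mean() for s in systems]
    return spearman(true_means, pred_means)

def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, as a fraction."""
    return (new - baseline) / baseline
```

Under this reading, a 10.39% relative improvement means `relative_improvement(srcc_drasp, srcc_avg_pool)` is about 0.1039, which is a ratio of correlations and not an absolute gain of 0.1039 in SRCC.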

A pooling mechanism is essential for mean opinion score (MOS) prediction, facilitating the transformation of variable-length audio features into a concise fixed-size representation that effectively encodes speech quality. Existing pooling methods typically operate at a singular granularity, concentrating either on a comprehensive global perspective or a detailed frame-level analysis, which may overlook complementary perceptual insights. To address this limitation, we introduce the Dual-Resolution Attentive Statistics Pooling (DRASP) framework. DRASP integrates both coarse-grained, global statistical summaries and fine-grained, attentive analyses of perceptually significant segments. This dual-view architecture empowers our model to formulate a more thorough and robust representation, capturing both the overarching structural context and salient local details concurrently. Extensive experiments validate the effectiveness and strong generalization ability of the proposed framework. It consistently outperforms various baseline methods across diverse datasets (MusicEval and AES-Natural), MOS prediction backbones (including a CLAP-based model and AudioBox-Aesthetics), and different audio generation systems, achieving a relative improvement of 10.39% in system-level Spearman's rank correlation coefficient (SRCC) over the widely-used average pooling approach.
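The dual-view idea can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify how segments are formed, how attention is parameterised, or how the two views are fused. The sketch assumes fixed-length segments, a single linear attention vector `w_attn`, and plain concatenation; all names (`weighted_stats`, `dual_resolution_pool`, `seg_len`) are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def weighted_stats(feats, weights, eps=1e-8):
    """Weighted mean and standard deviation over the first axis.

    feats: (N, D) array; weights: (N,) array summing to 1.
    Returns a (2 * D,) vector [mean, std].
    """
    mean = (weights[:, None] * feats).sum(axis=0)
    var = (weights[:, None] * (feats - mean) ** 2).sum(axis=0)
    return np.concatenate([mean, np.sqrt(var + eps)])

def dual_resolution_pool(feats, w_attn, seg_len=4):
    """Map variable-length (T, D) features to a fixed (4 * D,) vector."""
    T, _ = feats.shape
    # Coarse view: uniform statistics over every frame.
    global_stats = weighted_stats(feats, np.full(T, 1.0 / T))
    # Fine view: summarise fixed-length segments (assumed segmentation),
    # then weight the segments by a learned-style attention score.
    n_seg = int(np.ceil(T / seg_len))
    segs = np.stack([feats[i * seg_len:(i + 1) * seg_len].mean(axis=0)
                     for i in range(n_seg)])
    alpha = softmax(segs @ w_attn)
    local_stats = weighted_stats(segs, alpha)
    # Fusion by concatenation is an assumption of this sketch.
    return np.concatenate([global_stats, local_stats])
```

Whatever the input length `T`, the output has size `4 * D`, which is the fixed-size property the abstract asks of a pooling layer. In a trained model `w_attn` would be learned, so segments that matter more for perceived quality receive larger weights.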

