SD · AI · May 11

Multi-layer attentive probing improves transfer of audio representations for bioacoustics

Marius Miron, David Robinson, Masato Hagiwara, Titouan Parcollet, Jules Cauzinille, Gagan Narula, Milad Alizadeh, Ellen Gilsenan-McMahon, Sara Keen, Emmanuel Chemla, Benjamin Hoffman, Maddie Cusimano

arXiv:2605.10494 · 82.4

Predicted impact: top 14% in SD · last 90 days · Originality: Incremental advance

AI Analysis

For researchers evaluating audio representation learning in bioacoustics, this work shows that probe design can bias benchmark results and argues for more informative evaluation protocols.

The paper shows that replacing the standard last-layer linear probe with multi-layer probing improves downstream task performance on two bioacoustic benchmarks (BEANs and BirdSet), and that attention probes outperform linear probes for transformer models. This suggests that current benchmarks may misrepresent encoder quality.

Probing heads map the representations that a machine learning model learns from audio to downstream task labels, and they are a key component in evaluating representation learning. Most bioacoustic benchmarks use a fixed, low-capacity probe, such as a linear layer on the final encoder layer. While this standardization enables model comparisons, it may bias results by overlooking the interaction between encoder features and probe design. In this work, we systematically study different probing strategies across two bioacoustic benchmarks, BEANs and BirdSet. We evaluate last-layer and multi-layer probing with both linear and attention probes. We show that larger probe heads that leverage time information perform better. Our results suggest that current benchmarks may misrepresent encoder quality when relying on a last-layer probing setup. Multi-layer probing improves downstream task performance across all tested models, while attention probing outperforms linear probing for transformer models.
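To make the contrast between the two probing setups concrete, here is a minimal forward-pass sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the probe architecture, so the softmax-weighted layer mixing, the single learned query used for attention pooling over time, and all names (`layer_logits`, `query`, `weight`, `bias`) are hypothetical choices. Parameters are random rather than trained.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weight, bias):
    """Affine map; weight has shape (num_classes x dim)."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def last_layer_linear_probe(layers, weight, bias):
    """Baseline: mean-pool the final encoder layer over time, then classify."""
    last = layers[-1]  # shape (num_frames x dim)
    dim = len(last[0])
    pooled = [sum(frame[d] for frame in last) / len(last) for d in range(dim)]
    return linear(pooled, weight, bias)


def multi_layer_attentive_probe(layers, layer_logits, query, weight, bias):
    """Assumed design: mix all layers, attention-pool over time, classify."""
    num_frames, dim = len(layers[0]), len(layers[0][0])
    mix = softmax(layer_logits)  # one mixing weight per encoder layer
    mixed = [
        [sum(a * layer[t][d] for a, layer in zip(mix, layers))
         for d in range(dim)]
        for t in range(num_frames)
    ]
    # Scaled dot-product score of each frame against a single query vector.
    scores = [sum(q * v for q, v in zip(query, frame)) / math.sqrt(dim)
              for frame in mixed]
    attn = softmax(scores)  # attention weights over time frames
    pooled = [sum(a * frame[d] for a, frame in zip(attn, mixed))
              for d in range(dim)]
    return linear(pooled, weight, bias), attn


# Toy encoder output: 3 layers, 5 time frames, 4 feature dims, 2 classes.
rng = random.Random(0)
num_layers, num_frames, dim, num_classes = 3, 5, 4, 2
layers = [[[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_frames)]
          for _ in range(num_layers)]
weight = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(num_classes)]
bias = [0.0] * num_classes
layer_logits = [0.0] * num_layers  # would be learned; uniform mix here
query = [rng.gauss(0, 1) for _ in range(dim)]  # would be learned

baseline_logits = last_layer_linear_probe(layers, weight, bias)
probe_logits, attn = multi_layer_attentive_probe(
    layers, layer_logits, query, weight, bias)
```

In this sketch the baseline is a special case of the larger probe: if the layer mixing puts all its weight on the final layer and the query is zero, attention becomes uniform over time and the output matches mean pooling. The extra parameters let the probe draw on intermediate layers and on specific time frames, which is the kind of capacity the abstract associates with better downstream performance.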

