CL · CV · SD · AS · Sep 18, 2024

Measuring Sound Symbolism in Audio-visual Models

arXiv:2409.12306v3 · 2 citations · h-index: 16
Originality: Incremental advance
AI Analysis

This paper addresses how AI models form cognitive-like associations, a question relevant to researchers in machine learning and cognitive science, though its scope is incremental.

The study tests whether pre-trained audio-visual models exhibit sound symbolism and finds a significant correlation between model outputs and established sound-symbolism patterns, especially in models trained on speech.

Audio-visual pre-trained models have gained substantial attention recently and demonstrated superior performance on various audio-visual tasks. This study investigates whether pre-trained audio-visual models demonstrate non-arbitrary associations between sounds and visual representations – known as sound symbolism – which is also observed in humans. We developed a specialized dataset with synthesized images and audio samples and assessed these models using a non-parametric approach in a zero-shot setting. Our findings reveal a significant correlation between the models' outputs and established patterns of sound symbolism, particularly in models trained on speech data. These results suggest that such models can capture sound-meaning connections akin to human language processing, providing insights into both cognitive architectures and machine learning strategies.
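The abstract does not spell out the scoring or the statistical test, so the following is only an illustrative sketch of what a zero-shot, non-parametric evaluation of this kind could look like, not the authors' procedure. It assumes embeddings have already been extracted from some audio-visual model; the function names (`spiky_preference`, `permutation_test`), the cosine-difference score, and the choice of a permutation test are all assumptions made for illustration.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between each row of `a` and a single vector `b`."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b)
    return a @ b

def spiky_preference(audio_emb, spiky_img_emb, round_img_emb):
    """Hypothetical zero-shot score: positive when a clip is closer to the spiky image."""
    return cosine(audio_emb, spiky_img_emb) - cosine(audio_emb, round_img_emb)

def permutation_test(scores, is_sharp, n_perm=10_000, seed=0):
    """One-sided permutation test on the mean score gap between
    sharp-sounding and soft-sounding clips (an assumed test, not the paper's)."""
    rng = np.random.default_rng(seed)
    is_sharp = np.asarray(is_sharp, dtype=bool)
    observed = scores[is_sharp].mean() - scores[~is_sharp].mean()
    count = 0
    for _ in range(n_perm):
        # Shuffle the labels while keeping the group sizes fixed.
        perm = rng.permutation(is_sharp)
        stat = scores[perm].mean() - scores[~perm].mean()
        if stat >= observed:
            count += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (count + 1) / (n_perm + 1)
```

Under these assumptions, a small p-value would indicate that clips labeled as sharp-sounding sit closer to the spiky image than chance label assignments would predict, with no model fine-tuning and no distributional assumptions on the scores.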

