CL CV SD AS | Sep 18, 2024

Measuring Sound Symbolism in Audio-visual Models

Wei-Cheng Tseng, Yi-Jen Shih, David Harwath, Raymond Mooney

arXiv:2409.12306v3 | 2.72 citations | h-index: 16

Originality: Incremental advance

AI Analysis

This paper addresses how AI models form cognitive-like associations, a question of interest to researchers in machine learning and cognitive science, though its contribution is incremental in scope.

The study investigates whether pre-trained audio-visual models exhibit sound symbolism and finds a significant correlation between model outputs and established sound-symbolism patterns, especially in speech-trained models.

Audio-visual pre-trained models have gained substantial attention recently and demonstrated superior performance on various audio-visual tasks. This study investigates whether pre-trained audio-visual models demonstrate non-arbitrary associations between sounds and visual representations (known as sound symbolism), which is also observed in humans. We developed a specialized dataset with synthesized images and audio samples and assessed these models using a non-parametric approach in a zero-shot setting. Our findings reveal a significant correlation between the models' outputs and established patterns of sound symbolism, particularly in models trained on speech data. These results suggest that such models can capture sound-meaning connections akin to human language processing, providing insights into both cognitive architectures and machine learning strategies.
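
The abstract does not say which non-parametric test the authors use, so the following is only a minimal sketch of what a zero-shot check of this kind could look like. It assumes that each audio clip and each image has already been mapped to an embedding vector by some pre-trained model, and it swaps in a sign-flip permutation test as one possible non-parametric choice. The function names (`cosine`, `preference_scores`, `sign_flip_p_value`) and the toy vectors are hypothetical illustrations, not the paper's method or data.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def preference_scores(audio_embs, round_img, spiky_img):
    """Per-audio score; a positive value means the audio embedding is
    closer to the round-shape image than to the spiky-shape image."""
    return [cosine(a, round_img) - cosine(a, spiky_img) for a in audio_embs]


def sign_flip_p_value(scores, n_perm=10000, seed=0):
    """One-sided sign-flip permutation test for mean(scores) > 0.

    Assumption: under the null hypothesis of no sound-shape association,
    each score is equally likely to be positive or negative.
    """
    rng = random.Random(seed)
    observed = sum(scores) / len(scores)
    hits = 0
    for _ in range(n_perm):
        flipped = [s if rng.random() < 0.5 else -s for s in scores]
        if sum(flipped) / len(flipped) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Toy stand-ins for model embeddings (made-up values, not real outputs).
round_img = [1.0, 0.0]
spiky_img = [0.0, 1.0]
audio_embs = [[1.0, 0.1 * k] for k in range(1, 9)]

scores = preference_scores(audio_embs, round_img, spiky_img)
p_value = sign_flip_p_value(scores)
```

In this toy setup a small `p_value` would indicate that the audio embeddings lean toward the round image more consistently than chance sign flips would produce. No model is fine-tuned at any point, which is what makes the evaluation zero-shot.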
