CV · MM · SD · AS · May 10, 2022

Learning Visual Styles from Audio-Visual Associations

arXiv:2205.05072v1 · 26 citations · h-index: 26
Originality: Incremental advance
AI Analysis

The paper addresses intuitive image manipulation by using audio as the controlling representation. The advance is incremental, as it builds on existing audio-visual association methods.

The paper tackles the problem of learning visual styles from unlabeled audio-visual data. In its quantitative and qualitative evaluations, the sound-based model outperforms label-based approaches.

From the patter of rain to the crunch of snow, the sounds we hear often convey the visual textures that appear within a scene. In this paper, we present a method for learning visual styles from unlabeled audio-visual data. Our model learns to manipulate the texture of a scene to match a sound, a problem we term audio-driven image stylization. Given a dataset of paired audio-visual data, we learn to modify input images such that, after manipulation, they are more likely to co-occur with a given input sound. In quantitative and qualitative evaluations, our sound-based model outperforms label-based approaches. We also show that audio can be an intuitive representation for manipulating images, as adjusting a sound's volume or mixing two sounds together results in predictable changes to visual style. Project webpage: https://tinglok.netlify.app/files/avstyle
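
The abstract describes two ideas that a short sketch may make more concrete: scoring how likely an image and a sound are to co-occur, and editing the sound itself (volume, mixing) before it drives the stylization. The Python below is only an illustration under stated assumptions. The abstract does not specify the training objective, so the InfoNCE-style contrastive loss here is a stand-in, not the authors' loss. The function names `scale_volume`, `mix_sounds`, and `cooccurrence_loss` are hypothetical, and the embeddings are assumed to come from image and audio encoders that are not shown.

```python
import numpy as np


def scale_volume(wave, gain):
    """Scale a waveform by a gain factor and clip it to [-1, 1]."""
    return np.clip(wave * gain, -1.0, 1.0)


def mix_sounds(wave_a, wave_b, weight=0.5):
    """Blend two equal-length waveforms as a convex combination."""
    if wave_a.shape != wave_b.shape:
        raise ValueError("waveforms must have the same shape")
    return weight * wave_a + (1.0 - weight) * wave_b


def cooccurrence_loss(image_emb, audio_emb, temperature=0.07):
    """InfoNCE-style loss (an assumed objective, not taken from the paper).

    Row i of image_emb is treated as the positive match for row i of
    audio_emb; every other row in the batch acts as a negative.
    """
    # L2-normalise so the dot product is a cosine similarity.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    aud = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    logits = img @ aud.T / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Average negative log-probability of the matched pairs.
    return -np.mean(np.diag(log_probs))
```

In a setup like this, a stylization network would be trained so that its output image lowers this loss against the input sound's embedding. Passing the sound through `scale_volume` or `mix_sounds` first is one simple way to probe how changes in the audio shift the resulting style.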

