SD · LG · AS · Feb 9, 2021

TräumerAI: Dreaming Music with StyleGAN

arXiv:2102.04680v1 · 20 citations
AI Analysis

This work provides a method for artists and content creators to generate visually appealing music videos that reflect musical characteristics, addressing the challenge of subjective mapping between audio and visual semantics.

This paper introduces TräumerAI, a neural music visualizer that maps deep music embeddings to StyleGAN style embeddings to create videos that visually respond to music. The system uses a music auto-tagging model and a StyleGAN2 pre-trained on WikiArt, with a transfer function trained on 100 manually labeled 10-second music clips paired with selected StyleGAN images.

The goal of this paper is to generate a visually appealing video that responds to music with a neural network, so that each frame of the video reflects the musical characteristics of the corresponding audio clip. To achieve this goal, we propose TräumerAI, a neural music visualizer that directly maps deep music embeddings to the style embeddings of StyleGAN. It consists of a music auto-tagging model using a short-chunk CNN and a StyleGAN2 pre-trained on the WikiArt dataset. Rather than establishing an objective metric between musical and visual semantics, we manually labeled the pairs in a subjective manner. An annotator listened to 100 music clips, each 10 seconds long, and for each clip selected an image that suits the music from among 200 StyleGAN-generated examples. Based on the collected data, we trained a simple transfer function that converts an audio embedding to a style embedding. The generated examples show that the mapping between audio and video exhibits a certain level of intra-segment similarity and inter-segment dissimilarity.
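The abstract does not say what form the "simple transfer function" takes, so the following is only a minimal sketch of one plausible reading: a ridge-regularized affine map fitted on paired (audio embedding, style embedding) examples. The function names, the embedding dimensions, and the ridge strength are all assumptions for illustration, and random arrays stand in for the real embeddings from the auto-tagging model and StyleGAN2.

```python
import numpy as np


def fit_transfer(audio_emb, style_emb, ridge=1e-2):
    """Fit W, b so that style_emb is approximately audio_emb @ W + b.

    Assumption: an affine map with ridge regularization. The paper's
    actual transfer function may differ.
    """
    n, d = audio_emb.shape
    # Append a column of ones so the bias is learned jointly with W.
    X = np.hstack([audio_emb, np.ones((n, 1))])
    reg = ridge * np.eye(d + 1)
    reg[-1, -1] = 0.0  # leave the bias term unpenalized
    Wb = np.linalg.solve(X.T @ X + reg, X.T @ style_emb)
    return Wb[:-1], Wb[-1]


def apply_transfer(audio_emb, W, b):
    """Map one or more audio embeddings to style embeddings."""
    return audio_emb @ W + b


# Stand-in data: 100 labeled pairs, matching the number of annotated clips.
# The dimensions (64 and 32) are hypothetical placeholders.
rng = np.random.default_rng(0)
audio_train = rng.normal(size=(100, 64))
style_train = rng.normal(size=(100, 32))

W, b = fit_transfer(audio_train, style_train)

# One audio embedding per video frame yields one style vector per frame.
frame_audio = rng.normal(size=(240, 64))
frame_styles = apply_transfer(frame_audio, W, b)
```

With only 100 labeled pairs and a high-dimensional audio embedding, an unregularized fit would be underdetermined, which is why this sketch adds a ridge penalty. Each row of `frame_styles` would then be passed to the StyleGAN2 synthesis network to render the corresponding frame.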
