HC · CL · LG · Dec 15, 2019

Breaking Speech Recognizers to Imagine Lyrics

arXiv:1912.06979v1 · 1 citation
Originality: Incremental advance
AI Analysis

This is an incremental approach for creative applications like music and art, enabling machine-in-the-loop collaboration.

The paper tackles the problem of generating text, specifically song lyrics, from audio. It repurposes a vocal source separation algorithm and an acoustic speech recognizer to transcribe imagined words from instrumental music or environmental sounds, presents initial analysis, and discusses the potential for creative applications.

We introduce a new method for generating text, and in particular song lyrics, based on the speech-like acoustic qualities of a given audio file. We repurpose a vocal source separation algorithm and an acoustic model trained to recognize isolated speech, instead inputting instrumental music or environmental sounds. Feeding the "mistakes" of the vocal separator into the recognizer, we obtain a transcription of words imagined to be spoken in the input audio. We describe the key components of our approach, present initial analysis, and discuss the potential of the method for machine-in-the-loop collaboration in creative applications.
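The two-stage data flow described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `imagine_lyrics`, `toy_separator`, and `toy_recognizer` are hypothetical, and the two toy functions are trivial stand-ins for a trained vocal source separation model and a trained acoustic speech recognizer, which the real method would require.

```python
from typing import Callable, List, Sequence

Audio = List[float]  # assumption: mono audio as a plain list of samples


def imagine_lyrics(
    audio: Audio,
    separate_vocals: Callable[[Audio], Audio],
    recognize: Callable[[Audio], str],
) -> str:
    """Chain a vocal separator and a speech recognizer.

    On input with no real vocals, whatever the separator returns is its
    "mistake"; the recognizer then transcribes that residue as words.
    """
    vocal_estimate = separate_vocals(audio)
    return recognize(vocal_estimate)


def toy_separator(audio: Audio, threshold: float = 0.5) -> Audio:
    """Hypothetical stand-in: keep only loud samples, silence the rest."""
    return [x if abs(x) >= threshold else 0.0 for x in audio]


def toy_recognizer(
    audio: Audio,
    frame: int = 4,
    vocabulary: Sequence[str] = ("la", "oh", "yeah"),
) -> str:
    """Hypothetical stand-in: emit one word per non-silent frame.

    The word is chosen from the number of non-zero samples in the frame,
    which has no linguistic meaning and only keeps the toy deterministic.
    """
    words = []
    for start in range(0, len(audio), frame):
        chunk = audio[start:start + frame]
        active = sum(1 for x in chunk if x != 0.0)
        if active > 0:
            words.append(vocabulary[active % len(vocabulary)])
    return " ".join(words)


if __name__ == "__main__":
    instrumental = [0.1, 0.9, 0.1, 0.1, 0.0, 0.0, 0.0, 0.0]
    print(imagine_lyrics(instrumental, toy_separator, toy_recognizer))
```

Passing the separator and recognizer as callables keeps the sketch independent of any particular model, so either stage could be swapped for a real component without changing the pipeline function.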

Code Implementations: 1 repo
