More than words: Advancements and challenges in speech recognition for singing
This paper addresses the problem of improving speech recognition for singing, which matters for applications in music analysis and accessibility. Its contribution is incremental, since it reviews existing developments rather than introducing a new method.
This paper tackles the challenges in speech recognition for singing, such as pitch variations and background music, and reviews advancements in tasks like phoneme recognition and lyrics transcription, noting progress driven by deep learning and large datasets.
This paper addresses the challenges and advancements in speech recognition for singing, a domain distinctly different from standard speech recognition. Singing poses unique challenges, including extensive pitch variations, diverse vocal styles, and background music interference. I explore key areas such as phoneme recognition, language identification in songs, keyword spotting, and full lyrics transcription. I will describe some of my own experiences from researching these tasks just as they were starting to gain traction, and will also show how recent developments in deep learning and large-scale datasets have propelled progress in this field. My goal is to illuminate the complexities of applying speech recognition to singing, evaluate current capabilities, and outline future research directions.