CL DL Aug 27, 2024

LyCon: Lyrics Reconstruction from the Bag-of-Words Using Large Language Models

arXiv:2408.14750v1, 1 citation, h-index: 4
Originality: Synthesis-oriented
AI Analysis

The LyCon dataset enables academic research in lyric studies by providing copyright-free lyrics for experiments such as conditional lyric generation, though the contribution is incremental because it adapts existing methods to a specific domain.

The paper tackles the challenge of copyright restrictions on lyrics by reconstructing copyright-free lyrics from Bag-of-Words datasets using metadata and large language models, resulting in the LyCon dataset aligned with sources like the Million Song Dataset.

This paper addresses a challenge specific to lyric studies: direct use of lyrics is often restricted by copyright. Unlike typical data, lyrics sourced from the internet are frequently protected under copyright law, so alternative approaches are needed. Our study introduces a novel method for generating copyright-free lyrics from publicly available Bag-of-Words (BoW) datasets, which contain the vocabulary of lyrics but not the lyrics themselves. Using the metadata associated with BoW datasets together with large language models, we successfully reconstructed lyrics. We have compiled and publicly released a dataset of reconstructed lyrics, LyCon, aligned with metadata from well-known sources including the Million Song Dataset, Deezer Mood Detection Dataset, and AllMusic Genre Dataset. We believe that integrating metadata such as mood annotations or genres enables a variety of academic experiments on lyrics, such as conditional lyric generation.
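The idea of reconstructing lyrics from a Bag-of-Words plus metadata can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the prompt wording, the metadata keys (`genre`, `mood`), and the helper names `to_bag_of_words`, `build_prompt`, and `bow_overlap` are all assumptions made for illustration, and no language model is actually called. The sketch only shows how word counts discard word order, how counts and metadata might be combined into a prompt, and how a candidate reconstruction could be checked against the original counts.

```python
from collections import Counter
import re


def to_bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its words; word order is discarded."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def build_prompt(bow: Counter, metadata: dict) -> str:
    """Assemble a hypothetical reconstruction prompt (wording is an assumption)."""
    words = ", ".join(f"{word} x{count}" for word, count in sorted(bow.items()))
    meta = "; ".join(f"{key}: {value}" for key, value in sorted(metadata.items()))
    return (
        "Write song lyrics that use the following words with roughly these counts.\n"
        f"Metadata: {meta}\n"
        f"Words: {words}\n"
    )


def bow_overlap(target: Counter, candidate_text: str) -> float:
    """Fraction of the target word occurrences that the candidate text reproduces."""
    candidate = to_bag_of_words(candidate_text)
    matched = sum((target & candidate).values())  # multiset intersection
    total = sum(target.values())
    return matched / total if total else 0.0


if __name__ == "__main__":
    # Made-up text standing in for a BoW entry; real datasets ship only the counts.
    bow = to_bag_of_words("Rain rain over the quiet town")
    print(build_prompt(bow, {"genre": "folk", "mood": "calm"}))
    print(bow_overlap(bow, "Over the town the rain falls"))
```

A check like `bow_overlap` is one simple way to see how closely generated text respects the given vocabulary; the evaluation actually used for LyCon may differ.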

Code Implementations: 1 repo
