In BLOOM: Creativity and Affinity in Artificial Lyrics and Art
This work addresses the challenge of evaluating creative outputs from large language models, a problem relevant to both artists and researchers. Its contribution is incremental, since it combines existing models for lyric and art generation.
The study applied the BLOOM-176B model to generate Chinese song lyrics, finding that human reviewers rated some machine-generated lyrics as more creative than real ones, while highlighting limitations in computational metrics like MAUVE for evaluating creativity.
We apply a large multilingual language model (BLOOM-176B) to open-ended generation of Chinese song lyrics and evaluate the resulting lyrics for coherence and creativity using human reviewers. We find that current computational metrics for evaluating large language model outputs (MAUVE) have limitations in evaluating creative writing. We note that the human concept of creativity requires lyrics to be both comprehensible and distinctive, and that human reviewers rate certain types of machine-generated lyrics more highly than real lyrics by popular artists. Inspired by the inherently multimodal nature of album releases, we leverage a Chinese-language Stable Diffusion model to produce high-quality lyric-guided album art, demonstrating a creative approach for an artist seeking inspiration for an album or single. Finally, we introduce the MojimLyrics dataset, a Chinese-language dataset of popular song lyrics for future research.
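The abstract describes creativity as requiring lyrics to be both comprehensible and distinctive. As a rough illustration of the "distinctive" half only, the sketch below computes a character-level distinct-n ratio, a common lexical diversity score. This is an assumption made for illustration and is not the metric or procedure used in the paper: the function name `distinct_n` and the sample lyric lines are hypothetical, and the score says nothing about comprehensibility, which is one reason an automatic score of this kind cannot stand in for human review.

```python
from collections import Counter


def distinct_n(lines, n=2):
    """Ratio of unique character n-grams to total character n-grams.

    Hypothetical helper for illustration only. Characters are used as
    tokens because Chinese text is not whitespace-delimited.
    """
    ngrams = Counter()
    for line in lines:
        # Drop whitespace so spacing does not create spurious n-grams.
        chars = [c for c in line if not c.isspace()]
        for i in range(len(chars) - n + 1):
            ngrams[tuple(chars[i:i + n])] += 1
    total = sum(ngrams.values())
    # Empty input has no n-grams; report 0.0 rather than dividing by zero.
    return len(ngrams) / total if total else 0.0


# Made-up sample lines, not drawn from MojimLyrics.
repetitive = ["我爱你", "我爱你", "我爱你"]
varied = ["月光落在窗前", "风吹过旧街角"]

print(distinct_n(repetitive))
print(distinct_n(varied))
```

A higher ratio means less repetition, but random character strings would also score highly, so a measure like this would need to be paired with a judgment of coherence.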