LiteraryTaste: A Preference Dataset for Creative Writing Personalization
This work addresses the need for personalized creative writing technologies, though its contribution is incremental: it centers on dataset creation and basic preference modeling.
The authors address the personalization of creative writing LLMs by introducing LiteraryTaste, a dataset of stated and revealed preferences from 60 people, and find that finetuning a transformer encoder achieves up to 75.8% accuracy in modeling personal revealed preferences.
People have different creative writing preferences, and large language models (LLMs) for these tasks can benefit from adapting to each user's preferences. However, these models are often trained on datasets that treat varying personal tastes as a monolith. To facilitate developing personalized creative writing LLMs, we introduce LiteraryTaste, a dataset of reading preferences from 60 people, where each person: 1) self-reported their reading habits and tastes (stated preference), and 2) annotated their preferences over 100 pairs of short creative writing texts (revealed preference). With our dataset, we found that: 1) people diverge on creative writing preferences, 2) finetuning a transformer encoder could achieve 75.8% and 67.7% accuracy when modeling personal and collective revealed preferences, respectively, and 3) stated preferences had limited utility in modeling revealed preferences. With an LLM-driven interpretability pipeline, we analyzed how people's preferences vary. We hope our work serves as a cornerstone for personalizing creative writing technologies.
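To make the distinction between personal and collective preference modeling concrete, the following is a minimal sketch of how pairwise accuracy over revealed preferences might be computed. It is not the authors' code: the `PairJudgment` record, the function names, the toy data, and the length-based scorers are all hypothetical stand-ins, and the finetuned transformer encoder is replaced by simple scoring functions so that the example stays self-contained.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class PairJudgment:
    """One revealed-preference annotation (hypothetical schema)."""
    annotator: str   # hypothetical annotator ID
    text_a: str
    text_b: str
    prefers_a: bool  # True if the annotator preferred text_a


# A scorer maps (annotator, text) to a number; higher means "more preferred".
Scorer = Callable[[str, str], float]


def pairwise_accuracy(judgments: List[PairJudgment], score: Scorer) -> float:
    """Fraction of pairs where the higher-scored text is the one chosen.

    Ties are counted as a prediction for text_b.
    """
    if not judgments:
        raise ValueError("need at least one judgment")
    correct = sum(
        (score(j.annotator, j.text_a) > score(j.annotator, j.text_b)) == j.prefers_a
        for j in judgments
    )
    return correct / len(judgments)


def per_annotator_accuracy(
    judgments: List[PairJudgment], score: Scorer
) -> Dict[str, float]:
    """Pairwise accuracy computed separately for each annotator."""
    groups: Dict[str, List[PairJudgment]] = defaultdict(list)
    for j in judgments:
        groups[j.annotator].append(j)
    return {a: pairwise_accuracy(js, score) for a, js in groups.items()}


# Toy data (invented): p1 always picks the longer text, p2 the shorter one.
judgments = [
    PairJudgment("p1", "short", "a much longer text", False),
    PairJudgment("p1", "a fairly long passage", "tiny", True),
    PairJudgment("p2", "short", "a much longer text", True),
    PairJudgment("p2", "a fairly long passage", "tiny", False),
]


def collective_score(annotator: str, text: str) -> float:
    """One scorer shared by everyone: longer is better (toy assumption)."""
    return float(len(text))


def personal_score(annotator: str, text: str) -> float:
    """A scorer conditioned on the annotator (toy assumption)."""
    return float(len(text)) if annotator == "p1" else -float(len(text))


collective_acc = pairwise_accuracy(judgments, collective_score)  # 0.5
personal_acc = pairwise_accuracy(judgments, personal_score)      # 1.0
```

On this toy data the shared scorer is right for one annotator and wrong for the other, while the annotator-conditioned scorer fits both. This only illustrates why divergent tastes can cap collective accuracy; it says nothing about the magnitude of the gap reported in the abstract.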