JaccDiv: A Metric and Benchmark for Quantifying Diversity of Generated Marketing Text in the Music Industry
This work addresses monotonous automated content generation for online platforms, particularly in the music industry; the contribution is incremental, since it builds on existing LLM methods.
The paper tackles repetitive patterns in LLM-based data-to-text generation for marketing. It introduces the JaccDiv metric to quantify text diversity and sets baselines using models such as T5 and GPT-4, with results showing improved diversity in generated texts.
Online platforms are increasingly interested in using data-to-text technologies to generate content and help their users. Unfortunately, traditional generative methods often fall into repetitive patterns, resulting in monotonous galleries of texts after only a few iterations. In this paper, we investigate LLM-based data-to-text approaches to automatically generate marketing texts that are of sufficient quality and diverse enough for broad adoption. We leverage language models such as T5, GPT-3.5, GPT-4, and LLaMa2 in conjunction with fine-tuning, few-shot, and zero-shot approaches to set a baseline for diverse marketing texts. We also introduce a metric, JaccDiv, to evaluate the diversity of a set of texts. This research is relevant beyond the music industry and can benefit other fields where repetitive automated content generation is prevalent.
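The abstract names JaccDiv but does not define it, so the following is only an illustrative sketch of what a Jaccard-based diversity score over a set of texts could look like. It is not the paper's definition. The function names, the whitespace tokenization, the word n-gram choice, and the averaging over all text pairs are all assumptions made for this example.

```python
from itertools import combinations


def ngram_set(text: str, n: int = 1) -> frozenset:
    """Lowercase, split on whitespace, and return the set of word n-grams."""
    tokens = text.lower().split()
    return frozenset(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def jaccard_similarity(a: frozenset, b: frozenset) -> float:
    """Size of the intersection over size of the union.

    Two empty sets are treated as identical (similarity 1.0).
    """
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def pairwise_jaccard_diversity(texts, n: int = 1) -> float:
    """Hypothetical diversity score: mean Jaccard distance over all text pairs.

    Returns 0.0 when every text has the same n-gram set, 1.0 when no two texts
    share an n-gram, and 0.0 when fewer than two texts are given.
    """
    sets = [ngram_set(t, n) for t in texts]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - jaccard_similarity(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    gallery = [
        "new album out now",
        "new single out now",
        "catch the band live this summer",
    ]
    print(pairwise_jaccard_diversity(gallery))
```

Under these assumptions, a gallery of near-identical marketing texts scores close to 0 and a gallery with little shared wording scores close to 1. A score of this kind depends heavily on tokenization and on the n-gram size, so values are comparable only when those choices are held fixed.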