CL · Jun 22, 2025

LLMs for Customized Marketing Content Generation and Evaluation at Scale

Haoran Liu, Amir Tahmasbi, Ehtesham Sam Haque, Purak Jain

arXiv:2506.17863v1 · 9 citations · h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses the need for scalable, effective marketing content generation and evaluation in e-commerce, though it is incremental in combining existing techniques like retrieval-augmentation and LLM-as-a-Judge.

The paper tackles the problem of generic offsite marketing content by proposing MarketingFM, a retrieval-augmented system for generating keyword-specific ad copy, which achieved up to 9% higher CTR and 12% more impressions than template-based copy in online A/B tests. It also introduces AutoEval-Main, an automated ad evaluation system that reached 89.57% agreement with human reviewers, and AutoEval-Update, a framework that refines evaluation prompts with minimal human input.

Offsite marketing is essential in e-commerce, enabling businesses to reach customers through external platforms and drive traffic to retail websites. However, most current offsite marketing content is overly generic, template-based, and poorly aligned with landing pages, limiting its effectiveness. To address these limitations, we propose MarketingFM, a retrieval-augmented system that integrates multiple data sources to generate keyword-specific ad copy with minimal human intervention. We validate MarketingFM via offline human and automated evaluations and large-scale online A/B tests. In one experiment, keyword-focused ad copy outperformed templates, achieving up to 9% higher CTR, 12% more impressions, and 0.38% lower CPC, demonstrating gains in ad ranking and cost efficiency. Despite these gains, human review of generated ads remains costly. To address this, we propose AutoEval-Main, an automated evaluation system that combines rule-based metrics with LLM-as-a-Judge techniques to ensure alignment with marketing principles. In experiments with large-scale human annotations, AutoEval-Main achieved 89.57% agreement with human reviewers. Building on this, we propose AutoEval-Update, a cost-efficient LLM-human collaborative framework to dynamically refine evaluation prompts and adapt to shifting criteria with minimal human input. By selectively sampling representative ads for human review and using a critic LLM to generate alignment reports, AutoEval-Update improves evaluation consistency while reducing manual effort. Experiments show the critic LLM suggests meaningful refinements, improving LLM-human agreement. Nonetheless, human oversight remains essential for setting thresholds and validating refinements before deployment.
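The two headline figures in the abstract, LLM-human agreement and relative CTR gain, are simple ratios. The sketch below shows one plausible way to compute them. It assumes that "agreement" means the share of ads on which the automated verdict matches the human verdict, and that the CTR gain is a relative lift over the template baseline. The abstract confirms neither definition, and the function names and sample values are hypothetical.

```python
from typing import Sequence


def percent_agreement(auto_labels: Sequence[str], human_labels: Sequence[str]) -> float:
    """Share of items (in percent) where the automated verdict equals the human verdict.

    Assumption: simple label matching; the paper may define agreement differently.
    """
    if len(auto_labels) != len(human_labels):
        raise ValueError("label sequences must have the same length")
    if not auto_labels:
        raise ValueError("at least one labeled item is required")
    matches = sum(a == h for a, h in zip(auto_labels, human_labels))
    return 100.0 * matches / len(auto_labels)


def relative_lift(treatment: float, control: float) -> float:
    """Relative change (in percent) of a treatment metric over a control metric."""
    if control == 0:
        raise ValueError("control metric must be non-zero")
    return 100.0 * (treatment - control) / control


if __name__ == "__main__":
    # Hypothetical verdicts for four ads, not data from the paper.
    auto = ["pass", "fail", "pass", "pass"]
    human = ["pass", "fail", "fail", "pass"]
    print(f"agreement: {percent_agreement(auto, human):.2f}%")

    # Hypothetical CTRs: template baseline vs. keyword-focused ad copy.
    print(f"CTR lift: {relative_lift(0.0218, 0.0200):.2f}%")
```

Raw percent agreement does not correct for chance agreement, so a chance-corrected statistic such as Cohen's kappa may be worth reporting alongside it when the label distribution is skewed.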
