CLAug 20, 2025

The Digital Sous Chef -- A Comparative Study on Fine-Tuning Language Models for Recipe Generation

arXiv:2508.14718v1
Originality Incremental advance
AI Analysis

This work addresses recipe generation for natural language processing applications, but it is incremental as it builds on existing models with domain-specific enhancements.

The study tackled recipe generation by comparing fine-tuned GPT-2 models against baselines, achieving a >20% improvement in BERTScore and a 69.8% reduction in perplexity.

We established a rigorous benchmark for text-based recipe generation, a fundamental task in natural language generation. We present a comprehensive comparative study contrasting a fine-tuned GPT-2 large (774M) model against the GPT-2 small (124M) model and traditional LSTM/RNN baselines on the 5-cuisine corpus from RecipeDB. Our key contribution is a targeted tokenization strategy that augments the vocabulary with 23 common fraction tokens and custom structural markers. This approach addresses a critical limitation of generic tokenizers by preserving essential recipe structures and precise numerical quantities, thereby enhancing domain specificity. Performance is evaluated using a comprehensive suite of seven automatic metrics spanning fluency (BLEU-4, METEOR), coherence (ROUGE-L), semantic relevance (BERTScore), and diversity. Our experiments show that the large transformer-based approach yields a >20% relative improvement in BERTScore (F1) (0.92 vs 0.72) over the best recurrent baseline, while reducing perplexity by 69.8%. We conclude with a discussion of remaining challenges, particularly regarding factual accuracy, and outline how this foundational study paves the way for integrating real-world constraints and multi-modal inputs in advanced recipe generation research.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes