LG, AI, CL · May 30, 2025

Fine-Tune an SLM or Prompt an LLM? The Case of Generating Low-Code Workflows

arXiv:2505.24189v2 · 6 citations · h-index: 9
Originality: Synthesis-oriented
AI Analysis

This work addresses a practical decision for developers and researchers in low-code automation, though it is incremental as it compares existing methods on a specific task.

The paper asks whether to fine-tune small language models (SLMs) or prompt large language models (LLMs) when generating low-code workflows in JSON form, and finds that fine-tuning an SLM improves quality by 10% on average over prompting LLMs.

Large Language Models (LLMs) such as GPT-4o can handle a wide range of complex tasks with the right prompt. As per-token costs fall, the advantages of fine-tuning Small Language Models (SLMs) for real-world applications (faster inference, lower costs) may no longer be clear. In this work, we present evidence that, for domain-specific tasks that require structured outputs, SLMs still have a quality advantage. We compare fine-tuning an SLM against prompting LLMs on the task of generating low-code workflows in JSON form. We observe that while a good prompt can yield reasonable results, fine-tuning improves quality by 10% on average. We also perform systematic error analysis to reveal model limitations.

