AI, CL, LG | Feb 1

Error Taxonomy-Guided Prompt Optimization

arXiv:2602.00997v1
Originality: Incremental advance
AI Analysis

The paper addresses the high compute cost of prompt optimization for large language models, offering researchers and practitioners a more efficient alternative. The advance is incremental because it builds on existing feedback-based methods.

The paper tackles inefficient automatic prompt optimization with a top-down approach: it categorizes model errors into a taxonomy and uses that taxonomy to guide prompt improvements. It reports accuracy comparable to or better than state-of-the-art methods while requiring roughly one third of their optimization-phase token usage and evaluation budget.

Automatic Prompt Optimization (APO) is a powerful approach for extracting performance from large language models without modifying their weights. Many existing methods rely on trial-and-error, testing different prompts or in-context examples until a good configuration emerges, often consuming substantial compute. Recently, natural language feedback derived from execution logs has shown promise as a way to identify how prompts can be improved. However, most prior approaches operate in a bottom-up manner, iteratively adjusting the prompt based on feedback from individual problems, which can cause them to lose the global perspective. In this work, we propose Error Taxonomy-Guided Prompt Optimization (ETGPO), a prompt optimization algorithm that adopts a top-down approach. ETGPO focuses on the global failure landscape by collecting model errors, categorizing them into a taxonomy, and augmenting the prompt with guidance targeting the most frequent failure modes. Across multiple benchmarks spanning mathematics, question answering, and logical reasoning, ETGPO achieves accuracy that is comparable to or better than state-of-the-art methods, while requiring roughly one third of the optimization-phase token usage and evaluation budget.
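To make the top-down idea concrete, here is a minimal Python sketch of the three steps the abstract names: collect failures, group them into a taxonomy, and augment the prompt with guidance for the most frequent categories. It is an illustration under stated assumptions, not the authors' implementation. The names `Failure`, `toy_categorize`, `build_taxonomy`, and `augment_prompt` are hypothetical, and the keyword-based categorizer is only a stand-in for whatever classifier the method actually uses.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class Failure:
    """One incorrect model response (hypothetical record layout)."""
    question: str
    model_answer: str
    expected: str
    trace: str  # free-text description of what went wrong


def toy_categorize(failure: Failure) -> str:
    """Assumption: keyword rules standing in for a real error classifier."""
    trace = failure.trace.lower()
    if "units" in trace:
        return "unit_mismatch"
    if "carry" in trace or "sum" in trace:
        return "arithmetic_slip"
    return "other"


def build_taxonomy(
    failures: Iterable[Failure],
    categorize: Callable[[Failure], str],
) -> Counter:
    """Count how often each error category occurs across all failures."""
    return Counter(categorize(f) for f in failures)


def augment_prompt(
    base_prompt: str,
    taxonomy: Counter,
    guidance: Dict[str, str],
    top_k: int = 2,
) -> str:
    """Append guidance for the top_k most frequent categories that have any."""
    tips = [guidance[c] for c, _ in taxonomy.most_common(top_k) if c in guidance]
    if not tips:
        return base_prompt  # nothing to add, leave the prompt unchanged
    return "\n".join(
        [base_prompt, "", "Common pitfalls to avoid:", *[f"- {t}" for t in tips]]
    )


if __name__ == "__main__":
    failures = [
        Failure("q1", "41", "51", "forgot to carry the ten"),
        Failure("q2", "17", "18", "sum was off by one"),
        Failure("q3", "3 m", "300 cm", "mixed up units"),
    ]
    guidance = {
        "arithmetic_slip": "Recompute each arithmetic step before answering.",
        "unit_mismatch": "State the units of every quantity and convert explicitly.",
    }
    taxonomy = build_taxonomy(failures, toy_categorize)
    print(augment_prompt("Solve the problem step by step.", taxonomy, guidance))
```

The point of the sketch is the ordering of work: the prompt is edited once, after looking at the whole set of failures, rather than being adjusted after each individual problem as in the bottom-up methods the abstract contrasts against.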
