AI · LG · Dec 15, 2025

Error-Driven Prompt Optimization for Arithmetic Reasoning

arXiv:2512.13323v1 · h-index: 6
Originality: Incremental advance
AI Analysis

The method enables privacy-compliant industrial AI assistants for regulated sectors such as finance and healthcare, though the contribution appears incremental because it builds on existing prompt optimization approaches.

The paper addresses arithmetic reasoning accuracy for small language models in secure on-premises environments. Its error-driven prompt optimization method, which clusters erroneous predictions to iteratively refine prompt rules, raises accuracy to 70.8%.

Recent advancements in artificial intelligence have sparked interest in industrial agents capable of supporting analysts in regulated sectors, such as finance and healthcare, within tabular data workflows. A key capability for such systems is performing accurate arithmetic operations on structured data while ensuring sensitive information never leaves secure, on-premises environments. Here, we introduce an error-driven optimization framework for arithmetic reasoning that enhances a Code Generation Agent (CGA), specifically applied to on-premises small language models (SLMs). Through a systematic evaluation of a leading SLM (Qwen3 4B), we find that while the base model exhibits fundamental limitations in arithmetic tasks, our proposed error-driven method, which clusters erroneous predictions to refine prompt-rules iteratively, dramatically improves performance, elevating the model's accuracy to 70.8%. Our results suggest that developing reliable, interpretable, and industrially deployable AI assistants can be achieved not only through costly fine-tuning but also via systematic, error-driven prompt optimization, enabling small models to surpass larger language models (GPT-3.5 Turbo) in a privacy-compliant manner.
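The abstract describes the loop only at a high level: evaluate, collect erroneous predictions, cluster them, and refine the prompt rules. A minimal sketch of that loop follows, under loud assumptions. The agent is replaced by a toy stub (`toy_predict`), "clustering" is simplified to grouping errors by a hand-written signature (the arithmetic operator), and rule generation is a fixed lookup. All names here are hypothetical, and none of this is taken from the paper's implementation.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, float]  # (question, expected answer)
Predictor = Callable[[str, List[str]], float]


def collect_errors(predict: Predictor, rules: List[str],
                   examples: List[Example]) -> List[Example]:
    """Return the examples answered incorrectly under the current rules."""
    return [(q, y) for q, y in examples if abs(predict(q, rules) - y) > 1e-6]


def cluster_errors(errors: List[Example],
                   signature: Callable[[str], str]) -> Dict[str, List[Example]]:
    """Group errors by a signature; a simple stand-in for real clustering."""
    clusters: Dict[str, List[Example]] = defaultdict(list)
    for q, y in errors:
        clusters[signature(q)].append((q, y))
    return dict(clusters)


def optimize_rules(predict: Predictor, examples: List[Example],
                   signature: Callable[[str], str],
                   propose_rule: Callable[[str], str],
                   max_rounds: int = 5) -> List[str]:
    """Iteratively add one rule per round, targeting the largest error cluster."""
    rules: List[str] = []
    for _ in range(max_rounds):
        errors = collect_errors(predict, rules, examples)
        if not errors:
            break
        clusters = cluster_errors(errors, signature)
        label, _ = max(clusters.items(), key=lambda kv: len(kv[1]))
        new_rule = propose_rule(label)
        if new_rule in rules:  # no new rule to try; stop instead of looping
            break
        rules.append(new_rule)
    return rules


def accuracy(predict: Predictor, rules: List[str],
             examples: List[Example]) -> float:
    wrong = len(collect_errors(predict, rules, examples))
    return (len(examples) - wrong) / len(examples)


def toy_predict(question: str, rules: List[str]) -> float:
    """Hypothetical stub for the agent: it mishandles an operator
    unless a matching rule is present in the prompt rules."""
    a_str, op, b_str = question.split()
    a, b = float(a_str), float(b_str)
    if op == "+":
        return a + b
    if op == "%of":  # "a percent of b"
        return a * b / 100 if "percent" in rules else a * b
    if op == "/":
        return a / b if "division" in rules else float(int(a / b))
    raise ValueError(f"unsupported operator: {op}")


EXAMPLES: List[Example] = [
    ("2 + 3", 5.0),
    ("10 %of 50", 5.0),
    ("20 %of 10", 2.0),
    ("7 / 2", 3.5),
]
RULE_FOR = {"%of": "percent", "/": "division"}  # illustrative rule lookup

learned = optimize_rules(
    toy_predict,
    EXAMPLES,
    signature=lambda q: q.split()[1],       # cluster key: the operator
    propose_rule=lambda label: RULE_FOR[label],
)
print(learned)
print(accuracy(toy_predict, [], EXAMPLES), accuracy(toy_predict, learned, EXAMPLES))
```

In a real setting, `toy_predict` would call the on-premises model with the rules injected into its prompt, and `propose_rule` would draft a rule from the examples in the chosen cluster. Both are left abstract here because the abstract does not specify them.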

