CL AI | May 31, 2025

Scaling Textual Gradients via Sampling-Based Momentum

arXiv:2506.00400v3 | 1 citation | h-index: 7
Originality: Incremental advance
AI Analysis

The paper addresses scalability and stability issues in automatic prompt engineering for LLM users. The advance is incremental, since it builds on existing frameworks such as TextGrad.

The paper tackles the problem of scaling training data in LLM-based prompt optimization. It shows that naive scaling is infeasible because of context-length limits and long-context degradation, and it proposes TSGD-M, which uses momentum sampling to achieve consistent gains across 5 benchmarks.

LLM-based prompt optimization, which uses LLM-provided "textual gradients" (feedback) to refine prompts, has emerged as an effective method for automatic prompt engineering. However, its scalability and stability are unclear when more data is used in training. We systematically investigate the potential and challenges of scaling training data in textual gradient descent. We show that naively scaling training examples is infeasible due to both explicit context-length limits and an implicit context wall, where long-context degradation yields diminishing returns. Inspired by prior wisdom in stochastic gradient descent, we propose Textual Stochastic Gradient Descent with Momentum (TSGD-M), which reweights updates through momentum sampling, using bootstrapped minibatch validation accuracy as importance weights over historical prompts. We introduce Gumbel-Top-$k$ sampling for prompt generation, balancing exploration and exploitation and improving sampling efficiency while maintaining a low-variance running mean estimator. TSGD-M integrates seamlessly into existing prompt optimization frameworks, including TextGrad, DSPy-COPRO, and AdalFlow, and achieves consistent gains across 5 benchmarks.
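The momentum-sampling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `PromptRecord` and `gumbel_top_k`, the `temperature` parameter, and the choice to use running-mean accuracy divided by temperature as the sampling logit are all assumptions made for illustration. The sketch keeps a running mean of minibatch validation accuracy for each historical prompt, then draws k prompts without replacement by perturbing the scores with Gumbel noise and taking the top k.

```python
import math
import random


class PromptRecord:
    """Hypothetical container: a historical prompt plus a running mean of its accuracy."""

    def __init__(self, text):
        self.text = text
        self.mean_acc = 0.0  # running mean of minibatch validation accuracy
        self.n = 0           # number of minibatch evaluations seen so far

    def update(self, acc):
        # Incremental running-mean update; avoids storing every past accuracy.
        self.n += 1
        self.mean_acc += (acc - self.mean_acc) / self.n


def gumbel_top_k(scores, k, temperature=1.0, rng=None):
    """Sample k distinct indices via the Gumbel-Top-k trick.

    Assumption: score / temperature is treated as an unnormalized log-weight.
    Lower temperature favors exploitation; higher favors exploration.
    """
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    rng = rng or random.Random()
    perturbed = []
    for i, s in enumerate(scores):
        u = max(rng.random(), 1e-12)     # clamp to avoid log(0)
        g = -math.log(-math.log(u))      # standard Gumbel(0, 1) noise
        perturbed.append((s / temperature + g, i))
    perturbed.sort(reverse=True)         # largest perturbed score first
    return [i for _, i in perturbed[:k]]


# Usage: track three historical prompts, then sample two of them.
history = [PromptRecord("prompt A"), PromptRecord("prompt B"), PromptRecord("prompt C")]
for record, accs in zip(history, [[0.4, 0.5], [0.8, 0.9], [0.6]]):
    for a in accs:
        record.update(a)

chosen = gumbel_top_k([r.mean_acc for r in history], k=2,
                      temperature=0.1, rng=random.Random(0))
selected_prompts = [history[i].text for i in chosen]
```

Sampling without replacement through perturbed scores means high-accuracy prompts are chosen more often while lower-scoring ones still have a chance, which is one way to read the exploration and exploitation balance the abstract describes.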

