AI · Apr 22

Adaptive Test-Time Compute Allocation with Evolving In-Context Demonstrations

arXiv:2604.21018 · 83.61 citations · h-index: 1

Predicted impact: top 29% in AI · last 90 days
Originality: Incremental advance

AI Analysis

For practitioners deploying large language models, this method improves performance-efficiency trade-offs at inference time without requiring additional training data.

The paper introduces a test-time compute allocation framework that adaptively focuses computation on hard queries and reshapes generation distributions using evolving in-context demonstrations from the test set, outperforming baselines while using less compute.

While scaling test-time compute can substantially improve model performance, existing approaches either rely on static compute allocation or sample from fixed generation distributions. In this work, we introduce a test-time compute allocation framework that jointly adapts where computation is spent and how generation is performed. Our method begins with a warm-up phase that identifies easy queries and assembles an initial pool of question-response pairs from the test set itself. An adaptive phase then concentrates further computation on unresolved queries while reshaping their generation distributions through evolving in-context demonstrations -- conditioning each generation on successful responses from semantically related queries rather than resampling from a fixed distribution. Experiments across math, coding, and reasoning benchmarks demonstrate that our approach consistently outperforms existing baselines while consuming substantially less inference-time compute.
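
The two-phase procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how a response is judged successful, how semantic relatedness is measured, or how budgets are set. Here `generate`, `is_resolved`, `warmup_samples`, `adaptive_rounds`, and `k` are all hypothetical names, and a simple token-overlap score stands in for a real semantic similarity model.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Demo = Tuple[str, str]  # (question, successful response)


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; an assumed stand-in for semantic embeddings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def allocate(
    queries: Sequence[str],
    generate: Callable[[str, List[Demo]], str],   # hypothetical model call
    is_resolved: Callable[[str, str], bool],      # hypothetical success check
    warmup_samples: int = 1,
    adaptive_rounds: int = 3,
    k: int = 2,
) -> Dict[str, Optional[str]]:
    pool: List[Demo] = []
    answers: Dict[str, Optional[str]] = {q: None for q in queries}

    # Warm-up phase: a small, uniform budget per query with no demonstrations.
    # Queries resolved here are treated as "easy" and seed the demonstration pool.
    for q in queries:
        for _ in range(warmup_samples):
            response = generate(q, [])
            if is_resolved(q, response):
                answers[q] = response
                pool.append((q, response))
                break

    # Adaptive phase: spend further compute only on unresolved queries,
    # conditioning each attempt on the k most similar solved queries.
    for _ in range(adaptive_rounds):
        unresolved = [q for q in queries if answers[q] is None]
        if not unresolved:
            break
        for q in unresolved:
            demos = sorted(pool, key=lambda d: jaccard(q, d[0]), reverse=True)[:k]
            response = generate(q, demos)
            if is_resolved(q, response):
                answers[q] = response
                pool.append((q, response))  # the pool evolves as queries resolve

    return answers
```

In this sketch, compute is saved because resolved queries leave the loop immediately, and the generation distribution changes across rounds because the demonstration pool grows as more queries are resolved.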
