AI · Apr 1, 2025

Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute

arXiv:2504.00762v4 · 19 citations · h-index: 17
Originality: Incremental advance
AI Analysis

This work addresses how to improve LLM performance at inference time for users who need cost-effective inference. It is an incremental advance because it builds on existing repeated-sampling frameworks.

The paper tackles the problem of scaling test-time compute for LLMs. It proposes a multi-LLM repeated-sampling strategy that uses answer consistency as a signal to switch dynamically between models, outperforming self-consistency and multi-agent debate approaches while reducing inference costs.

This paper presents a simple, effective, and cost-efficient strategy to improve LLM performance by scaling test-time compute. Our strategy, ModelSwitch, builds upon the repeated-sampling-then-voting framework, with a novel twist: incorporating multiple models, even weaker ones, to leverage their complementary strengths that potentially arise from diverse training data and paradigms. By using consistency as a signal, our strategy dynamically switches between models. Theoretical analysis highlights the efficiency and performance advantages of our strategy. Extensive experiments on six datasets demonstrate that our strategy not only outperforms self-consistency and state-of-the-art multi-agent debate approaches, but also significantly reduces inference costs. Additionally, ModelSwitch requires only a few comparable LLMs to achieve optimal performance and can be extended with verification methods, demonstrating the potential of leveraging multiple LLMs in the generation-verification paradigm.
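The consistency-gated switching described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' ModelSwitch implementation: the sampler interface, the `model_switch` and `make_fake_model` names, the "all samples agree" test for consistency, and the plain majority vote used as a fallback are hypothetical simplifications (the paper's actual stopping rule and voting weights may differ).

```python
from collections import Counter
from collections.abc import Callable, Hashable, Sequence
import itertools

# Hypothetical interface: a "model" is any callable that takes a prompt and
# returns one sampled final answer (e.g. an extracted number or option letter).
Sampler = Callable[[str], Hashable]


def model_switch(prompt: str, models: Sequence[Sampler], samples_per_model: int) -> Hashable:
    """Sketch of consistency-gated switching across several LLMs.

    Assumptions (not taken from the paper): models are queried in the given
    order, a model counts as "consistent" when all of its samples agree, and
    the fallback answer is a plain majority vote over every collected sample.
    """
    if not models or samples_per_model < 1:
        raise ValueError("need at least one model and one sample per model")
    collected: list[Hashable] = []
    for model in models:
        answers = [model(prompt) for _ in range(samples_per_model)]
        collected.extend(answers)
        # Unanimous agreement is treated as a confident signal: stop early
        # and skip the remaining (possibly more expensive) models.
        if len(set(answers)) == 1:
            return answers[0]
    # No model was self-consistent: vote over all samples. Counter breaks
    # ties in favour of the answer encountered first.
    return Counter(collected).most_common(1)[0][0]


def make_fake_model(answers: Sequence[Hashable]) -> Sampler:
    """Hypothetical stand-in for an LLM that cycles through fixed answers."""
    it = itertools.cycle(answers)
    return lambda prompt: next(it)


if __name__ == "__main__":
    weak = make_fake_model(["12", "15", "12"])    # inconsistent samples
    strong = make_fake_model(["12", "12", "12"])  # unanimous samples
    print(model_switch("What is 3 * 4?", [weak, strong], samples_per_model=3))
```

Ordering cheaper or weaker models first lets this sketch exit early on easy queries; in this simplified version, that early exit is the only source of inference savings.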

Code Implementations: 1 repo
