AI

Apr 1, 2025

Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute

Jianhao Chen, Zishuo Xun, Bocheng Zhou, Han Qi, Hangfan Zhang, Qiaosheng Zhang, Yang Chen, Wei Hu, Yuzhong Qu, Wanli Ouyang, Shuyue Hu

arXiv:2504.00762v422.919 citations

h-index: 17

Has Code

Originality Incremental advance

AI Analysis

This work addresses the challenge of improving LLM performance efficiently, which matters to users who need cost-effective inference. The advance is incremental, since it builds on existing repeated-sampling frameworks.

The paper tackles the problem of scaling test-time compute for LLMs by proposing a multi-LLM repeated sampling strategy that uses consistency to dynamically switch between models, achieving performance gains over self-consistency and multi-agent debate approaches while reducing inference costs.

This paper presents a simple, effective, and cost-efficient strategy to improve LLM performance by scaling test-time compute. Our strategy builds upon the repeated-sampling-then-voting framework, with a novel twist: incorporating multiple models, even weaker ones, to leverage their complementary strengths that potentially arise from diverse training data and paradigms. By using consistency as a signal, our strategy dynamically switches between models. Theoretical analysis highlights the efficiency and performance advantages of our strategy. Extensive experiments on six datasets demonstrate that our strategy not only outperforms self-consistency and state-of-the-art multi-agent debate approaches, but also significantly reduces inference costs. Additionally, ModelSwitch requires only a few comparable LLMs to achieve optimal performance and can be extended with verification methods, demonstrating the potential of leveraging multiple LLMs in the generation-verification paradigm.
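The switching idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `switch_and_vote`, the `samples_per_model` and `agreement_threshold` parameters, the use of top-answer agreement as the consistency signal, and the plain majority vote over pooled answers are all assumptions made for illustration. Each model is represented as a hypothetical callable that returns one sampled answer per call.

```python
from collections import Counter
from typing import Callable, List, Sequence


def switch_and_vote(
    models: Sequence[Callable[[str], str]],
    question: str,
    samples_per_model: int = 5,
    agreement_threshold: float = 1.0,
) -> str:
    """Sample from each model in turn; stop early once one model's answers agree.

    Assumption: "consistency" is measured as the share of a model's samples
    that match its most frequent answer. The paper may define it differently.
    """
    pooled: List[str] = []
    for sample in models:
        # Repeated sampling from the current model.
        answers = [sample(question) for _ in range(samples_per_model)]
        pooled.extend(answers)

        # Share of samples that agree with this model's most common answer.
        top_count = Counter(answers).most_common(1)[0][1]
        if top_count / len(answers) >= agreement_threshold:
            break  # consistent enough: skip the remaining models

    # Assumption: final answer is a simple majority vote over all samples drawn.
    return Counter(pooled).most_common(1)[0][0]
```

Under these assumptions, a question on which the first model answers consistently costs only `samples_per_model` calls, while an inconsistent first model triggers a switch to the next model and a vote over the combined samples.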

