Majority Voting for Code Generation
For practitioners using LLMs for code generation, FMV offers a practical, low-overhead way to improve inference-time performance, though its gains remain bounded by what the base model can already generate.
The paper investigates Functional Majority Voting (FMV) for code generation, which selects a representative solution from multiple LLM outputs based on their runtime execution signatures on test inputs. FMV substantially boosts performance on LiveCodeBench with low compute overhead. When used as the aggregation strategy for label-free test-time reinforcement learning (TTRL), it increases pass@1 on holdout tasks but does not enable self-improvement beyond the base model's performance ceiling.
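To make the selection step concrete, here is a minimal sketch of functional majority voting. It rests on assumptions the source does not state: candidates are already-loaded Python callables taking one argument, each is run directly (no sandbox or timeout), and an execution signature is the tuple of `repr`-ed outputs, or raised exception types, across the test inputs. The names `execution_signature` and `functional_majority_vote` are hypothetical illustrations, not the paper's code.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable, Sequence


def execution_signature(program: Callable, test_inputs: Sequence) -> tuple:
    """Hypothetical helper: run one candidate on every test input and record
    its observable behaviour (returned value or exception type) per input."""
    outputs = []
    for x in test_inputs:
        try:
            outputs.append(("ok", repr(program(x))))
        except Exception as exc:  # a crash is part of the behaviour too
            outputs.append(("error", type(exc).__name__))
    return tuple(outputs)


def functional_majority_vote(
    candidates: Sequence[Callable], test_inputs: Sequence
) -> tuple[int, Callable]:
    """Hypothetical helper: group candidates by identical execution signature
    and return (index, candidate) of a member of the largest group."""
    if not candidates:
        raise ValueError("need at least one candidate")
    signatures = [execution_signature(c, test_inputs) for c in candidates]
    # most_common keeps first-encounter order among equal counts,
    # so ties go to the signature seen first.
    winning_sig, _ = Counter(signatures).most_common(1)[0]
    index = signatures.index(winning_sig)  # first candidate in the majority group
    return index, candidates[index]


if __name__ == "__main__":
    # Two functionally equivalent doublers outvote a squaring function.
    cands = [lambda n: n * 2, lambda n: n + n, lambda n: n ** 2]
    idx, best = functional_majority_vote(cands, [0, 1, 3])
    print(idx, best(5))
```

Voting on behaviour rather than on source text lets syntactically different but equivalent programs (here `n * 2` and `n + n`) count as one answer. A real harness would also need process isolation and per-input time limits.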
We investigate Functional Majority Voting (FMV), a method based on functional consensus for code generation with Large Language Models, which identifies a representative solution from multiple generations using their runtime execution signatures on test inputs. We find that FMV is an effective test-time inference strategy, substantially boosting performance on LiveCodeBench without a large compute overhead. Furthermore, we extend the utility of functional consensus and apply it as an aggregation strategy for label-free Test-Time Reinforcement Learning. We demonstrate that this increases pass@1 on holdout tasks, but find no evidence of self-improvement beyond the base model's performance ceiling.
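For the TTRL use, one plausible way to turn functional consensus into a label-free training signal is a binary pseudo-reward in the style of majority-vote TTRL: each sampled program is rewarded if its execution signature matches the batch majority. The source does not spell out its exact reward, so treat this scheme and the hypothetical `consensus_rewards` helper as assumptions. Signatures can be any hashable value, such as the tuple of outputs a program produces on the test inputs.

```python
from __future__ import annotations

from collections import Counter
from typing import Hashable, Sequence


def consensus_rewards(signatures: Sequence[Hashable]) -> list[float]:
    """Hypothetical helper: assign a pseudo-reward to each sampled program.

    A rollout earns 1.0 if its execution signature equals the most common
    signature in the batch (the functional-majority pseudo-label), else 0.0.
    No ground-truth tests are consulted.
    """
    if not signatures:
        return []
    # Ties resolve to the signature encountered first.
    majority_sig, _ = Counter(signatures).most_common(1)[0]
    return [1.0 if s == majority_sig else 0.0 for s in signatures]


if __name__ == "__main__":
    sigs = [("ok", "4"), ("ok", "4"), ("ok", "5"), ("error", "ValueError")]
    print(consensus_rewards(sigs))
```

This reward measures agreement rather than correctness: if most samples share the same wrong behaviour, the wrong programs are the ones reinforced.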