Inference-Time Code Selection via Symbolic Equivalence Partitioning
This addresses the need for reliable, cost-effective inference-time selection in code generation for developers and researchers, offering a novel method that reduces dependency on expensive verifiers.
The paper tackles the problem of selecting correct code solutions from LLM-generated candidates by proposing Symbolic Equivalence Partitioning, which uses symbolic execution and SMT constraints to group programs by semantic behavior and select a representative, improving average accuracy from 0.728 to 0.803 on HumanEval+ and from 0.516 to 0.604 on LiveCodeBench at N=10.
"Best-of-N" selection is a popular inference-time scaling method for code generation using Large Language Models (LLMs). However, to reliably identify correct solutions, existing methods often depend on expensive or stochastic external verifiers. In this paper, we propose Symbolic Equivalence Partitioning, a selection framework that uses symbolic execution to group candidate programs by semantic behavior and select a representative from the dominant functional partition. To improve grouping and selection, we encode domain-specific constraints as Satisfiability Modulo Theories (SMT) assumptions during symbolic execution to reduce path explosion and prevent invalid input searches outside the problem domain. At N=10, our method improves average accuracy over Pass@1 from 0.728 to 0.803 on HumanEval+ and from 0.516 to 0.604 on LiveCodeBench, without requiring any additional LLM inference beyond the initial N candidate generations.