Path-Consistency with Prefix Enhancement for Efficient Inference in LLMs
This work addresses slow and expensive inference for users of LLMs in reasoning applications, offering an incremental improvement over existing self-consistency techniques.
The paper tackles the computational inefficiency of self-consistency methods in large language models by introducing path-consistency, which uses confidence in early answers to guide generation, reducing inference latency by up to 40.5% while maintaining accuracy across reasoning tasks.
To enhance the reasoning capabilities of large language models (LLMs), self-consistency has become a popular approach, combining multiple samplings with majority voting. However, current methods are computationally expensive and time-consuming because they require numerous samplings. To address this, this paper introduces path-consistency, which leverages the confidence of earlier-generated answers to identify the most promising prefix and uses that prefix to guide the generation of subsequent branches. Guiding generation dynamically in this way mitigates the errors and redundancies that random or less useful sampling introduces in self-consistency, and it significantly accelerates inference by reducing token consumption. Our extensive empirical results demonstrate that path-consistency reduces inference latency by up to 40.5% while maintaining accuracy across various tasks, including mathematical reasoning, commonsense reasoning, and symbolic reasoning.
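The idea described above can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a hypothetical `sample(prefix)` function that completes a reasoning path from a given prefix, splits the branches into equal stages, uses the majority answer's vote share as a stand-in for the confidence measure, and counts reasoning steps as a rough proxy for tokens. The names `self_consistency`, `path_consistency`, `toy_sample`, and the `threshold` parameter are all illustrative assumptions.

```python
from collections import Counter
from typing import Callable, List, Tuple

Path = Tuple[List[str], str]           # (reasoning steps, final answer)
Sampler = Callable[[List[str]], Path]  # hypothetical: completes a path from a prefix


def self_consistency(sample: Sampler, n_branches: int) -> Tuple[str, int]:
    """Baseline: sample every branch from scratch, then majority-vote."""
    paths = [sample([]) for _ in range(n_branches)]
    answers = [answer for _, answer in paths]
    cost = sum(len(steps) for steps, _ in paths)  # steps generated, a token proxy
    return Counter(answers).most_common(1)[0][0], cost


def path_consistency(
    sample: Sampler, n_branches: int, n_stages: int, threshold: float = 0.6
) -> Tuple[str, int]:
    """Sketch: reuse a prefix from a path that reached the leading answer."""
    per_stage = n_branches // n_stages
    prefix: List[str] = []
    answers: List[str] = []
    cost = 0
    for _ in range(n_stages):
        # Every branch in this stage continues from the current prefix.
        stage_paths = [sample(list(prefix)) for _ in range(per_stage)]
        for steps, answer in stage_paths:
            answers.append(answer)
            cost += len(steps) - len(prefix)  # only newly generated steps count
        # Assumed confidence measure: vote share of the leading answer so far.
        top, count = Counter(answers).most_common(1)[0]
        if count / len(answers) >= threshold:
            # Extend the prefix by one step, taken from a path that agrees
            # with the leading answer and still has steps left to generate.
            for steps, answer in stage_paths:
                if answer == top and len(steps) > len(prefix) + 1:
                    prefix = steps[: len(prefix) + 1]
                    break
    return Counter(answers).most_common(1)[0][0], cost


def toy_sample(prefix: List[str]) -> Path:
    """Deterministic stand-in for an LLM: always 4 steps, always answers '42'."""
    steps = prefix + [f"step {i}" for i in range(len(prefix), 4)]
    return steps, "42"


baseline_answer, baseline_cost = self_consistency(toy_sample, n_branches=8)
guided_answer, guided_cost = path_consistency(toy_sample, n_branches=8, n_stages=4)
print(baseline_answer, baseline_cost)
print(guided_answer, guided_cost)
```

With this toy sampler both procedures return the same answer, while the guided variant generates 20 steps instead of 32 because later branches reuse a growing prefix. These numbers are a property of the toy setup only and say nothing about the latency results reported in the paper.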