DiversiGATE: A Comprehensive Framework for Reliable Large Language Models
DiversiGATE addresses reliability issues in large language models for users who require trustworthy outputs, though it appears incremental because it builds on existing verification approaches.
The paper tackles the problem of verifying large language model outputs by introducing DiversiGATE, a framework that consolidates existing verification methods and includes a novel SelfLearner model that learns from its own outputs. The approach improves accuracy on arithmetic reasoning benchmarks, achieving a 7-percentage-point absolute improvement (54.8% to 61.8%) on GSM8K.
In this paper, we introduce DiversiGATE, a unified framework that consolidates diverse methodologies for LLM verification. The proposed framework comprises two main components, Diversification and Aggregation, which provide a holistic perspective on existing verification approaches such as Self-Consistency, MathPrompter, and WebGPT. Furthermore, we propose a novel "SelfLearner" model that conforms to the DiversiGATE framework and can learn from its own outputs and refine its performance over time, leading to improved accuracy. To evaluate the effectiveness of SelfLearner, we conducted a rigorous series of experiments, including tests on synthetic data as well as on popular arithmetic reasoning benchmarks such as GSM8K. Our results demonstrate that our approach outperforms traditional LLMs, raising accuracy on the GSM8K benchmark from 54.8% to 61.8%.
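The two components named above can be illustrated with a minimal sketch, assuming a Self-Consistency-style setup in which Diversification produces several variants of a query and Aggregation takes a majority vote over the resulting answers. The function names (diversify, aggregate, diversigate_answer), the rephraser callables, and the model callable are hypothetical and chosen for illustration only; this is not the SelfLearner model or the authors' implementation.

```python
from collections import Counter
from typing import Callable, List, Optional


def diversify(question: str, rephrasers: List[Callable[[str], str]]) -> List[str]:
    """Diversification (illustrative): build several prompt variants of one question."""
    return [rephrase(question) for rephrase in rephrasers]


def aggregate(answers: List[Optional[str]]) -> Optional[str]:
    """Aggregation (illustrative): majority vote over the answers that were produced.

    Returns None when no variant yielded an answer. Ties go to the answer
    that appeared first among the tied candidates.
    """
    valid = [a for a in answers if a is not None]
    if not valid:
        return None
    return Counter(valid).most_common(1)[0][0]


def diversigate_answer(
    question: str,
    rephrasers: List[Callable[[str], str]],
    model: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Run the hypothetical model on every variant, then aggregate the answers."""
    prompts = diversify(question, rephrasers)
    answers = [model(prompt) for prompt in prompts]
    return aggregate(answers)
```

Here model stands in for any callable that maps a prompt to a final answer string (or None on failure), so the same pipeline can wrap a real LLM client or a stub used in tests. Majority voting is only one possible aggregation rule; it is used here because it matches how Self-Consistency selects a final answer.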