Exploring Solution Divergence and Its Effect on Large Language Model Problem Solving
This work addresses the challenge of improving LLM training and evaluation for problem-solving tasks and offers a simple tool, though the contribution appears incremental because it builds on existing SFT and RL methods.
The paper studies solution divergence as a way to improve large language model (LLM) problem solving. It shows that higher divergence correlates with better problem-solving ability and finds that using divergence as a metric consistently improves success rates across three domains.
Large language models (LLMs) have been widely used for problem-solving tasks. Most recent work improves their performance through supervised fine-tuning (SFT) with labeled data or reinforcement learning (RL) from task feedback. In this paper, we study a new perspective: the divergence in solutions generated by LLMs for a single problem. We show that higher solution divergence is positively related to better problem-solving abilities across various models. Based on this finding, we propose solution divergence as a novel metric that can support both SFT and RL strategies. We test this idea on three representative problem domains and find that using solution divergence consistently improves success rates. These results suggest that solution divergence is a simple but effective tool for advancing LLM training and evaluation.
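The abstract does not say how solution divergence is computed, so the following is only an illustrative sketch of one plausible reading: the mean pairwise dissimilarity among several solutions sampled for the same problem. The function name `solution_divergence` and the use of character-level `difflib.SequenceMatcher` similarity are assumptions made for this example, not the paper's definition.

```python
from difflib import SequenceMatcher
from itertools import combinations


def solution_divergence(solutions: list[str]) -> float:
    """Hypothetical divergence score: mean pairwise dissimilarity.

    Returns 0.0 when all solutions are identical (or fewer than two are
    given) and approaches 1.0 as the solutions share less text.
    This is an assumed stand-in, not the metric defined in the paper.
    """
    pairs = list(combinations(solutions, 2))
    if not pairs:
        return 0.0
    # SequenceMatcher.ratio() is a similarity in [0, 1]; 1 - ratio is a dissimilarity.
    total = sum(1.0 - SequenceMatcher(None, a, b).ratio() for a, b in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    # Made-up solutions to a single problem, for illustration only.
    sampled = [
        "Sort the list, then take the middle element.",
        "Sort the list, then take the middle element.",
        "Use quickselect to find the k-th smallest element.",
    ]
    print(f"divergence = {solution_divergence(sampled):.3f}")
```

Under this assumed definition, a model that repeats the same solution scores near 0, while one that produces varied approaches scores higher. A real implementation might instead compare solutions with embeddings or structural features.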