Reasoning Effort and Problem Complexity: A Scaling Analysis in LLMs
This work addresses a critical limitation in how LLM reasoning scales on complex logical problems. The contribution is incremental, as it builds on existing analyses of model performance.
The study investigated how reasoning effort in Large Language Models scales with problem complexity using the Tents puzzle, finding that effort increases only up to a critical threshold and then plateaus or decreases, revealing limitations in logical coherence.
Large Language Models (LLMs) have demonstrated remarkable text generation capabilities, and recent advances in training paradigms have led to breakthroughs in their reasoning performance. In this work, we investigate how the reasoning effort of such models scales with problem complexity. We use the infinitely scalable Tents puzzle, which has a known linear-time solution, to analyze this scaling behavior. Our results show that reasoning effort scales with problem size, but only up to a critical problem complexity. Beyond this threshold, the reasoning effort does not continue to increase, and may even decrease. This observation highlights a critical limitation in the logical coherence of current LLMs as problem complexity increases, and underscores the need for strategies to improve reasoning scalability. Furthermore, our results reveal significant performance differences between current state-of-the-art reasoning models when faced with increasingly complex logical puzzles.
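For readers unfamiliar with the puzzle, the following is a minimal sketch of a checker for a proposed Tents solution, assuming the standard rules: every tent is paired one-to-one with an orthogonally adjacent tree, no two tents touch (not even diagonally), and the number of tents in each row and column matches the given clues. The grid encoding ('T' for tree, 'A' for tent, '.' for empty) and the function name `check_tents` are assumptions made for illustration; they are not taken from the study.

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]


def check_tents(grid: List[str], row_counts: List[int], col_counts: List[int]) -> bool:
    """Return True if `grid` is a valid Tents solution for the given clues.

    Hypothetical encoding: 'T' = tree, 'A' = tent, '.' = empty cell.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    tents = [(r, c) for r in range(n_rows) for c in range(n_cols) if grid[r][c] == "A"]
    trees = [(r, c) for r in range(n_rows) for c in range(n_cols) if grid[r][c] == "T"]

    # Row and column clues must match the number of tents exactly.
    if [row.count("A") for row in grid] != row_counts:
        return False
    if [sum(row[c] == "A" for row in grid) for c in range(n_cols)] != col_counts:
        return False

    # No two tents may touch, including diagonally.
    tent_set: Set[Cell] = set(tents)
    for r, c in tents:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if (dr, dc) != (0, 0) and (r + dr, c + dc) in tent_set:
                    return False

    # Tents and trees must pair up one-to-one.
    if len(tents) != len(trees):
        return False
    tree_set: Set[Cell] = set(trees)
    match: Dict[Cell, Cell] = {}  # tree -> tent currently assigned to it

    def try_assign(tent: Cell, seen: Set[Cell]) -> bool:
        # Augmenting-path step of a simple bipartite matching.
        r, c = tent
        for tree in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if tree in tree_set and tree not in seen:
                seen.add(tree)
                if tree not in match or try_assign(match[tree], seen):
                    match[tree] = tent
                    return True
        return False

    return all(try_assign(tent, set()) for tent in tents)


# Two trees, each with its own adjacent tent, and tents far enough apart.
valid = check_tents(["TA..", "....", ".AT.", "...."], [1, 0, 1, 0], [0, 2, 0, 0])

# Tents at (0, 1) and (1, 2) touch diagonally, so this is rejected.
touching = check_tents(["TA..", "..AT"], [1, 1], [0, 1, 1, 0])
```

A checker like this only validates a finished grid. It says nothing about how much reasoning a model spent to reach it, which is the quantity the study measures.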