Chain of Simulation: A Dual-Mode Reasoning Framework for Large Language Models with Dynamic Problem Routing
This work addresses the challenge of improving LLM reasoning efficiency and accuracy on tasks such as math and spatial reasoning without retraining, though the contribution is incremental because it builds on existing prompting methods.
The paper tackles the problem of improving reasoning in large language models by introducing Chain of Simulation (CoS), a dual-mode framework that dynamically routes problems to specialized strategies, achieving absolute improvements of 1.0% on GSM8K, 2.5% on StrategyQA, and a 65.2% relative improvement on bAbI compared to baselines.
We present Chain of Simulation (CoS), a novel dual-mode reasoning framework that dynamically routes problems to specialized reasoning strategies in Large Language Models (LLMs). Unlike existing uniform prompting approaches, CoS employs three distinct reasoning modes: (1) computational flow with self-consistency for mathematical problems, (2) symbolic state tracking with JSON representations for spatial reasoning, and (3) hybrid fact-extraction for multi-hop inference. Through comprehensive evaluation on the GSM8K, StrategyQA, and bAbI benchmarks using four state-of-the-art models (Gemma-3 27B, LLaMA-3.1 8B, Mistral 7B, and Qwen-2.5 14B), we demonstrate that CoS achieves 71.5% accuracy on GSM8K (1.0% absolute improvement), 90.0% on StrategyQA (2.5% absolute improvement), and 19.0% on bAbI (65.2% relative improvement) compared to the strongest baselines. The analysis reveals that problem-specific mode selection is crucial: the computational mode achieves 81.2% accuracy when correctly applied to mathematical problems, while misrouting leads to 0% accuracy. We provide detailed algorithms for mode selection, state tracking, and answer extraction, establishing CoS as an effective approach for improving LLM reasoning without additional training. The framework offers a better trade-off between accuracy and efficiency than Self-Consistency, achieving comparable performance at 54% lower computational cost.
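The routing idea described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's mode-selection algorithm, which is not reproduced here: the keyword cue lists, the `select_mode` function, and the `majority_vote` helper are all hypothetical names and heuristics chosen for illustration. The sketch only shows the general shape of dispatching a problem to one of the three modes, plus the majority vote that self-consistency relies on in the computational mode.

```python
import re
from collections import Counter
from enum import Enum


class Mode(Enum):
    COMPUTATIONAL = "computational"  # math problems, answered with self-consistency
    SYMBOLIC = "symbolic"            # spatial reasoning, tracked as JSON state
    HYBRID = "hybrid"                # multi-hop inference via fact extraction


# Assumption: illustrative cue lists, not the selection criteria used in the paper.
SPATIAL_CUES = ("moved to", "went to", "travelled to", "journeyed to", "where is")
MATH_CUES = ("how many", "how much", "total", "sum of")


def select_mode(problem: str) -> Mode:
    """Route a problem to a reasoning mode using simple surface cues."""
    text = problem.lower()
    if any(cue in text for cue in SPATIAL_CUES):
        return Mode.SYMBOLIC
    # Require both a digit and a quantity cue before treating it as math.
    if re.search(r"\d", text) and any(cue in text for cue in MATH_CUES):
        return Mode.COMPUTATIONAL
    # Everything else falls back to the hybrid fact-extraction mode.
    return Mode.HYBRID


def majority_vote(answers: list[str]) -> str:
    """Self-consistency step: return the most frequent sampled answer."""
    return Counter(answers).most_common(1)[0][0]
```

For instance, `select_mode("Mary went to the kitchen. Where is Mary?")` selects the symbolic mode under these cues, while a word problem containing numbers and "how many" is sent to the computational mode. Since the abstract reports that misrouting leads to 0% accuracy, a real router would need to be considerably more robust than this keyword heuristic.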