LG AI · Feb 20

Turbo Connection: Reasoning as Information Flow from Higher to Lower Layers

arXiv:2602.17993v1 · 1.4 · h-index: 17

Originality: Highly original

AI Analysis

This work addresses the problem of improving the reasoning ability of large language models on tasks such as math and logic, offering a novel architectural modification that does not require full retraining.

The paper tackles the limited reasoning ability of Transformers caused by their fixed computational depth. It introduces TurboConn, an architecture that routes connections from the higher-layer hidden states of one token to the lower layers of the next token. This yields accuracy gains of 0.9% to over 10% on benchmarks such as GSM8K and enables a model to reach 100% accuracy on Parity.
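The "fixed computational depth" argument can be made concrete by counting layer applications along the deepest latent path that reaches each hidden state. The sketch below is illustrative only. It assumes a hypothetical wiring in which the state after layer `l + offset` of token `t-1` is added to the state after layer `l` of token `t`. The exact layer mapping TurboConn uses is not given here, so `offset` and the function name `max_path_depth` are assumptions made for illustration.

```python
from typing import List, Optional


def max_path_depth(num_tokens: int, num_layers: int,
                   offset: Optional[int]) -> List[List[int]]:
    """Longest chain of sequential layer applications reaching each hidden state.

    depth[t][l] is measured for hidden state h_t^(l); l = 0 is the embedding.
    offset=None models a standard Transformer. An integer offset models a
    HYPOTHETICAL backward wiring: h_{t-1}^(l + offset) is added into h_t^(l).
    """
    depth = [[0] * (num_layers + 1) for _ in range(num_tokens)]
    for t in range(num_tokens):
        for l in range(num_layers + 1):
            best = 0
            if l > 0:
                # Layer l consumes layer l-1 states of this token (residual stream)
                # and of earlier tokens (attention), then adds one layer application.
                best = max(depth[s][l - 1] for s in range(t + 1)) + 1
            if offset is not None and t > 0 and l + offset <= num_layers:
                # Backward connection: carries the previous token's higher-layer
                # depth down into layer l without applying a layer itself.
                best = max(best, depth[t - 1][l + offset])
            depth[t][l] = best
    return depth


# Depth available at the final layer of each token (4 layers, 5 tokens).
print([row[-1] for row in max_path_depth(5, 4, None)])  # → [4, 4, 4, 4, 4]
print([row[-1] for row in max_path_depth(5, 4, 2)])     # → [4, 6, 8, 10, 12]
```

Without backward connections, every token's output sits at depth `L` no matter how long the sequence is. Under the assumed wiring, the final-layer depth of token `t` grows as `L + t * offset`. This is the sense in which routing higher layers into lower ones removes the fixed-depth ceiling.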

Complex problems, whether in math, logic, or planning, are solved by humans through a sequence of steps in which the result of one step informs the next. In this work, we adopt the perspective that the reasoning power of Transformers is fundamentally limited by a fixed maximum number of steps along any latent path of computation. To address this, we introduce Turbo Connection (TurboConn), a novel architecture that overcomes the fixed-depth constraint by routing multiple residual connections from the higher-layer hidden states of each token $t$ to the lower layers of token $t+1$. Fine-tuning pre-trained LLMs with our method does two things. It yields accuracy gains of 0.9% to over 10% on benchmarks such as GSM8K, Parity, and multi-step arithmetic. It also demonstrates that the density of these backward connections is critical: our dense interaction significantly outperforms "sparse" alternatives that pass only a single hidden state or vector. Notably, TurboConn can be integrated into pre-trained LLMs to overcome task-specific plateaus. While a fine-tuned Qwen-3-1.7B achieves only 53.78% on Parity, adding our architectural modification enables the model to reach 100% accuracy. This requires neither retraining the full model from scratch nor sophisticated curriculum learning. Our results provide strong empirical evidence that the depth of the computational path is a key factor in reasoning ability. They also offer a new mechanism to enhance LLMs without significantly affecting generation latency.
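The routing pattern described in the abstract can be illustrated with a toy NumPy forward pass. Here, higher-layer states of token `t-1` are added into the lower layers of token `t`. This is a minimal sketch under stated assumptions, not the authors' implementation. The names `toy_forward`, `proj` and `offset` are hypothetical. Each "layer" is a tanh-activated linear map. Attention is replaced by a simple mean over earlier tokens' states. The layer gap `offset` is an illustrative choice, not a value taken from the paper.

```python
import numpy as np


def toy_forward(x, weights, proj=None, offset=2):
    """Toy TurboConn-style forward pass (illustrative, not the paper's code).

    x:       (T, d) token embeddings
    weights: list of L (d, d) matrices, one per toy layer
    proj:    (d, d) HYPOTHETICAL projection for the backward connections;
             None disables them, giving a plain residual stack
    offset:  HYPOTHETICAL layer gap: h[t-1, l + offset] is added into h[t, l]
    Returns an array of shape (T, L + 1, d) holding every hidden state.
    """
    T, d = x.shape
    L = len(weights)
    h = np.zeros((T, L + 1, d))
    for t in range(T):  # tokens are processed strictly left to right
        state = x[t].copy()
        for l in range(L + 1):
            if l > 0:
                # Stand-in for causal attention: mean of earlier tokens at layer l-1.
                context = h[:t, l - 1].mean(axis=0) if t > 0 else np.zeros(d)
                # Residual update through the toy layer.
                state = state + np.tanh((state + context) @ weights[l - 1])
            if proj is not None and t > 0 and l + offset <= L:
                # Dense backward connection: a higher layer of the previous
                # token is added into layer l of the current token.
                state = state + h[t - 1, l + offset] @ proj
            h[t, l] = state
    return h


rng = np.random.default_rng(0)
T, d, L = 6, 8, 4
x = rng.normal(size=(T, d))
weights = [rng.normal(scale=0.3, size=(d, d)) for _ in range(L)]
proj = rng.normal(scale=0.1, size=(d, d))

h_plain = toy_forward(x, weights)        # no backward connections
h_turbo = toy_forward(x, weights, proj)  # with backward connections
print(h_turbo.shape)  # → (6, 5, 8)
```

With `offset = L`, only one connection survives in this toy: the top layer of token `t-1` feeds the embedding of token `t`. That loosely resembles the single-hidden-state "sparse" alternative the abstract compares against. Smaller offsets give the denser pattern. Because token `t` needs token `t-1`'s upper layers before it can start, the token loop here is sequential. During autoregressive generation, tokens are produced one at a time anyway.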
