How Chain-of-Thought Works? Tracing Information Flow from Decoding, Projection, and Activation
This work offers a mechanistic interpretability account of CoT that could aid the design of more efficient prompts, but it is incremental, building on existing CoT research.
The paper tackles the problem of understanding the internal mechanisms of Chain-of-Thought (CoT) prompting by tracing information flow. It finds that CoT may act as a decoding space pruner, with higher template adherence correlating with improved performance, and that CoT modulates neuron activation in a task-dependent way.
Chain-of-Thought (CoT) prompting significantly enhances model reasoning, yet its internal mechanisms remain poorly understood. We analyze CoT's operational principles by tracing information flow in reverse across the decoding, projection, and activation phases. Our quantitative analysis suggests that CoT may serve as a decoding space pruner, leveraging answer templates to guide output generation, with higher template adherence strongly correlating with improved performance. Furthermore, we find, surprisingly, that CoT modulates neuron engagement in a task-dependent manner: it reduces neuron activation in open-domain tasks yet increases it in closed-domain scenarios. These findings offer a novel mechanistic interpretability framework and critical insights for enabling targeted CoT interventions to design more efficient and robust prompts. We release our code and data at https://anonymous.4open.science/r/cot-D247.
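The two quantitative claims above (template adherence tracking accuracy, and CoT shifting how many neurons activate) can be made concrete with a toy computation. This is a minimal sketch under stated assumptions, not the authors' method: the adherence score (the fraction of hypothetical answer-template markers found in order in a model output), the activation threshold of 0.0, and every input below are illustrative. Correlating a per-example adherence score with 0/1 correctness via Pearson's r (which, for a binary variable, is the point-biserial correlation) is one plausible way to quantify "adherence correlates with performance"; the released code is the authoritative reference for what the paper actually measures.

```python
# Hypothetical sketch: neither the adherence metric nor the activation
# threshold below comes from the paper; both are illustrative assumptions.
from math import sqrt


def template_adherence(output: str, template: list[str]) -> float:
    """Fraction of template markers found in `output`, matched greedily in order.

    `template` is a hypothetical list of answer-template markers,
    e.g. ["Step 1", "Step 2", "The answer is"].
    """
    if not template:
        return 0.0
    pos = 0
    matched = 0
    for marker in template:
        idx = output.find(marker, pos)
        if idx == -1:
            continue  # marker absent after the current position: not counted
        matched += 1
        pos = idx + len(marker)  # later markers must appear after this one
    return matched / len(template)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; with 0/1 `ys` this is the point-biserial correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)


def active_fraction(activations: list[list[float]], threshold: float = 0.0) -> float:
    """Fraction of (token, neuron) entries whose activation exceeds `threshold`.

    `activations` is a tokens x neurons matrix from one layer; the default
    threshold of 0.0 is an assumption, not the paper's criterion.
    """
    total = sum(len(row) for row in activations)
    active = sum(1 for row in activations for a in row if a > threshold)
    return active / total


if __name__ == "__main__":
    # Toy outputs and labels for illustration only; not data from the paper.
    template = ["Step 1", "Step 2", "The answer is"]
    outputs = [
        "Step 1: 3 + 4 = 7. Step 2: 7 * 2 = 14. The answer is 14.",
        "Step 1: 3 + 4 = 7. The answer is 14.",
        "It is probably 12.",
    ]
    correct = [1, 1, 0]
    scores = [template_adherence(o, template) for o in outputs]
    print("adherence:", scores)
    print("r(adherence, correct):", round(pearson(scores, correct), 3))

    # Toy activation matrices for the same prompt without and with CoT.
    no_cot = [[0.5, -0.2, 0.1], [0.0, 0.3, -0.4]]
    with_cot = [[0.5, -0.2, -0.1], [-0.3, 0.3, -0.4]]
    print("active fraction:", active_fraction(no_cot), active_fraction(with_cot))
```

In a real analysis, the activation matrices would come from hooking a layer of the model while it processes CoT and non-CoT prompts, and both the layer and the threshold would strongly affect the counts; the toy numbers here only show the shape of the comparison.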