CL LG Dec 5, 2024

Understanding Hidden Computations in Chain-of-Thought Reasoning

arXiv:2412.04537v1, 5 citations, h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses interpretability and transparency issues in language model reasoning for researchers and practitioners, but it is incremental as it builds on existing CoT and analysis techniques.

The paper examines how transformer models internally process reasoning steps when the Chain-of-Thought text is replaced with hidden (filler) characters. It finds that the hidden characters can be recovered without performance loss by analyzing layer-wise representations with the logit lens and examining token rankings.

Chain-of-Thought (CoT) prompting has significantly enhanced the reasoning abilities of large language models. However, recent studies have shown that models can still perform complex reasoning tasks even when the CoT is replaced with filler (hidden) characters (e.g., "..."), leaving open questions about how models internally process and represent reasoning steps. In this paper, we investigate methods to decode these hidden characters in transformer models trained with filler CoT sequences. By analyzing layer-wise representations using the logit lens method and examining token rankings, we demonstrate that the hidden characters can be recovered without loss of performance. Our findings provide insights into the internal mechanisms of transformer models and open avenues for improving interpretability and transparency in language model reasoning.
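
To make the abstract's "logit lens" and "token rankings" concrete, here is a minimal sketch in plain Python. It is not the authors' code. The function names (`logit_lens`, `token_ranking`, `decode_skipping_filler`), the `filler_id` parameter, and the toy identity unembedding matrix are all assumptions made for illustration. The sketch projects an intermediate hidden state onto the vocabulary, ranks the tokens by logit, and reads off the best-ranked token that is not the filler. A real model would normally also apply its final layer norm before unembedding, which is omitted here.

```python
from typing import List, Sequence

Vector = Sequence[float]


def logit_lens(hidden: Vector, unembed: Sequence[Vector]) -> List[float]:
    """Project one hidden state (length d_model) onto the vocabulary.

    `unembed` has shape (d_model, vocab_size). A real model would usually
    apply its final layer norm to `hidden` first; that step is omitted here.
    """
    vocab_size = len(unembed[0])
    return [
        sum(h * row[v] for h, row in zip(hidden, unembed))
        for v in range(vocab_size)
    ]


def token_ranking(logits: Vector) -> List[int]:
    """Return token ids ordered from highest to lowest logit."""
    return sorted(range(len(logits)), key=lambda v: -logits[v])


def decode_skipping_filler(logits: Vector, filler_id: int) -> int:
    """Return the best-ranked token id that is not the filler token.

    Assumption: this is one illustrative way to 'examine token rankings',
    not necessarily the exact procedure used in the paper.
    """
    return next(v for v in token_ranking(logits) if v != filler_id)


if __name__ == "__main__":
    # Toy setup (hypothetical): d_model == vocab_size == 4, identity unembedding,
    # and token id 1 plays the role of the filler character.
    unembed = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    per_layer_hidden = [
        [0.7, 0.2, 0.1, 0.0],  # an earlier layer at one sequence position
        [0.1, 0.9, 0.5, 0.0],  # a later layer at the same position
    ]
    for layer, hidden in enumerate(per_layer_hidden):
        logits = logit_lens(hidden, unembed)
        print(layer, token_ranking(logits), decode_skipping_filler(logits, filler_id=1))
```

Running the same projection at every layer shows where in the network a non-filler token starts to rank highly, which is the kind of layer-wise view the abstract describes.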

Code Implementations: 1 repo