CL, Jan 7

When Models Decide and When They Bind: A Two-Stage Computation for Multiple-Choice Question-Answering

arXiv:2601.03914v1
h-index: 1
Originality: Incremental advance
AI Analysis

This paper addresses the problem of disentangling reasoning errors from symbol-binding errors in MCQA, giving AI researchers insight into model internals.

The study investigates how language models perform multiple-choice question answering. It finds evidence for a two-stage process: the model first selects a winning answer in content space, then binds that winner to the output symbol. Residual states at option boundaries often contain linearly decodable signals related to per-option correctness.

Multiple-choice question answering (MCQA) is easy to evaluate but adds a meta-task: models must both solve the problem and output the symbol that *represents* the answer, conflating reasoning errors with symbol-binding failures. We study how language models implement MCQA internally using representational analyses (PCA, linear probes) as well as causal interventions. We find that option-boundary (newline) residual states often contain strong linearly decodable signals related to per-option correctness. Winner-identity probing reveals a two-stage progression: the winning *content position* becomes decodable immediately after the final option is processed, while the *output symbol* is represented closer to the answer emission position. Tests under symbol and content permutations support a two-stage mechanism in which models first select a winner in content space and then bind or route that winner to the appropriate symbol to emit.
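To make the idea of a linear probe on option-boundary states concrete, here is a minimal sketch on synthetic data. It is not the authors' code or setup: the "residual states" are random vectors with a planted correctness direction, and every name (`make_synthetic_states`, `fit_linear_probe`, `winner_accuracy`) and parameter value is hypothetical. The probe is a ridge-regularized least-squares classifier, chosen only to keep the example dependency-light.

```python
import numpy as np


def make_synthetic_states(n_questions=200, n_options=4, d_model=32,
                          signal=3.0, seed=0):
    """Fake option-boundary states: Gaussian noise plus a planted
    'correctness' direction added only to the winning option.
    This stands in for real residual activations (an assumption)."""
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=d_model)
    direction /= np.linalg.norm(direction)
    winners = rng.integers(0, n_options, size=n_questions)
    labels = np.zeros((n_questions, n_options), dtype=int)
    labels[np.arange(n_questions), winners] = 1
    noise = rng.normal(size=(n_questions, n_options, d_model))
    states = noise + signal * labels[..., None] * direction
    return states, labels, winners


def fit_linear_probe(X, y, ridge=1e-2):
    """Ridge least-squares probe with a bias term; targets are -1/+1."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    targets = 2.0 * y - 1.0
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ targets)


def winner_accuracy(states, winners, w):
    """Score every option with the probe and check whether the
    highest-scoring option position matches the true winner."""
    n_q, n_opt, d = states.shape
    scores = (states.reshape(-1, d) @ w[:-1] + w[-1]).reshape(n_q, n_opt)
    return float(np.mean(scores.argmax(axis=1) == winners))


states, labels, winners = make_synthetic_states()
train, test = slice(0, 150), slice(150, 200)
d = states.shape[-1]

# Fit on per-option correctness labels, evaluate on held-out questions.
w = fit_linear_probe(states[train].reshape(-1, d), labels[train].reshape(-1))
acc = winner_accuracy(states[test], winners[test], w)
print(f"held-out winner accuracy: {acc:.2f} (chance = {1 / states.shape[1]:.2f})")
```

With real activations, `states` would be replaced by residual vectors collected at each option's trailing newline. Repeating the fit at different layers or token positions is one way to look for the progression from content position to output symbol described above.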

