The Scaling Properties of Implicit Deductive Reasoning in Transformers
For researchers working on reasoning in Transformers, this work offers insight into how implicit reasoning scales, though its contribution is incremental: it confirms known limitations of implicit methods.
The paper studies how Transformers perform implicit deductive reasoning over Horn clauses. It finds that sufficiently deep models with a bidirectional prefix mask approach chain-of-thought (CoT) performance across graph topologies and problem widths, but that CoT is still needed for depth extrapolation.
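To make the task concrete, here is a minimal sketch of deduction over propositional Horn clauses using forward chaining. It is an illustration only, not the paper's data generator or evaluation code. The names `forward_chain` and `Rule` are hypothetical, and the round-based notion of depth used here is one plausible reading that may differ from the paper's definition.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# A Horn rule "body_1 AND ... AND body_n -> head" (hypothetical encoding).
Rule = Tuple[FrozenSet[str], str]


def forward_chain(facts: Set[str], rules: List[Rule]) -> Dict[str, int]:
    """Map each derivable atom to the round in which it is first derived.

    Facts are derived at round 0. An atom derived at round d needs at least
    d sequential rule applications, which is what this sketch calls depth.
    """
    depth: Dict[str, int] = {atom: 0 for atom in facts}
    step = 0
    changed = True
    while changed:
        changed = False
        step += 1
        known = set(depth)  # snapshot: a round may only use earlier atoms
        for body, head in rules:
            if head not in depth and body <= known:
                depth[head] = step
                changed = True
    return depth


rules: List[Rule] = [
    (frozenset({"a"}), "b"),
    (frozenset({"b"}), "c"),
    (frozenset({"a", "c"}), "d"),
    (frozenset({"e"}), "f"),  # never fires: "e" is not provable
]
derived = forward_chain({"a"}, rules)
print(derived)
```

In this toy instance a query for `d` is provable but needs three sequential steps, while `f` is unprovable. An implicit reasoner has to resolve such a query within a single forward pass, whereas a CoT model can write out the intermediate atoms one at a time.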
We investigate the scaling properties of implicit deductive reasoning over Horn clauses in depth-bounded Transformers. By systematically decorrelating provability from spurious features and enforcing algorithmic alignment, we find that in sufficiently deep models with a bidirectional prefix mask, implicit reasoning approaches explicit CoT performance across graph topologies and problem widths, though CoT remains necessary for depth extrapolation.
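The bidirectional prefix mask mentioned above can be sketched as a standard prefix-LM attention mask: positions inside the prefix attend to each other in both directions, while later positions attend causally. This is a sketch under that assumption. The function name `prefix_mask` and its parameters are hypothetical, and the paper's exact masking scheme may differ.

```python
from typing import List


def prefix_mask(seq_len: int, prefix_len: int) -> List[List[bool]]:
    """Return mask[q][k], True where query position q may attend to key k.

    Assumed layout: the first `prefix_len` tokens hold the problem
    statement (facts, rules, query) and the remaining tokens are output.
    """
    return [
        [
            # Causal attention everywhere, plus full attention within the prefix.
            k <= q or (q < prefix_len and k < prefix_len)
            for k in range(seq_len)
        ]
        for q in range(seq_len)
    ]


mask = prefix_mask(seq_len=4, prefix_len=2)
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
```

With a purely causal mask, a rule that appears early in the prompt cannot attend to a fact stated after it. Letting the prefix attend bidirectionally removes that ordering constraint, which is one intuition for why such a mask could help implicit reasoning.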