LG | Mar 17

Transformers Can Learn Rules They've Never Seen: Proof of Computation Beyond Interpolation

arXiv:2603.17019 | 33.7 | h-index: 2

AI Analysis

This paper addresses a central debate about LLMs: whether transformers can generalize rules or only interpolate over training examples. It shows that transformers can go beyond interpolation, a result that is significant for AI researchers but incremental in scope.

The paper tackles whether transformers can infer rules absent from training, testing a strong interpolation-only hypothesis in two controlled settings. On a cellular automaton with held-out input patterns, a two-layer transformer recovers the XOR rule (best run 100%). On symbolic operator chains with one operator pair held out, the transformer reaches up to 78.6% (mean 41.8%) and exceeds every interpolation baseline. Together these results give an existence proof that transformers can learn and express unseen rules.

A central question in the LLM debate is whether transformers can infer rules absent from training, or whether apparent generalisation reduces to similarity-based interpolation over observed examples. We test a strong interpolation-only hypothesis in two controlled settings: one where interpolation is ruled out by construction and proof, and one where success requires emitting intermediate symbolic derivations rather than only final answers. In Experiment 1, we use a cellular automaton with a pure XOR transition rule and remove specific local input patterns from training; since XOR is linearly inseparable, each held-out pattern's nearest neighbours have the opposite label, so similarity-based predictors fail on the held-out region. Yet a two-layer transformer recovers the rule (best 100%; 47/60 converged runs), and circuit extraction identifies XOR computation. Performance depends on multi-step constraint propagation: without unrolling, accuracy matches output bias (63.1%), while soft unrolling reaches 96.7%. In Experiment 2, we study symbolic operator chains over integers with one operator pair held out; the model must emit intermediate steps and a final answer in a proof-like format. Across all 49 holdout pairs, the transformer exceeds every interpolation baseline (mean 41.8%, up to 78.6%; mean KRR 4.3%; KNN and MLP score 0% on every pair), while removing intermediate-step supervision degrades performance. Together with a construction showing that a standard transformer block can implement exact local Boolean rules, these results provide an existence proof that transformers can learn rule structure not directly observed in training and express it explicitly, ruling out the strongest architectural form of interpolation-only accounts: that transformers cannot in principle discover and communicate unseen rules, while leaving open when such behaviour arises in large-scale language training.
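The nearest-neighbour argument in Experiment 1 can be checked by hand on a toy case. The sketch below is an illustration, not the paper's setup: it assumes a 3-cell neighbourhood (`WIDTH = 3`), Hamming distance as the similarity measure, and a simple majority-vote nearest-neighbour predictor, none of which are specified in the abstract. It holds out one local pattern at a time and shows that the closest remaining patterns all carry the opposite XOR label, because flipping a single input bit always flips the parity.

```python
from itertools import product

WIDTH = 3  # assumed neighbourhood size; the abstract does not state it


def xor_rule(pattern):
    """Parity (XOR) of all cells in a local neighbourhood."""
    out = 0
    for bit in pattern:
        out ^= bit
    return out


def hamming(a, b):
    """Number of positions where two patterns differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbour_predict(query, train):
    """Majority vote over the training patterns closest to the query."""
    d_min = min(hamming(query, p) for p in train)
    votes = [label for p, label in train.items() if hamming(query, p) == d_min]
    return int(2 * sum(votes) > len(votes))


all_patterns = list(product((0, 1), repeat=WIDTH))

# Hold out a single pattern and train on the rest.
held_out = (1, 0, 1)
train = {p: xor_rule(p) for p in all_patterns if p != held_out}
true_label = xor_rule(held_out)
knn_label = nearest_neighbour_predict(held_out, train)

# Repeat for every possible held-out pattern and count the misses.
failures = 0
for h in all_patterns:
    rest = {p: xor_rule(p) for p in all_patterns if p != h}
    if nearest_neighbour_predict(h, rest) != xor_rule(h):
        failures += 1

print(f"held out {held_out}: true={true_label}, nearest-neighbour={knn_label}")
print(f"nearest-neighbour misses {failures} of {len(all_patterns)} held-out patterns")
```

In this toy setting the similarity-based predictor is wrong on every held-out pattern, which is the sense in which interpolation is ruled out by construction. The paper's actual baselines (KRR, KNN, MLP) and data format may differ from this sketch.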

