LG AI · Apr 9

Dead Weights, Live Signals: Feedforward Graphs of Frozen Language Models

Marcus Armstrong, Navid Ayoobi, Arjun Mukherjee

arXiv:2604.08335 · 47.3

Predicted impact: top 55% in LG · last 90 days

Originality: Incremental advance

AI Analysis

This work addresses the challenge of efficiently combining frozen LLMs for enhanced reasoning, offering a scalable approach with minimal trainable parameters, though it is incremental in extending prior geometric compatibility findings to trainable multi-node graphs.

The paper uses multiple frozen large language models (LLMs) as computational nodes in a feedforward graph to improve reasoning performance without fine-tuning the models themselves. It reports gains of up to 11.4 percentage points (on ARC-Challenge) over the best single constituent model.
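To put the headline gain in context, the baselines implied by the figures in the paper's abstract can be recomputed by subtraction. The following is a minimal sketch: the variable and function names are illustrative, the baselines are derived here rather than quoted from the paper, and the frozen parameter count is the approximate 12B given in the abstract.

```python
# Figures quoted in the paper's abstract (accuracy in %, gains in percentage points).
graph_acc = {"ARC-Challenge": 87.3, "OpenBookQA": 82.8, "MMLU": 67.2}
gain_vs_best_single = {"ARC-Challenge": 11.4, "OpenBookQA": 6.2, "MMLU": 1.2}
gain_vs_matched_classifier = {"ARC-Challenge": 9.1, "OpenBookQA": 5.2, "MMLU": 6.7}


def implied_baseline(acc, gain):
    """Baseline accuracy implied by a reported accuracy and a reported gain."""
    return {name: round(acc[name] - gain[name], 1) for name in acc}


# Derived values, not numbers stated in the source.
best_single = implied_baseline(graph_acc, gain_vs_best_single)
matched_classifier = implied_baseline(graph_acc, gain_vs_matched_classifier)

# Trainable share: 17.6M trainable vs. approximately 12B frozen parameters.
trainable, frozen = 17.6e6, 12e9
trainable_share_pct = 100 * trainable / (trainable + frozen)

for name in graph_acc:
    print(
        f"{name}: graph {graph_acc[name]}, "
        f"best single {best_single[name]}, "
        f"matched classifier {matched_classifier[name]}"
    )
print(f"trainable share of all parameters: {trainable_share_pct:.2f}%")
```

Because the frozen count is approximate, the trainable share should be read as an order-of-magnitude figure only.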

We present a feedforward graph architecture in which heterogeneous frozen large language models serve as computational nodes, communicating through a shared continuous latent space via learned linear projections. Building on recent work demonstrating geometric compatibility between independently trained LLM latent spaces \cite{armstrong2026thinking}, we extend this finding from static two-model steering to end-to-end trainable multi-node graphs, where projection matrices are optimized jointly via backpropagation through residual stream injection hooks. Three small frozen models (Llama-3.2-1B, Qwen2.5-1.5B, Gemma-2-2B) encode the input into a shared latent space whose aggregate signal is injected into two larger frozen models (Phi-3-mini, Mistral-7B), whose representations feed a lightweight cross-attention output node. With only 17.6M trainable parameters against approximately 12B frozen, the architecture achieves 87.3% on ARC-Challenge, 82.8% on OpenBookQA, and 67.2% on MMLU, outperforming the best single constituent model by 11.4, 6.2, and 1.2 percentage points respectively, and outperforming parameter-matched learned classifiers on frozen single models by 9.1, 5.2, and 6.7 points. Gradient flow through multiple frozen model boundaries is empirically verified to be tractable, and the output node develops selective routing behavior across layer-2 nodes without explicit supervision.
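The data flow described in the abstract can be illustrated with a toy forward pass. This is a sketch under stated assumptions, not the authors' implementation: each frozen LLM is replaced by a small fixed random map (the hypothetical `FrozenNode`), the aggregate is assumed to be a mean, injection is assumed to be additive, the output node is reduced to a single learned query attending over the two layer-2 representations, and all dimensions are made up. Training (backpropagation through the frozen nodes) is omitted.

```python
import math
import random

random.seed(0)


def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class FrozenNode:
    """Stand-in for a frozen LLM: two fixed random layers that are never updated."""

    def __init__(self, in_dim, hidden):
        self.hidden = hidden
        self.lower = rand_matrix(hidden, in_dim)
        self.upper = rand_matrix(hidden, hidden)

    def forward(self, x, injected=None):
        h = [math.tanh(v) for v in matvec(self.lower, x)]
        if injected is not None:
            # Assumption: the injected signal is added to the hidden state mid-model.
            h = [a + b for a, b in zip(h, injected)]
        return [math.tanh(v) for v in matvec(self.upper, h)]


IN_DIM, LATENT, CHOICES = 6, 8, 4  # toy sizes, all hypothetical

# Frozen nodes: three small layer-1 models, two larger layer-2 models.
layer1 = [FrozenNode(IN_DIM, h) for h in (4, 5, 6)]
layer2 = [FrozenNode(IN_DIM, h) for h in (7, 9)]

# Trainable pieces: linear projections plus a small output node.
to_latent = [rand_matrix(LATENT, n.hidden) for n in layer1]
inject = [rand_matrix(n.hidden, LATENT) for n in layer2]
from_layer2 = [rand_matrix(LATENT, n.hidden) for n in layer2]
query = [random.uniform(-0.1, 0.1) for _ in range(LATENT)]
head = rand_matrix(CHOICES, LATENT)


def graph_forward(x):
    # Layer 1: each node encodes the input and projects into the shared latent space.
    latents = [matvec(p, n.forward(x)) for p, n in zip(to_latent, layer1)]
    shared = [sum(col) / len(latents) for col in zip(*latents)]  # assumed mean
    # Layer 2: the shared signal is projected to each node's width and injected.
    reps = [
        matvec(out, n.forward(x, injected=matvec(inj, shared)))
        for n, inj, out in zip(layer2, inject, from_layer2)
    ]
    # Output node: one query attends over the layer-2 representations.
    scores = [sum(q * k for q, k in zip(query, r)) / math.sqrt(LATENT) for r in reps]
    routing = softmax(scores)
    mixed = [sum(w * r[i] for w, r in zip(routing, reps)) for i in range(LATENT)]
    return softmax(matvec(head, mixed)), routing


x = [random.uniform(-1, 1) for _ in range(IN_DIM)]
probs, routing = graph_forward(x)
print("answer distribution:", [round(p, 3) for p in probs])
print("routing over layer-2 nodes:", [round(w, 3) for w in routing])
```

In this toy, `routing` is the attention weight placed on each layer-2 node, which is the quantity one would inspect to look for the selective routing behavior the abstract mentions.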
