Quantum-enhanced Large Language Models on Quantum Hardware via Cayley Unitary Adapters
This work provides the first practical demonstration of quantum-enhanced LLMs on real hardware, addressing the memory bottleneck of classical LLMs for the quantum computing community.
The authors demonstrate that Cayley-parameterised unitary adapters, executed on a 156-qubit IBM Quantum System Two processor, improve the perplexity of Llama 3.1 8B by 1.4% with only 6,000 additional parameters, and show that quantum-enhanced LLMs can answer questions that classical baselines fail, with a noise-expressivity phase transition indicating a path to quantum utility.
Large language models (LLMs) have transformed artificial intelligence, yet classical architectures impose a fundamental constraint: every trainable parameter demands classical memory that scales unfavourably with model size. Quantum computing offers a qualitatively different pathway, but practical demonstrations on real hardware have remained elusive for models of practical relevance. Here we show that Cayley-parameterised unitary adapters -- quantum circuit blocks inserted into the frozen projection layers of pre-trained LLMs and executed on a 156-qubit IBM Quantum System Two superconducting processor -- improve the perplexity of Llama 3.1 8B, an 8-billion-parameter model in widespread use, by 1.4% with only 6,000 additional parameters and end-to-end inference validated on real Quantum Processing Unit (QPU). A systematic study on SmolLM2 (135M parameters), chosen for its tractability, reveals monotonically improving perplexity with unitary block dimension, 83% recovery of compression-induced degradation, and correct answers to questions that both classical baselines fail -- with a sharp noise-expressivity phase transition identifying the concrete path to quantum utility at larger qubit scales.