LG · Feb 26

Semantic Tube Prediction: Beating LLM Data Efficiency with JEPA

arXiv:2602.22617v1 · 5 citations · h-index: 20 · Has Code
Originality: Highly original
AI Analysis

This work improves data efficiency when training large language models, a critical bottleneck for researchers and practitioners facing high compute costs and large data requirements.

This paper introduces the Semantic Tube Prediction (STP) task, a JEPA-style regularizer for LLMs. It rests on the Geodesic Hypothesis: token sequences trace geodesics on a smooth semantic manifold and are therefore locally linear. With STP, LLMs match baseline accuracy using 16 times less training data on the NL-RX-SYNTH dataset, which challenges the data term of Chinchilla-style scaling laws.

Large Language Models (LLMs) obey consistent scaling laws -- empirical power-law fits that predict how loss decreases with compute, data, and parameters. While predictive, these laws are descriptive rather than prescriptive: they characterize typical training, not optimal training. Surprisingly few works have successfully challenged the data-efficiency bounds implied by these laws -- which is our primary focus. To that end, we introduce the Geodesic Hypothesis, positing that token sequences trace geodesics on a smooth semantic manifold and are therefore locally linear. Building on this principle, we propose a novel Semantic Tube Prediction (STP) task, a JEPA-style regularizer that confines hidden-state trajectories to a tubular neighborhood of the geodesic. STP generalizes JEPA to language without requiring explicit multi-view augmentations. We show this constraint improves signal-to-noise ratio, and consequently preserves diversity by preventing trajectory collisions during inference. Empirically, STP allows LLMs to match baseline accuracy with 16$\times$ less training data on the NL-RX-SYNTH dataset, directly violating the data term of Chinchilla-style scaling laws and demonstrating that principled geometric priors can surpass brute-force scaling. Code is available at https://github.com/galilai-group/llm-jepa#stp.
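To make the "tubular neighborhood" idea concrete, here is a minimal geometric sketch in Python. It is not the authors' STP objective (that is in the linked repository); it only illustrates one way a locally linear prior could be turned into a penalty. The assumptions are ours: the geodesic is approximated by the straight chord between the first and last hidden state of a span, and the function name `tube_penalty` and its `radius` parameter are hypothetical.

```python
import math


def tube_penalty(hidden, radius=0.0):
    """Illustrative tube penalty (an assumption, not the paper's loss).

    hidden: list of hidden-state vectors, one per token position.
    Returns the mean squared distance by which states fall outside a
    tube of the given radius around the chord from the first state to
    the last state.
    """
    start, end = hidden[0], hidden[-1]
    chord = [e - s for s, e in zip(start, end)]
    norm_sq = sum(c * c for c in chord)

    total = 0.0
    for state in hidden:
        offset = [x - s for x, s in zip(state, start)]
        # Coefficient of the projection onto the chord; zero if the
        # chord is degenerate (first and last state coincide).
        if norm_sq > 0.0:
            coeff = sum(o * c for o, c in zip(offset, chord)) / norm_sq
        else:
            coeff = 0.0
        # Component of the offset perpendicular to the chord.
        residual = [o - coeff * c for o, c in zip(offset, chord)]
        dist = math.sqrt(sum(r * r for r in residual))
        # Only the part of the distance outside the tube is penalized.
        total += max(dist - radius, 0.0) ** 2
    return total / len(hidden)


straight = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
bent = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
print(tube_penalty(straight))          # trajectory lies on the chord
print(tube_penalty(bent))              # middle state leaves the chord
print(tube_penalty(bent, radius=1.0))  # same state, now inside the tube
```

In a training setup, a term like this would presumably be added to the usual next-token loss with a small weight. That weighting, like the chord approximation itself, is an assumption of this sketch rather than something stated in the abstract.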

