LG · AI · Sep 12, 2025

Why and How Auxiliary Tasks Improve JEPA Representations

arXiv:2509.12249v2 · 2 citations · h-index: 85
Originality: Incremental advance
AI Analysis

This work offers theoretical insight into improving JEPA encoders for visual representation learning and model-based RL. The contribution is incremental, since it builds on existing JEPA frameworks.

The paper asks how auxiliary tasks improve representations in Joint-Embedding Predictive Architectures (JEPA). It proves a theorem guaranteeing that, when both the latent-transition consistency loss and the auxiliary regression loss are driven to zero, non-equivalent observations map to distinct latent representations. Controlled ablations in a counting environment show that training the JEPA model jointly with the auxiliary head yields richer representations than training them separately.
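
The two objectives can be made concrete with a minimal pure-Python sketch. It assumes squared-error forms for both the latent-transition consistency loss and the auxiliary regression loss, and evaluates them on a hypothetical four-state counter; the environment and every function name here (`jepa_losses`, `one_hot`, `shift`, `collapse`, and so on) are illustrative, not the paper's code or its counting environment. The collapsed encoder shows why the auxiliary head matters: mapping every observation to one latent makes the consistency loss exactly zero, and only the auxiliary loss exposes the collapse.

```python
from typing import Callable, Sequence, Tuple

Vector = Tuple[float, ...]
# One transition: (observation, action, next_observation, auxiliary_target).
Transition = Tuple[int, int, int, float]


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def jepa_losses(
    encode: Callable[[int], Vector],
    predict: Callable[[Vector, int], Vector],
    aux_head: Callable[[Vector], float],
    batch: Sequence[Transition],
) -> Tuple[float, float]:
    """Return (latent-transition consistency loss, auxiliary regression loss).

    Consistency: mean of ||predict(encode(x), a) - encode(x_next)||^2.
    Auxiliary:   mean of (aux_head(encode(x)) - y)^2.
    No gradients are taken here; a real trainer would minimise a (possibly
    weighted) sum of both terms over encoder, predictor and head jointly.
    """
    n = len(batch)
    latent = sum(sq_dist(predict(encode(x), a), encode(x_next))
                 for x, a, x_next, _ in batch) / n
    aux = sum((aux_head(encode(x)) - y) ** 2 for x, _, _, y in batch) / n
    return latent, aux


# Hypothetical counter MDP: state k in {0, ..., N-1}; action 1 increments
# k mod N, action 0 is a no-op; the auxiliary target is the count itself.
N = 4
batch = [(k, a, (k + a) % N, float(k)) for k in range(N) for a in (0, 1)]


def one_hot(k: int) -> Vector:
    """Healthy encoder: a distinct one-hot latent per count."""
    return tuple(1.0 if i == k else 0.0 for i in range(N))


def shift(z: Vector, a: int) -> Vector:
    """Latent predictor matching the true dynamics: move the hot index by a."""
    return one_hot((z.index(1.0) + a) % N)


def read_count(z: Vector) -> float:
    """Auxiliary head: read the count back out of the one-hot latent."""
    return float(z.index(1.0))


def collapse(k: int) -> Vector:
    """Collapsed encoder: every observation maps to the same latent."""
    return (0.0,)


def keep(z: Vector, a: int) -> Vector:
    """Predictor for the collapsed encoder: the constant latent maps to itself."""
    return z


def mean_count(z: Vector) -> float:
    """Best constant auxiliary head: the mean target over the batch, 1.5."""
    return 1.5


print(jepa_losses(one_hot, shift, read_count, batch))  # (0.0, 0.0)
print(jepa_losses(collapse, keep, mean_count, batch))  # (0.0, 1.25)
```

Both encoders satisfy the dynamics term perfectly; the collapsed one is only penalised through the auxiliary term, which is the "unhealthy collapse" the theorem rules out once both losses reach zero.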

Joint-Embedding Predictive Architecture (JEPA) is increasingly used for visual representation learning and as a component in model-based RL, but its behavior remains poorly understood. We provide a theoretical characterization of a simple, practical JEPA variant that has an auxiliary regression head trained jointly with latent dynamics. We prove a No Unhealthy Representation Collapse theorem: in deterministic MDPs, if training drives both the latent-transition consistency loss and the auxiliary regression loss to zero, then any pair of non-equivalent observations, i.e., those that do not have the same transition dynamics or auxiliary value, must map to distinct latent representations. Thus, the auxiliary task anchors which distinctions the representation must preserve. Controlled ablations in a counting environment corroborate the theory and show that training the JEPA model jointly with the auxiliary head generates a richer representation than training them separately. Our work indicates a path to improve JEPA encoders: training them with an auxiliary function that, together with the transition dynamics, encodes the right equivalence relations.
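
The theorem's contrapositive can be checked exhaustively on a toy problem. The sketch below is a discrete idealization under our own assumptions: observations are states of a small deterministic MDP, "zero loss" means some predictor and auxiliary head fit every transition exactly, and "equivalent" is read as the coarsest relation that agrees on the auxiliary value and is preserved by every action (a bisimulation-style reading; the paper's formal definition may differ in detail). The counter environment, the `parity` and `count` targets, and all function names are hypothetical. `coarsest_equivalence` computes the classes by partition refinement, and `theorem_holds` tries every encoder into four codes.

```python
from itertools import product
from typing import Callable, Dict, Hashable, Sequence

State = int
Step = Callable[[State, int], State]
Aux = Callable[[State], Hashable]


def coarsest_equivalence(states: Sequence[State], actions: Sequence[int],
                         step: Step, aux: Aux) -> Dict[State, int]:
    """Class label per state under the coarsest relation in which x ~ y
    requires aux(x) == aux(y) and step(x, a) ~ step(y, a) for every action a.
    Computed by partition refinement, as for bisimulation."""
    label = {s: aux(s) for s in states}  # first split: auxiliary value only
    while True:
        # A state's signature: its current class plus its successors' classes.
        sig = {s: (label[s],) + tuple(label[step(s, a)] for a in actions)
               for s in states}
        ids: Dict[tuple, int] = {}
        new_label = {s: ids.setdefault(sig[s], len(ids)) for s in states}
        if len(ids) == len(set(label.values())):  # no class was split
            return new_label
        label = new_label


def zero_loss_possible(encode: Callable[[State], Hashable],
                       states: Sequence[State], actions: Sequence[int],
                       step: Step, aux: Aux) -> bool:
    """True iff some predictor and auxiliary head give zero loss with this
    encoder: states sharing a latent must share their auxiliary value and,
    for each action, the latent of their successor."""
    required: Dict[Hashable, tuple] = {}
    for s in states:
        need = (aux(s),) + tuple(encode(step(s, a)) for a in actions)
        if required.setdefault(encode(s), need) != need:
            return False
    return True


def theorem_holds(states: Sequence[State], actions: Sequence[int],
                  step: Step, aux: Aux) -> bool:
    """Brute force over all encoders into len(states) codes: whenever zero
    loss is attainable, non-equivalent states must get distinct codes."""
    cls = coarsest_equivalence(states, actions, step, aux)
    for codes in product(range(len(states)), repeat=len(states)):
        enc = dict(zip(states, codes))
        if zero_loss_possible(enc.__getitem__, states, actions, step, aux):
            if any(enc[x] == enc[y] and cls[x] != cls[y]
                   for x in states for y in states):
                return False
    return True


# Hypothetical counter: action 1 increments k mod N, action 0 is a no-op.
N = 4
STATES, ACTIONS = list(range(N)), (0, 1)


def step(k: State, a: int) -> State:
    return (k + a) % N


def parity(k: State) -> int:
    """Hypothetical auxiliary target: parity of the count."""
    return k % 2


def count(k: State) -> int:
    """Alternative hypothetical auxiliary target: the count itself."""
    return k


print(coarsest_equivalence(STATES, ACTIONS, step, parity))  # {0: 0, 1: 1, 2: 0, 3: 1}
print(coarsest_equivalence(STATES, ACTIONS, step, count))   # {0: 0, 1: 1, 2: 2, 3: 3}
print(zero_loss_possible(lambda k: 0, STATES, ACTIONS, step, parity))  # False
print(theorem_holds(STATES, ACTIONS, step, parity))  # True
```

With the parity target, counts 0 and 2 (and 1 and 3) share a class, so a zero-loss encoder may merge them; with the count itself as target, every state is its own class and must keep its own latent. This mirrors the abstract's point that the auxiliary task, together with the dynamics, decides which distinctions the representation has to preserve.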
