LG · Apr 25, 2025

Modes of Sequence Models and Learning Coefficients

arXiv:2504.18048v1 · 3 citations · h-index: 7
Originality: Incremental advance
AI Analysis

This work provides a geometric framework for analyzing sequence models, offering insights into training dynamics and landscape structure, but it is incremental as it builds on existing theories of learning coefficients and tensor decompositions.

The paper addresses loss landscape geometry in transformer networks by linking patterns in the data to measurable properties of that landscape. It shows that Local Learning Coefficient (LLC) estimates characterize an effective distribution rather than the true one, which explains why reliable estimates can be obtained even when the network parameter is not a strict minimizer of the population loss.

We develop a geometric account of sequence modelling that links patterns in the data to measurable properties of the loss landscape in transformer networks. First, we cast conditional sequence distributions into a Hilbert-space framework and apply tensor decompositions to identify their principal modes. Truncating the small-amplitude modes yields an effective data distribution that preserves dominant structure while discarding statistical detail. Second, we show theoretically that Local Learning Coefficient (LLC) estimates are insensitive to modes below a data-dependent threshold. Consequently, the LLC calculated in practice characterises the geometry of the effective rather than the true distribution. This insight clarifies why reliable LLC estimates can be obtained even when a network parameter is not a strict minimiser of the population loss, and it highlights how the inverse temperature in SGLD acts as a resolution dial on the landscape structure.
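The mode-truncation step can be illustrated with a minimal sketch. It is not the paper's construction: it assumes a toy conditional distribution stored as a context-by-token matrix and uses an ordinary matrix SVD as a stand-in for the tensor decomposition. The function name `truncate_modes`, the toy matrix, and the threshold value are all hypothetical choices made for illustration.

```python
import numpy as np


def truncate_modes(cond, threshold):
    """Drop SVD modes of `cond` whose singular value is below `threshold`.

    Returns the truncated matrix, all singular values, and the number kept.
    The truncated matrix is only an approximation: its rows are not
    guaranteed to be valid probability distributions.
    """
    u, s, vt = np.linalg.svd(cond, full_matrices=False)
    keep = s >= threshold
    approx = (u[:, keep] * s[keep]) @ vt[keep, :]
    return approx, s, int(keep.sum())


# Toy p(token | context): two context types (dominant structure) ...
base = np.array([
    [0.7, 0.2, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.2, 0.7],
])
# ... plus a small perturbation whose rows sum to zero (statistical detail).
detail = 0.01 * np.array([
    [1.0, -1.0, 0.0],
    [-1.0, 1.0, 0.0],
    [0.0, 1.0, -1.0],
    [0.0, -1.0, 1.0],
])
cond = base + detail

# Hypothetical threshold separating dominant modes from small-amplitude ones.
effective, singular_values, n_kept = truncate_modes(cond, threshold=0.1)

print("singular values:", np.round(singular_values, 4))
print("modes kept:", n_kept)
print("max abs difference:", np.abs(effective - cond).max())
```

In this toy case the two large singular values carry the two context types, and the discarded mode holds only the small perturbation, so the truncated matrix stays close to the original. Raising or lowering the threshold changes how much detail survives, which is loosely analogous to the resolution role the abstract attributes to the inverse temperature.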

