LG · Mar 22

Stream separation improves Bregman conditioning in transformers

arXiv:2603.21317 · 41.9 · h-index: 7
Predicted impact: top 58% in LG (last 90 days) · Originality: Incremental advance
AI Analysis

This work addresses a foundational issue for researchers and practitioners who use linear safety interventions in transformers, though it is incremental, since it builds on prior analysis of Bregman geometry.

The paper tackles the assumption of Euclidean geometry in linear steering methods for transformers. By measuring the Hessian that defines the Bregman geometry at intermediate layers, it finds that stream separation improves conditioning by up to 22 in effective rank. This result bears on the reliability of safety interventions that depend on well-conditioned geometry.
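
Effective rank is the measure behind the reported improvement. The summary does not say which definition the paper uses, so the sketch below assumes the entropy-based definition of Roy and Vetterli: the exponential of the Shannon entropy of the normalized eigenvalue spectrum of a symmetric positive semidefinite matrix such as the Bregman Hessian. The function name `effective_rank` and the `eps` cutoff are hypothetical.

```python
import numpy as np


def effective_rank(H: np.ndarray, eps: float = 1e-12) -> float:
    """Entropy-based effective rank of a symmetric PSD matrix.

    Assumption: the paper's exact definition is not stated; this uses
    exp(Shannon entropy of the normalized eigenvalue spectrum).
    """
    eigvals = np.linalg.eigvalsh(H)        # real eigenvalues of a symmetric matrix
    eigvals = np.clip(eigvals, 0.0, None)  # drop tiny negative values from round-off
    total = eigvals.sum()
    if total <= eps:                       # all-zero matrix: no effective directions
        return 0.0
    p = eigvals / total                    # spectrum as a probability distribution
    p = p[p > eps]                         # treat 0 * log(0) as 0
    return float(np.exp(-np.sum(p * np.log(p))))
```

On a 516-dimensional identity matrix this returns 516 (up to floating-point error); on a matrix with eight equal nonzero eigenvalues and the rest zero it returns 8, which is the scale of degeneracy the abstract reports for single-stream models.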

Linear methods for steering transformer representations, including probing, activation engineering, and concept erasure, implicitly assume that the geometry of representation space is Euclidean. Park et al. [2026] showed that the softmax induces a curved Bregman geometry whose metric tensor is the Hessian of the log-normalizer, $H(\lambda) = \mathrm{Cov}[\gamma \mid \lambda]$. Ignoring this curvature causes Euclidean steering to leak probability mass to unintended tokens. Their analysis applies at the output layer. We measure this Hessian at intermediate layers in a controlled 2×2 design that crosses stream separation with per-layer supervision (a vocabulary decoding loss at each layer), holding vocabulary and parameter count fixed. In standard single-stream transformers, $H$ is severely degenerate at intermediate layers (effective rank 8 in 516 dimensions). Stream separation improves conditioning by up to 22 in effective rank, even without auxiliary supervision. Per-layer supervision also helps, but less. The cosine similarity between primal and dual concept directions predicts per-layer steering effectiveness on downstream tasks, with a threshold near 0.3. These results bear on the reliability of linear safety interventions, which depend on the geometry being well-conditioned at the layer where they are applied.
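
The identity $H(\lambda) = \mathrm{Cov}[\gamma \mid \lambda]$ can be made concrete for a softmax readout. This is a minimal sketch, assuming that a hidden state $\lambda$ at some layer is decoded by a single matrix `W` whose rows are the unembedding vectors $\gamma_y$ (a logit-lens-style readout; the paper's per-layer decoders may differ). The log-normalizer $A(\lambda) = \log \sum_y \exp(\gamma_y \cdot \lambda)$ has gradient $\mathbb{E}[\gamma \mid \lambda]$ and Hessian $\sum_y p_y \gamma_y \gamma_y^\top - \mu\mu^\top$ with $\mu = \mathbb{E}[\gamma \mid \lambda]$, which is the covariance of $\gamma$. The sketch further assumes that a concept's dual direction is the image $Hv$ of its primal direction $v$ (the first-order change in $\mathbb{E}[\gamma \mid \lambda]$ when steering along $v$); the paper may define or estimate dual directions differently. All function names, the toy sizes, and the `steer_here` gate are hypothetical; only the 0.3 threshold comes from the abstract.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def dual_point(W: np.ndarray, lam: np.ndarray) -> np.ndarray:
    """Gradient of A(lam) = log sum_y exp(W[y] @ lam), i.e. E[gamma | lam]."""
    p = softmax(W @ lam)
    return p @ W


def bregman_hessian(W: np.ndarray, lam: np.ndarray) -> np.ndarray:
    """H(lam) = Cov[gamma | lam] = sum_y p_y g_y g_y^T - mu mu^T.

    Assumption: the rows g_y of W are the unembedding vectors used to decode lam.
    """
    p = softmax(W @ lam)
    mu = p @ W                               # expected unembedding vector
    second_moment = (W * p[:, None]).T @ W   # sum_y p_y g_y g_y^T
    return second_moment - np.outer(mu, mu)


def primal_dual_cosine(H: np.ndarray, v: np.ndarray) -> float:
    """Cosine between a primal direction v and its (assumed) dual image H v."""
    hv = H @ v
    denom = np.linalg.norm(v) * np.linalg.norm(hv)
    return float(v @ hv / denom) if denom > 0 else 0.0


# Toy example with random weights (hypothetical sizes: vocab 50, width 16).
rng = np.random.default_rng(0)
W = rng.normal(size=(50, 16))
lam = rng.normal(size=16)
H = bregman_hessian(W, lam)
v = rng.normal(size=16)
cos = primal_dual_cosine(H, v)

# Hypothetical per-layer gate using the ~0.3 threshold reported in the abstract.
steer_here = cos >= 0.3
```

Because $H$ is positive semidefinite, the cosine lies in [0, 1]. When $H$ is close to a multiple of the identity, $Hv$ stays nearly parallel to $v$; a degenerate $H$ can collapse $Hv$ onto a few dominant directions and lower the cosine, which is one way to read the link between conditioning and steering effectiveness.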

