GeoLAN: Geometric Learning of Latent Explanatory Directions in Large Language Models
This work addresses interpretability in LLMs and is aimed at researchers and practitioners, though it appears incremental because it builds on existing geometric concepts.
The authors address the lack of transparency in large language models by introducing GeoLAN, a training framework that treats token representations as geometric trajectories and applies stickiness conditions to them. In their experiments, GeoLAN frequently maintains task accuracy while improving geometric metrics and reducing certain fairness biases, with the largest benefits in mid-sized models.
Large language models (LLMs) demonstrate strong performance, but they often lack transparency. We introduce GeoLAN, a training framework that treats token representations as geometric trajectories and applies stickiness conditions inspired by recent developments related to the Kakeya Conjecture. We develop two differentiable regularizers, Katz-Tao Convex Wolff (KT-CW) and Katz-Tao Attention (KT-Attn), that promote isotropy and encourage diverse attention. Our experiments with Gemma-3 (1B, 4B, 12B) and Llama-3-8B show that GeoLAN frequently maintains task accuracy while improving geometric metrics and reducing certain fairness biases. These benefits are most significant in mid-sized models. Our findings reveal scale-dependent trade-offs between geometric precision and performance, suggesting that geometry-aware training is a promising approach to enhance mechanistic interpretability.
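
The abstract does not state the form of KT-CW or KT-Attn, so the following is only a rough illustration of what "promoting isotropy" and "encouraging diverse attention" can look like as scalar penalties. Both functions are generic stand-ins, not the GeoLAN regularizers: the names `isotropy_penalty` and `attention_overlap_penalty`, the covariance-based isotropy measure, and the cosine-overlap measure between heads are all assumptions made for this sketch. It is written in plain NumPy for readability; a trainable version would express the same arithmetic in an autodiff framework.

```python
import numpy as np


def isotropy_penalty(hidden: np.ndarray) -> float:
    """Hypothetical stand-in for an isotropy regularizer (not KT-CW).

    hidden: array of shape (num_tokens, dim) holding token representations.
    Returns the squared Frobenius distance between the trace-normalized
    covariance and the uniform target I/dim. The value is 0 when variance
    is spread equally over all directions and grows as it concentrates.
    """
    centered = hidden - hidden.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / hidden.shape[0]
    trace = np.trace(cov)
    if trace <= 0.0:
        # All tokens identical: no variance to distribute.
        return 0.0
    dim = cov.shape[0]
    return float(np.sum((cov / trace - np.eye(dim) / dim) ** 2))


def attention_overlap_penalty(attn: np.ndarray) -> float:
    """Hypothetical stand-in for an attention-diversity regularizer (not KT-Attn).

    attn: array of shape (num_heads, seq_len, seq_len), rows summing to 1.
    Returns the mean pairwise cosine similarity between flattened head
    patterns: 1 when all heads attend identically, 0 when they are disjoint.
    """
    num_heads = attn.shape[0]
    if num_heads < 2:
        return 0.0
    flat = attn.reshape(num_heads, -1)
    flat = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    sim = flat @ flat.T
    off_diagonal = sim.sum() - np.trace(sim)
    return float(off_diagonal / (num_heads * (num_heads - 1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Anisotropic toy representations: one direction dominates the variance.
    hidden = rng.normal(size=(256, 8)) * np.array([10.0, 1, 1, 1, 1, 1, 1, 1])
    print("isotropy penalty:", isotropy_penalty(hidden))

    # Four heads with random row-stochastic attention maps.
    logits = rng.normal(size=(4, 16, 16))
    attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
    print("attention overlap:", attention_overlap_penalty(attn))
```

In a training loop, terms like these would typically be added to the task loss with small weights, which is one way the scale-dependent trade-off between geometric metrics and task accuracy mentioned above could arise.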