LG | Feb 3, 2025

Harmonic Loss Trains Interpretable AI Models

arXiv:2502.01628v2 | 5 citations | h-index: 17 | Trans. Mach. Learn. Res.
Originality: Incremental advance
AI Analysis

This work addresses the need for more interpretable and data-efficient AI models, particularly in domains with limited data or in high-stakes applications, although as a new training method it appears to be an incremental advance.

The paper introduces harmonic loss as an alternative supervisory signal for training neural networks and LLMs. The authors report that it improves interpretability and reduces the data needed for generalization, with experiments showing gains over standard models.

In this paper, we introduce harmonic loss as an alternative supervisory signal for training neural networks and large language models (LLMs). Harmonic loss differs from standard cross-entropy loss by (a) replacing the usual SoftMax normalization with a scale-invariant HarMax function and (b) computing logits via Euclidean distance rather than a dot product. Harmonic loss enables improved interpretability and faster convergence, owing to its scale invariance and finite convergence point by design, which can be interpreted as a class center. We first validate the performance of harmonic models across algorithmic, vision, and language datasets. Through extensive experiments, we demonstrate that models trained with harmonic loss perform better than standard models by: (a) enhancing interpretability, (b) requiring less data for generalization, and (c) reducing grokking. Moreover, we compare a GPT-2 model trained with harmonic loss to the standard GPT-2, illustrating that the harmonic model develops more interpretable representations. Looking forward, we believe harmonic loss may become a valuable tool in domains with limited data availability or in high-stakes applications where interpretability and reliability are paramount, paving the way for more robust and efficient neural network models.
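The two changes described in the abstract (distance-based logits and a scale-invariant normalization) can be illustrated with a small sketch. The abstract does not state the HarMax formula, so the form used here, p_i proportional to d_i^(-n) with a harmonic exponent n, is an assumption, as are the function names `euclidean_distances`, `harmax`, and `harmonic_loss` and the default n = 2. It is a minimal pure-Python illustration, not the authors' implementation.

```python
import math


def euclidean_distances(x, class_centers):
    """Distance from the input vector x to each class-center (weight) vector.

    This stands in for the usual dot-product logits.
    """
    return [math.dist(x, w) for w in class_centers]


def harmax(distances, n=2.0, eps=1e-12):
    """Assumed HarMax form: p_i = d_i**(-n) / sum_j d_j**(-n).

    n is a hypothetical exponent hyperparameter; eps guards against a zero
    distance when x sits exactly on a class center.
    """
    inverse_powers = [1.0 / (max(d, eps) ** n) for d in distances]
    total = sum(inverse_powers)
    return [v / total for v in inverse_powers]


def harmonic_loss(x, class_centers, target, n=2.0):
    """Negative log-probability of the target class under the assumed HarMax."""
    probs = harmax(euclidean_distances(x, class_centers), n)
    return -math.log(probs[target])


if __name__ == "__main__":
    centers = [[0.0, 0.0], [3.0, 4.0]]
    x = [0.0, 1.0]
    print(harmax(euclidean_distances(x, centers)))

    # Scaling the input and the centers by the same factor leaves the
    # probabilities unchanged, which is the scale invariance the abstract cites.
    scaled_centers = [[10.0 * c for c in w] for w in centers]
    scaled_x = [10.0 * c for c in x]
    print(harmax(euclidean_distances(scaled_x, scaled_centers)))
```

Under this assumed form, the loss reaches its minimum when the input coincides with the target's weight vector, which is one way to read the abstract's "finite convergence point" that "can be interpreted as a class center".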
