CL · Oct 30, 2025

Similarity-Distance-Magnitude Language Models

arXiv:2510.26183v1 · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses abstentions in language models and presents an incremental improvement over existing methods.

The paper aims to reduce abstentions in language models by introducing Similarity-Distance-Magnitude (SDM) language models. These fine-tune pre-trained decoder-only Transformers with a final-layer SDM activation layer and a contrastive training scheme, which yields improved statistical efficiency relative to strong supervised baselines.
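A generic selective-prediction sketch can illustrate how abstentions relate to statistical efficiency. It assumes each generation comes with an estimated probability of being correct, and that the model abstains whenever that estimate falls below a fixed threshold `alpha`. The function name `selective_stats`, the threshold 0.95, and the example data are hypothetical. The SDM estimator defines its high-probability region in its own way and is not reproduced here. In this framing, fewer abstentions at the same accuracy on admitted outputs means higher coverage.

```python
from typing import Optional, Sequence, Tuple


def selective_stats(
    probs: Sequence[float],
    correct: Sequence[bool],
    alpha: float = 0.95,  # hypothetical fixed threshold, not the SDM rule
) -> Tuple[float, Optional[float]]:
    """Return (coverage, accuracy on admitted predictions).

    A prediction is admitted when its estimated probability is >= alpha;
    otherwise the model abstains. Coverage is the admitted fraction.
    """
    if len(probs) != len(correct) or not probs:
        raise ValueError("probs and correct must be non-empty and equal length")
    kept = [c for p, c in zip(probs, correct) if p >= alpha]
    coverage = len(kept) / len(probs)
    # Accuracy is undefined when every prediction is an abstention.
    accuracy = sum(kept) / len(kept) if kept else None
    return coverage, accuracy


# Hypothetical estimates for five generations and whether each was correct.
coverage, accuracy = selective_stats(
    [0.99, 0.97, 0.80, 0.96, 0.50],
    [True, True, False, True, False],
)
print(coverage, accuracy)  # 0.6 1.0
```

In this framing, a method that moves more correct generations above the threshold raises coverage (fewer abstentions) without lowering accuracy on the admitted set.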

We introduce Similarity-Distance-Magnitude (SDM) language models (LMs), which are sequence prediction models fine-tuned to maximize the proportion of generations in the well-calibrated, high-probability region partitioned by a final-layer SDM activation layer used for binary classification of instruction-following. We demonstrate that existing pre-trained decoder-only Transformer LMs can be readily converted into SDM LMs via supervised fine-tuning, using the final-layer SDM activation layer during training to estimate a change-of-base for a supervised next-token loss over a contrastive input encoding scheme, with additional hard negative examples generated online during training. This results in reduced abstentions (i.e., improved statistical efficiency) compared to strong supervised baselines.
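The abstract says the SDM activation layer is used to "estimate a change-of-base" for the next-token loss. The sketch below is hedged and assumes this means replacing the exponential base e of the softmax with a base b > 1 supplied per example. Here `base` is a hypothetical stand-in, and how the SDM layer estimates it is not reproduced. Under that assumption, the identity b^z = e^{z ln b} shows that changing the base is equivalent to scaling the logits by ln b before an ordinary softmax. The function names are illustrative, not the paper's API.

```python
import math
from typing import List, Sequence


def change_of_base_softmax(logits: Sequence[float], base: float) -> List[float]:
    """Compute base**z_i / sum_j base**z_j.

    Uses base**z == exp(z * ln(base)), i.e. a standard softmax over
    logits scaled by ln(base).
    """
    if base <= 1.0:
        raise ValueError("base must be > 1")
    scale = math.log(base)
    scaled = [z * scale for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_loss(logits: Sequence[float], target: int, base: float) -> float:
    """Negative log-likelihood of `target` under the change-of-base softmax.

    Computed as logsumexp(scaled) - scaled[target] to avoid log(0).
    """
    if base <= 1.0:
        raise ValueError("base must be > 1")
    scale = math.log(base)
    scaled = [z * scale for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return lse - scaled[target]


logits = [2.0, 1.0, 0.0]
print(next_token_loss(logits, 0, math.e))       # ordinary cross-entropy
print(next_token_loss(logits, 0, math.e ** 2))  # sharper: lower loss here
```

With `base = math.e` the loss reduces to ordinary softmax cross-entropy. A larger base sharpens the distribution, which lowers the loss when the target is the highest-scoring token and raises it when the target is the lowest-scoring one.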
