CL · Jun 2

Language Models Compare Quantities Using Number-specific and Unit-specific Heuristics

arXiv:2606.03982 · 22.4
Predicted impact: top 63% in CL · last 90 days
Originality: Incremental advance
AI Analysis

For NLP and AI researchers, this work points to a limitation in how LMs reason about quantities with units: the evidence suggests they rely on approximate heuristics rather than exact unit conversion.

The paper investigates how language models compare quantities with measurement units, finding that accuracy degrades near comparison boundaries and that models rely on heuristics over numerals and units rather than exact conversion.

Quantities with measurement units, such as 110 cm and 1.2 m, require language models (LMs) to combine a numeral with a symbolic unit scale. Here, we study how LMs compare such quantities in controlled settings spanning several unit systems. We find that accuracy degrades near the comparison boundary, where small changes in value determine the correct answer. The resulting errors are systematic: linear surrogate models predict LM preferences from numerical-difference and unit-scale-difference cues, and causal interventions on subspaces aligned with these variables shift the model's output. The results suggest that LMs compare quantities through a bag of heuristics over numerals and units, rather than by first converting both expressions to an exact shared-scale representation.
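To make the contrast in the abstract concrete, the following minimal sketch compares two quantities in two ways: by exact conversion to a shared scale, and by a toy linear score over a numerical-difference cue and a unit-scale-difference cue. It is an illustration only, not the paper's surrogate model: the unit table `TO_METRES`, the function names, the use of log10 differences as cues, and the weights `w_num` and `w_unit` are all assumptions introduced here.

```python
import math
from fractions import Fraction

# Hypothetical unit table (assumption): exact conversion factors to metres.
TO_METRES = {
    "mm": Fraction(1, 1000),
    "cm": Fraction(1, 100),
    "m": Fraction(1),
    "km": Fraction(1000),
}


def sign(x) -> int:
    """Return 1, -1, or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def compare_exact(a, b) -> int:
    """Convert both (value, unit) pairs to metres exactly, then compare.

    Returns 1 if a > b, -1 if a < b, 0 if they are equal.
    """
    a_m = Fraction(a[0]) * TO_METRES[a[1]]
    b_m = Fraction(b[0]) * TO_METRES[b[1]]
    return sign(a_m - b_m)


def compare_heuristic(a, b, w_num=1.0, w_unit=1.0) -> int:
    """Toy linear score over two cues (an assumed form, for illustration).

    num_cue:  log10 difference between the two numerals.
    unit_cue: log10 difference between the two unit scales.
    With w_num == w_unit the score is the log-ratio of the two
    quantities, so the comparison is exact up to floating-point
    rounding; any other weighting misjudges pairs whose true values
    are close.
    """
    num_cue = math.log10(float(a[0])) - math.log10(float(b[0]))
    unit_cue = (math.log10(float(TO_METRES[a[1]]))
                - math.log10(float(TO_METRES[b[1]])))
    return sign(w_num * num_cue + w_unit * unit_cue)


if __name__ == "__main__":
    near = (("110", "cm"), ("1.2", "m"))  # close to the comparison boundary
    far = (("110", "cm"), ("5", "m"))     # far from the boundary
    for label, (a, b) in (("near", near), ("far", far)):
        print(label, compare_exact(a, b),
              compare_heuristic(a, b, w_unit=0.9))
```

In this sketch, under-weighting the unit cue (`w_unit=0.9`) flips the answer for 110 cm versus 1.2 m but leaves 110 cm versus 5 m unchanged. That reproduces, in miniature, the pattern the abstract reports: errors concentrated near the comparison boundary.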

