CL · Jun 2

Language Models Compare Quantities Using Number-specific and Unit-specific Heuristics

Mutsumi Sasaki, Go Kamoda, Ryosuke Takahashi, Kosuke Sato, Kentaro Inui, Keisuke Sakaguchi, Benjamin Heinzerling

arXiv:2606.03982 · 22.4

Predicted impact: top 63% in CL · last 90 days · Originality: Incremental advance

AI Analysis

For researchers in NLP and AI, this work reveals a fundamental limitation in how LMs handle numerical reasoning with units, showing they use approximate heuristics instead of precise conversion.

The paper investigates how language models compare quantities with measurement units, finding that accuracy degrades near comparison boundaries and that models rely on heuristics over numerals and units rather than exact conversion.

Quantities with measurement units, such as 110 cm and 1.2 m, require language models (LMs) to combine a numeral with a symbolic unit scale. Here, we study how LMs compare such quantities in controlled settings spanning several unit systems. We find that accuracy degrades near the comparison boundary, where small changes in value determine the correct answer. The resulting errors are systematic: linear surrogate models predict LM preferences from numerical-difference and unit-scale-difference cues, and causal interventions on subspaces aligned with these variables shift the model's output. The results suggest that LMs compare quantities through a bag of heuristics over numerals and units, rather than first converting both expressions to an exact shared-scale representation.
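To make the contrast in the abstract concrete, here is a minimal sketch, not the authors' code, of the two strategies: exact comparison after converting to a shared scale, and a linear (logistic) surrogate that scores a preference from a numeral-difference cue and a unit-scale-difference cue. The unit table, the choice of log10 features, and the hand-set weights `w_num` and `w_unit` are all illustrative assumptions; the paper fits its surrogates to actual LM preferences, which this sketch does not do.

```python
import math

# Assumed unit table: scale of each unit relative to metres (illustrative only).
UNIT_SCALE = {"mm": 1e-3, "cm": 1e-2, "m": 1.0, "km": 1e3}


def exact_compare(v1, u1, v2, u2):
    """Return True if quantity 1 is larger, converting both to metres first."""
    return v1 * UNIT_SCALE[u1] > v2 * UNIT_SCALE[u2]


def heuristic_features(v1, u1, v2, u2):
    """Two cues (assumed log10 form): numeral difference and unit-scale difference."""
    num_diff = math.log10(v1) - math.log10(v2)
    unit_diff = math.log10(UNIT_SCALE[u1]) - math.log10(UNIT_SCALE[u2])
    return num_diff, unit_diff


def surrogate_prob(v1, u1, v2, u2, w_num, w_unit, bias=0.0):
    """Hypothetical logistic surrogate: P(prefer quantity 1) from a weighted sum of cues."""
    num_diff, unit_diff = heuristic_features(v1, u1, v2, u2)
    z = w_num * num_diff + w_unit * unit_diff + bias
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # 110 cm vs 1.2 m: exact conversion gives 1.1 m < 1.2 m.
    print(exact_compare(110, "cm", 1.2, "m"))
    # Slightly under-weighting the unit cue flips this near-boundary case.
    print(surrogate_prob(110, "cm", 1.2, "m", w_num=1.0, w_unit=0.9))
    # Far from the boundary (110 cm vs 5 m), the same weights still get it right.
    print(surrogate_prob(110, "cm", 5, "m", w_num=1.0, w_unit=0.9))
```

With log features, the sum `num_diff + unit_diff` equals the log ratio of the converted quantities, so equal weights reproduce exact comparison. Unequal weights leave far-apart pairs unaffected but flip pairs close to the boundary, which mirrors the kind of systematic near-boundary error the abstract describes.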

