LG AI CLNov 3, 2025

Regularization Through Reasoning: Systematic Improvements in Language Model Classification via Explanation-Enhanced Fine-Tuning

Vivswan Shah, Randy Cogill, Hanwei Yue, Gopinath Chennupati, Rinat Khaziev

arXiv:2511.02044v1h-index: 16

Originality Incremental advance

AI Analysis

This work addresses the problem of improving classification reliability in language models for conversational AI applications, though it appears to be an incremental advance in fine-tuning techniques.

The researchers investigated whether attaching brief explanations to labels during fine-tuning improves language model classification performance, finding that both genuine explanations and semantically incoherent pseudo-explanations enhanced accuracy across six conversational datasets, with pseudo-explanations narrowing much of the gap to true explanations.

Fine-tuning LLMs for classification typically maps inputs directly to labels. We ask whether attaching brief explanations to each label during fine-tuning yields better models. We evaluate conversational response quality along three axes: naturalness, comprehensiveness, and on-topic adherence, each rated on 5-point scales. Using ensemble-generated data from multiple LLMs, we fine-tune a 7B-parameter model and test across six diverse conversational datasets. Across 18 dataset, task settings, label-plus-explanation training outperforms label-only baselines. A central and unexpected result concerns random tokens. We replace human-written explanations with text that is syntactically incoherent yet vocabulary-aligned with the originals (e.g., shuffled or bag-of-words variants). Despite lacking semantics, these pseudo-explanations still improve accuracy over label-only training and often narrow much of the gap to true explanations. The effect persists across datasets and training seeds, indicating that gains arise less from meaning than from structure: the extra token budget encourages richer intermediate computation and acts as a regularizer that reduces over-confident shortcuts. Internal analyses support this view: explanation-augmented models exhibit higher activation entropy in intermediate layers alongside sharper predictive mass at the output layer, consistent with increased deliberation before decision. Overall, explanation-augmented fine-tuning, whether with genuine rationales or carefully constructed random token sequences, improves accuracy and reliability for LLM classification while clarifying how token-level scaffolding shapes computation during inference.

View on arXiv PDF

Similar