CL · Mar 6, 2025

TRACT: Regression-Aware Fine-tuning Meets Chain-of-Thought Reasoning for LLM-as-a-Judge

arXiv:2503.04381v2 · 20 citations · h-index: 7 · ACL
Originality: Incremental advance
AI Analysis

This work addresses the challenge of accurate numerical score prediction in LLM-as-a-judge systems, which is important for researchers and practitioners in NLP, though it appears incremental by building on prior regression-aware and CoT techniques.

The paper tackles the problem of LLM-as-a-judge for automated text evaluation by proposing TRACT, a method that combines chain-of-thought reasoning with regression-aware fine-tuning, resulting in significant performance improvements over existing methods across four datasets and two LLMs.

The LLM-as-a-judge paradigm uses large language models (LLMs) for automated text evaluation, where an LLM assigns a numerical score to the input text according to a scoring rubric. Existing methods for LLM-as-a-judge fine-tune with cross-entropy (CE) loss, which neglects the numeric nature of score prediction. Recent work addresses this limitation through regression-aware fine-tuning, but it does not use chain-of-thought (CoT) reasoning for score prediction. In this paper, we introduce TRACT (Two-stage Regression-Aware fine-tuning with CoT), a method combining CoT reasoning with regression-aware training. TRACT consists of two stages: first, a seed LLM is fine-tuned to generate CoTs, which then serve as supervision for the second-stage fine-tuning. The training objective of TRACT combines the CE loss, for learning the CoT reasoning capabilities, with the regression-aware loss, for the score prediction. Experiments across four LLM-as-a-judge datasets and two LLMs show that TRACT significantly outperforms existing methods. Extensive ablation studies validate the importance of each component in TRACT.
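
The combined objective described in the abstract can be illustrated with a small sketch. The abstract does not give the exact form of the regression-aware loss, so this example assumes one plausible form: squared error between the target score and the expected score under the model's distribution over score tokens. The function names and the weighting parameter `lam` are hypothetical and are not taken from the paper or its repository.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cot_cross_entropy(token_logits, target_ids):
    """Mean negative log-likelihood of the target CoT tokens."""
    nll = 0.0
    for logits, target in zip(token_logits, target_ids):
        nll -= math.log(softmax(logits)[target])
    return nll / len(target_ids)


def expected_score(score_logits, score_values):
    """Expected numeric score under the distribution over score tokens."""
    probs = softmax(score_logits)
    return sum(p * v for p, v in zip(probs, score_values))


def regression_aware_loss(score_logits, score_values, target_score):
    """Assumed form: squared error between expected and target score."""
    return (expected_score(score_logits, score_values) - target_score) ** 2


def combined_loss(token_logits, target_ids, score_logits, score_values,
                  target_score, lam=1.0):
    """CE on the CoT tokens plus the regression-aware term.

    `lam` is a hypothetical weight on the CE term, used here only
    to show how the two terms could be balanced.
    """
    ce = cot_cross_entropy(token_logits, target_ids)
    reg = regression_aware_loss(score_logits, score_values, target_score)
    return lam * ce + reg
```

Unlike CE on the score token alone, the regression term penalizes a prediction more the further its expected value lies from the target, so predicting 4 when the target is 5 costs less than predicting 1.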

Code Implementations: 1 repo