CL · Sep 24, 2024

Finetuning LLMs for Comparative Assessment Tasks

arXiv:2409.15979v1 · 19 citations · h-index: 13
Originality: Incremental advance
AI Analysis

This work addresses a computational bottleneck for NLP researchers and practitioners, though the advance is incremental because it builds on existing comparative assessment methods.

The paper tackles the scalability problem of pairwise comparisons in automated assessment of natural language generation. It finetunes LLMs so that their outputs align with a target distribution of comparative probabilities, improving on state-of-the-art performance while remaining accurate when only an efficient subset of comparisons is used.

Automated assessment in natural language generation is a challenging task. Instruction-tuned large language models (LLMs) have shown promise in reference-free evaluation, particularly through comparative assessment. However, the quadratic computational complexity of pairwise comparisons limits its scalability. To address this, efficient comparative assessment has been explored by applying comparative strategies on zero-shot LLM probabilities. We propose a framework for finetuning LLMs for comparative assessment to align the model's output with the target distribution of comparative probabilities. By training on soft probabilities, our approach improves state-of-the-art performance while maintaining high performance with an efficient subset of comparisons.
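
As a rough illustration of the ideas in the abstract, the sketch below shows why exhaustive pairwise comparison grows quadratically, how a subset of pairs might be sampled, and what training against soft probabilities could look like as a loss. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names are hypothetical, the soft target is assumed here to be a sigmoid of a score difference, and the uniform random sampling of pairs is only one possible way to pick a subset.

```python
import math
import random
from itertools import permutations


def all_pairs(n):
    """Every ordered pair (i, j) with i != j: n * (n - 1) comparisons."""
    return list(permutations(range(n), 2))


def sample_pairs(n, k, seed=0):
    """Pick k distinct ordered pairs uniformly at random (assumed strategy)."""
    rng = random.Random(seed)
    return rng.sample(all_pairs(n), k)


def soft_target(score_i, score_j, temperature=1.0):
    """Hypothetical soft label: sigmoid of the score difference.

    Returns a probability in (0, 1) that candidate i is better than j,
    rather than a hard 0/1 label.
    """
    return 1.0 / (1.0 + math.exp(-(score_i - score_j) / temperature))


def soft_bce(p_model, p_target, eps=1e-12):
    """Binary cross-entropy between the model's probability and a soft target."""
    p = min(max(p_model, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p_target * math.log(p) + (1.0 - p_target) * math.log(1.0 - p))


if __name__ == "__main__":
    n = 10
    print(len(all_pairs(n)))         # exhaustive comparisons for 10 candidates
    print(len(sample_pairs(n, 20)))  # a cheaper subset of comparisons
    target = soft_target(4.0, 3.0)
    print(soft_bce(0.5, target), soft_bce(target, target))
```

With 10 candidates, the exhaustive set contains 10 × 9 = 90 ordered comparisons, whereas the sampled subset uses only 20. For a fixed soft target, the cross-entropy is smallest when the model's probability equals the target, which is the sense in which training aligns the model's output with the target distribution.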

