CL · Jul 11, 2025

Diagnosing Failures in Large Language Models' Answers: Integrating Error Attribution into Evaluation Framework

arXiv:2507.08459v1 · 3 citations · h-index: 2 · ACL
Originality: Incremental advance
AI Analysis

This work addresses the need for systematic error analysis in LLM evaluation. It is an incremental advance: it builds on existing evaluation methods by adding error-attribution capabilities.

The authors tackle the problem of diagnosing failures in Large Language Models' answers by developing an automated error-attribution framework. The result is a fine-tuned judge model that generates a score, a misattribution category, and textual feedback, and whose effectiveness is supported by their experiments.

With the widespread application of Large Language Models (LLMs) across many tasks, mainstream LLM platforms generate massive volumes of user-model interactions daily. To efficiently analyze model performance and diagnose failures in model answers, an automated framework that systematically categorizes and attributes errors is essential. However, existing evaluation models lack error-attribution capability. In this work, we establish a comprehensive Misattribution Framework with 6 primary and 15 secondary categories to facilitate in-depth analysis. Based on this framework, we present AttriData, a dataset specifically designed for error attribution that pairs each misattribution with the corresponding score and feedback. We also propose MisAttributionLLM, a model fine-tuned on AttriData, which is the first general-purpose judge model capable of simultaneously generating a score, a misattribution, and feedback. Extensive experiments and analyses confirm the effectiveness and robustness of the proposed method.
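To make the judge's three-part output (score, misattribution, feedback) more concrete, here is a minimal sketch that parses one verdict into a record, checks that the secondary category belongs to its primary category, and tallies misattribution pairs across many interactions. That kind of aggregate is what would let a platform see which error types dominate. Everything here is an assumption for illustration: the JSON output format, the field names, and the category names (placeholders, not the paper's actual 6 primary and 15 secondary categories) are hypothetical, and this is not MisAttributionLLM's real interface. It also assumes every verdict carries a category.

```python
import json
from collections import Counter
from dataclasses import dataclass

# HYPOTHETICAL two-level taxonomy with placeholder names. It stands in for the
# paper's 6 primary / 15 secondary categories, whose names are not given here.
TAXONOMY: dict[str, set[str]] = {
    "primary_A": {"secondary_A1", "secondary_A2"},
    "primary_B": {"secondary_B1"},
}


@dataclass(frozen=True)
class JudgeVerdict:
    """One judge output: a score, a (primary, secondary) misattribution, and feedback."""
    score: int
    primary: str
    secondary: str
    feedback: str


def parse_verdict(raw: str) -> JudgeVerdict:
    """Parse a judge's JSON output (hypothetical format) and validate the category pair."""
    data = json.loads(raw)
    primary = data["misattribution"]["primary"]
    secondary = data["misattribution"]["secondary"]
    if primary not in TAXONOMY:
        raise ValueError(f"unknown primary category: {primary!r}")
    if secondary not in TAXONOMY[primary]:
        raise ValueError(f"{secondary!r} is not a subcategory of {primary!r}")
    return JudgeVerdict(int(data["score"]), primary, secondary, data["feedback"])


def error_profile(verdicts: list[JudgeVerdict]) -> Counter:
    """Count (primary, secondary) pairs to show which error types occur most often."""
    return Counter((v.primary, v.secondary) for v in verdicts)


if __name__ == "__main__":
    outputs = [
        '{"score": 2, "misattribution": {"primary": "primary_A", "secondary": "secondary_A1"}, "feedback": "Misses a key step."}',
        '{"score": 3, "misattribution": {"primary": "primary_A", "secondary": "secondary_A1"}, "feedback": "Incomplete answer."}',
        '{"score": 1, "misattribution": {"primary": "primary_B", "secondary": "secondary_B1"}, "feedback": "Off topic."}',
    ]
    profile = error_profile([parse_verdict(o) for o in outputs])
    print(profile.most_common(1))  # [(('primary_A', 'secondary_A1'), 2)]
```

Validating the primary/secondary pair against the taxonomy catches a judge that emits a secondary category under the wrong parent, which a flat label list could not detect.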
