CL AI IR Apr 30, 2025

RDF-Based Structured Quality Assessment Representation of Multilingual LLM Evaluations

arXiv:2504.21605v1 2.7 h-index: 4 ESWC

Originality Synthesis-oriented

AI Analysis

This work addresses the challenge of evaluating LLM quality in multilingual settings under knowledge conflicts. The contribution is incremental: it applies a structured representation to a single, specific domain.

The authors propose an RDF-based framework for systematically assessing multilingual LLM reliability when information conflicts. They demonstrate it in a fire safety domain experiment of 28 questions, which reveals patterns in context prioritization and language-specific performance.

Large Language Models (LLMs) increasingly serve as knowledge interfaces, yet systematically assessing their reliability with conflicting information remains difficult. We propose an RDF-based framework to assess multilingual LLM quality, focusing on knowledge conflicts. Our approach captures model responses across four distinct context conditions (complete, incomplete, conflicting, and no-context information) in German and English. This structured representation enables the comprehensive analysis of knowledge leakage (where models favor training data over provided context), error detection, and multilingual consistency. We demonstrate the framework through a fire safety domain experiment, revealing critical patterns in context prioritization and language-specific performance, and showing that our vocabulary was sufficient to express every assessment facet encountered in the 28-question study.
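To make the structure described in the abstract more concrete, here is a minimal sketch of how responses across the four context conditions and two languages might be recorded as subject-predicate-object triples, and how knowledge leakage could then be queried. It uses plain Python tuples rather than an RDF library. The namespace `http://example.org/llm-eval#`, the predicate names (`answersQuestion`, `contextCondition`, `answerSource`), and the toy labels are all assumptions made for illustration; the paper's actual vocabulary and results are not reproduced here.

```python
from itertools import product

# Hypothetical namespace and term names; not the vocabulary defined in the paper.
EX = "http://example.org/llm-eval#"

CONDITIONS = ("complete", "incomplete", "conflicting", "no_context")
LANGUAGES = ("de", "en")


def response_triples(question_id, language, condition, answer_source):
    """Return (subject, predicate, object) triples describing one model response."""
    subject = f"{EX}response/{question_id}/{language}/{condition}"
    return [
        (subject, f"{EX}answersQuestion", f"{EX}question/{question_id}"),
        (subject, f"{EX}language", language),
        (subject, f"{EX}contextCondition", f"{EX}{condition}"),
        (subject, f"{EX}answerSource", f"{EX}{answer_source}"),
    ]


def build_graph(question_id, sources):
    """Collect triples for every language/condition pair of one question.

    `sources` maps (language, condition) to "context" or "training_data";
    pairs that are not listed default to "context".
    """
    graph = set()
    for language, condition in product(LANGUAGES, CONDITIONS):
        source = sources.get((language, condition), "context")
        graph.update(response_triples(question_id, language, condition, source))
    return graph


def leaked_responses(graph):
    """Find responses that were given context but answered from training data."""
    condition_of = {s: o for s, p, o in graph if p == f"{EX}contextCondition"}
    from_training = {
        s for s, p, o in graph
        if p == f"{EX}answerSource" and o == f"{EX}training_data"
    }
    return sorted(
        s for s in from_training if condition_of[s] != f"{EX}no_context"
    )


# Toy, made-up labels for illustration only (not results from the paper).
graph = build_graph(
    "q1",
    {
        ("de", "conflicting"): "training_data",
        ("en", "no_context"): "training_data",
    },
)
leaks = leaked_responses(graph)
```

In this sketch, answering from training data in the no-context condition is not counted as leakage, since no context was provided to override. A real implementation would more likely store the triples in an RDF store and express the leakage check as a SPARQL query.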
