CL, LG · Mar 31, 2025

MKA: Leveraging Cross-Lingual Consensus for Model Abstention

arXiv:2503.23687v1 · 1 citation · h-index: 3 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for more factual and reliable LLMs, a prerequisite for broader adoption, though it is an incremental advance because it builds on existing multilingual capabilities.

The paper tackles LLM reliability by enabling models to abstain when uncertain, using a multilingual pipeline to calibrate their confidence. This yields accuracy improvements of 71.2% for Bengali and 15.5% for English over a baseline without the pipeline.

The reliability of LLMs remains questionable even as they improve at more tasks. Wider adoption of LLMs depends on whether they are usably factual and, if they are not, on whether they can properly calibrate their confidence in their responses. This work uses the multilingual knowledge of an LLM to inform its decision to abstain or answer when prompted. We develop a multilingual pipeline to calibrate the model's confidence and let it abstain when uncertain. We run several multilingual models through the pipeline to profile them across different languages. We find that the pipeline's performance varies by model and language, but that in general the models benefit from it. This is evidenced by an accuracy improvement of 71.2% for Bengali over a baseline without the pipeline. Even a high-resource language like English sees a 15.5% improvement. These results hint at possible further improvements.
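The abstract does not spell out the pipeline's internals. As a rough illustration of the idea named in the title, cross-lingual consensus as an abstention signal, here is a minimal Python sketch under stated assumptions: the model is asked the same question once per language, each answer is assumed to be translated back into a shared pivot language before comparison, and the model abstains when the share of agreeing languages falls below a threshold. The `ask` callable, the `normalize` helper, the majority-vote rule, and the `threshold` default are all hypothetical and are not the authors' implementation.

```python
from collections import Counter
from typing import Callable, Optional, Sequence, Tuple


def normalize(answer: str) -> str:
    """Hypothetical normalizer: lowercase, drop punctuation, collapse whitespace."""
    kept = "".join(ch for ch in answer.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def consensus_answer(
    question: str,
    languages: Sequence[str],
    ask: Callable[[str, str], str],
    threshold: float = 0.5,
) -> Tuple[Optional[str], float]:
    """Answer only if enough languages agree; otherwise abstain.

    `ask(question, lang)` is a hypothetical callable that prompts the model in
    `lang` and returns its answer translated back into a shared pivot language.
    Returns (answer, agreement), where answer is None when the model abstains.
    """
    if not languages:
        raise ValueError("need at least one language")
    # One normalized answer per language.
    answers = [normalize(ask(question, lang)) for lang in languages]
    # Majority answer and the fraction of languages that produced it.
    top, count = Counter(answers).most_common(1)[0]
    agreement = count / len(answers)
    # Abstain when cross-lingual agreement falls below the threshold.
    return (top if agreement >= threshold else None), agreement


if __name__ == "__main__":
    # Canned, made-up model outputs standing in for real LLM calls.
    agree = {"en": "Dhaka", "bn": "Dhaka", "hi": "Dhaka.", "sw": "Nairobi"}
    split = {"en": "1912", "bn": "1913", "hi": "1911", "sw": "1912"}
    print(consensus_answer("q1", list(agree), lambda q, lang: agree[lang], 0.6))
    print(consensus_answer("q2", list(split), lambda q, lang: split[lang], 0.6))
```

Running the module prints `('dhaka', 0.75)`, where three of four languages agree and the model answers, and `(None, 0.5)`, where agreement is too low and the model abstains. A majority vote is only one possible consensus rule; the paper's actual confidence calibration may weigh languages or answers differently.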

