Ieva Raminta Staliūnaitė

h-index9
2papers

2 Papers

30.1CLJun 1
The Role of Ambiguity in Error Prediction via Uncertainty Quantification

Ieva Raminta Staliūnaitė, James Bishop, Andreas Vlachos

The task of Error Prediction, namely predicting whether a model output is correct, is commonly tackled with Uncertainty Quantification (UQ). However, while uncertainty metrics capture when models lack knowledge or capacity to make a prediction, they also reflect aleatoric uncertainty, which is inherent in the model input and context. This paper presents a method for improving error prediction for Large Language Models (LLMs), by disentangling input ambiguity from UQ signal. We conduct experiments on the task of Question Answering (QA) with six UQ metrics and show that UQ metrics are more predictive of errors on unambiguous instances than on questions with multiple plausible answers. We use Gated Experts and Selective Prediction to incorporate gold and predicted ambiguity labels into the error prediction pipeline. We find that ambiguity information improves error prediction scores across model families, training and evaluation paradigms, datasets (including allegedly unambiguous ones), and sources of aleatoric uncertainty, yielding improvements of over 10 points of PRR for individual UQ metrics on standard datasets.

CLJul 24, 2025
Uncertainty Quantification for Evaluating Machine Translation Bias

Ieva Raminta Staliūnaitė, Julius Cheng, Andreas Vlachos

The predictive uncertainty of machine translation (MT) models is typically used as a quality estimation proxy. In this work, we posit that apart from confidently translating when a single correct translation exists, models should also maintain uncertainty when the input is ambiguous. We use uncertainty to measure gender bias in MT systems. When the source sentence includes a lexeme whose gender is not overtly marked, but whose target-language equivalent requires gender specification, the model must infer the appropriate gender from the context and can be susceptible to biases. Prior work measured bias via gender accuracy, however it cannot be applied to ambiguous cases. Using semantic uncertainty, we are able to assess bias when translating both ambiguous and unambiguous source sentences, and find that high translation accuracy does not correlate with exhibiting uncertainty appropriately, and that debiasing affects the two cases differently.