Marianna J. Martindale

CL
h-index36
3papers
1,343citations
Novelty37%
AI Score40

3 Papers

CLJan 18, 2023
Understanding and Detecting Hallucinations in Neural Machine Translation via Model Introspection

Weijia Xu, Sweta Agrawal, Eleftheria Briakou et al.

Neural sequence generation models are known to "hallucinate", by producing outputs that are unrelated to the source text. These hallucinations are potentially harmful, yet it remains unclear in what conditions they arise and how to mitigate their impact. In this work, we first identify internal model symptoms of hallucinations by analyzing the relative token contributions to the generation in contrastive hallucinated vs. non-hallucinated outputs generated via source perturbations. We then show that these symptoms are reliable indicators of natural hallucinations, by using them to design a lightweight hallucination detector which outperforms both model-free baselines and strong classifiers based on quality estimation or large pre-trained models on manually annotated English-Chinese and German-English translation test beds.

CLOct 11, 2025
Toward Machine Translation Literacy: How Lay Users Perceive and Rely on Imperfect Translations

Yimin Xiao, Yongle Zhang, Dayeon Ki et al.

As Machine Translation (MT) becomes increasingly commonplace, understanding how the general public perceives and relies on imperfect MT is crucial for contextualizing MT research in real-world applications. We present a human study conducted in a public museum (n=452), investigating how fluency and adequacy errors impact bilingual and non-bilingual users' reliance on MT during casual use. Our findings reveal that non-bilingual users often over-rely on MT due to a lack of evaluation strategies and alternatives, while experiencing the impact of errors can prompt users to reassess future reliance. This highlights the need for MT evaluation and NLP explanation techniques to promote not only MT quality, but also MT literacy among its users.

CLFeb 16, 2018
Fluency Over Adequacy: A Pilot Study in Measuring User Trust in Imperfect MT

Marianna J. Martindale, Marine Carpuat

Although measuring intrinsic quality has been a key factor in the advancement of Machine Translation (MT), successfully deploying MT requires considering not just intrinsic quality but also the user experience, including aspects such as trust. This work introduces a method of studying how users modulate their trust in an MT system after seeing errorful (disfluent or inadequate) output amidst good (fluent and adequate) output. We conduct a survey to determine how users respond to good translations compared to translations that are either adequate but not fluent, or fluent but not adequate. In this pilot study, users responded strongly to disfluent translations, but were, surprisingly, much less concerned with adequacy.