Retrieval-Augmented Long-Context Translation for Cultural Image Captioning: Gators submission for AmericasNLP 2026 shared task

Aashish Dhawan, Christopher Driggers-Ellis, Dzmitry Kasinets, Daisy Zhe Wang, Christan Grant

arXiv:2605.20626

Predicted impact: top 71% in CL (last 90 days) · Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of generating culturally appropriate image captions for low-resource Indigenous languages, but the approach is incremental (combining existing vision-language models with retrieval augmentation) and the gains are specific to the shared task setting.

The authors tackle cultural image captioning for three Indigenous languages (Bribri, Guaraní, Orizaba Nahuatl) with a two-stage pipeline: a Spanish intermediate caption is generated first, then translated into the target language using retrieval-augmented many-shot prompting. They report improvements of 122.6% to 164.1% over the shared task baseline on the dev sets, maintain >150% improvements for Bribri and Orizaba Nahuatl on the test set, and win the shared task overall.

We present the University of Florida Gators submission to the AmericasNLP 2026 shared task on cultural image captioning for Indigenous languages. Our two-stage pipeline generates a Spanish intermediate caption with Qwen2.5-VL, then produces the target-language caption using retrieval-augmented many-shot prompting with Gemini 2.5 Flash. We achieve 164.1%, 131.7%, and 122.6% improvements over the shared task baseline for Bribri, Guaraní, and Orizaba Nahuatl captioning, respectively, in our dev set evaluation and maintain >150% improvements for the Bribri and Orizaba Nahuatl languages in the test set evaluation. We find retrieval is highly language-dependent, beneficial only for large, in-domain corpora, and that synthetic data augmentation accounts for around 28 chrF++ of the dev set Guaraní performance gain. Our submission is the overall winner of the shared task, placing second out of five finalist submissions in human evaluations of target-language captions.
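The second stage described above, retrieval-augmented many-shot prompting, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say how examples are retrieved, so a simple character n-gram overlap score stands in for the retriever, and the function names, the prompt template, and the placeholder target-language strings are hypothetical. The call to Gemini 2.5 Flash is left out; the sketch only builds the prompt text that would be sent to such a model.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams of a lowercased, whitespace-normalised string."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(max(len(text) - n + 1, 0)))


def overlap_score(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient over character n-gram multisets.

    Assumption: a stand-in similarity measure, not the authors' retriever.
    """
    ca, cb = char_ngrams(a, n), char_ngrams(b, n)
    total = sum(ca.values()) + sum(cb.values())
    if total == 0:
        return 0.0
    shared = sum((ca & cb).values())
    return 2.0 * shared / total


def retrieve(query: str, corpus: list[tuple[str, str]], k: int) -> list[tuple[str, str]]:
    """Return the k (spanish, target) pairs whose Spanish side best matches the query."""
    ranked = sorted(corpus, key=lambda pair: overlap_score(query, pair[0]), reverse=True)
    return ranked[:k]


def build_prompt(spanish_caption: str, shots: list[tuple[str, str]], target_language: str) -> str:
    """Assemble a many-shot translation prompt (hypothetical template)."""
    lines = [f"Translate the Spanish caption into {target_language}.", ""]
    for es, tgt in shots:
        lines += [f"Spanish: {es}", f"{target_language}: {tgt}", ""]
    lines += [f"Spanish: {spanish_caption}", f"{target_language}:"]
    return "\n".join(lines)


# Placeholder parallel corpus; the target-language sides are dummies, not real data.
corpus = [
    ("un perro corre en el campo", "<target caption 1>"),
    ("una mujer teje una canasta", "<target caption 2>"),
    ("el perro duerme", "<target caption 3>"),
]
shots = retrieve("un perro corre", corpus, k=2)
prompt = build_prompt("un perro corre", shots, "Bribri")
```

In this sketch, `k` controls how many retrieved pairs are placed in the prompt; a many-shot setup would use a much larger `k` than shown here, limited by the model's context window.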
