Backretrieval: An Image-Pivoted Evaluation Metric for Cross-Lingual Text Representations Without Parallel Corpora
This work addresses the difficulty of evaluating cross-lingual representations in domains beyond standard benchmarks. It benefits researchers and practitioners in areas such as unsupervised machine translation and cross-lingual information retrieval, although the contribution is incremental in that it builds on existing evaluation methods.
The paper tackles the problem of evaluating cross-lingual text representations without parallel corpora. It proposes Backretrieval, an automatic metric that uses images as a proxy, and shows that the metric correlates strongly with ground-truth metrics and yields statistically significant improvements over baselines.
Cross-lingual text representations have recently gained popularity and serve as the backbone of many tasks, such as unsupervised machine translation and cross-lingual information retrieval. However, evaluating such representations is difficult in domains beyond standard benchmarks, because it requires domain-specific parallel language data for each pair of languages. In this paper, we propose Backretrieval, an automatic metric for evaluating the quality of cross-lingual text representations that uses images as a proxy in a paired image-text evaluation dataset. Experimentally, Backretrieval is shown to correlate strongly with ground-truth metrics on annotated datasets, and our analysis shows statistically significant improvements over baselines. Our experiments conclude with a case study on a recipe dataset without parallel cross-lingual data, in which we illustrate how to judge cross-lingual embedding quality with Backretrieval and validate the outcome with a small human study.
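The abstract does not spell out how the image pivot is computed, so the following is only a minimal sketch of one plausible reading, not the authors' definition. It assumes two image-text datasets, one per language, with no parallel text between them. Each text in language A retrieves its nearest text in language B in the cross-lingual embedding space; the image paired with that retrieved text is then used to retrieve the nearest image on the language A side. The score is the fraction of queries whose round trip lands on their own image. The function names, the cosine similarity, and the recall@1 scoring are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(query, candidates):
    """Index of the candidate most similar to the query (first index wins on ties)."""
    return max(range(len(candidates)), key=lambda j: cosine(query, candidates[j]))


def backretrieval_recall_at_1(text_a, img_a, text_b, img_b):
    """Hypothetical image-pivoted round-trip score (an assumption, not the paper's formula).

    text_a[i] and img_a[i] are the embeddings of the i-th text/image pair in language A;
    text_b[j] and img_b[j] are the same for language B. Text embeddings share one
    cross-lingual space; image embeddings share one image space.
    """
    hits = 0
    for i, query_text in enumerate(text_a):
        j = nearest(query_text, text_b)   # cross-lingual text-to-text retrieval
        back = nearest(img_b[j], img_a)   # pivot through the paired image, retrieve back
        hits += back == i                 # success if we return to the query's own image
    return hits / len(text_a)
```

Under this reading, no translation pairs are needed: a better-aligned text embedding space should send each query to a language B text whose image resembles the query's own image, which raises the score. Such a score could then be compared against a ground-truth retrieval metric on an annotated dataset, as the abstract describes.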