CL, Aug 7, 2022

Vernacular Search Query Translation with Unsupervised Domain Adaptation

arXiv:2208.03711v1, 1 citation, h-index: 13
Originality: Incremental advance
AI Analysis

This paper addresses cross-lingual information retrieval for a linguistically diverse user base on e-commerce platforms. The contribution is incremental: it takes an existing translation model and adds domain adaptation.

The paper translates vernacular e-commerce search queries without parallel training data by applying unsupervised domain adaptation to an open-domain translation model. For Hindi to English, it improves on the mBART-large-50 baseline by more than 20 BLEU points without any parallel corpus, and by more than 27 BLEU points after fine-tuning on a small 50k labeled set.

With the democratization of e-commerce platforms, an increasingly diversified user base is opting to shop online. To provide a comfortable and reliable shopping experience, it's important to enable users to interact with the platform in the language of their choice. Accurate query translation is essential for Cross-Lingual Information Retrieval (CLIR) with vernacular queries. Because they operate at internet scale, e-commerce platforms receive millions of search queries every day. However, creating a parallel training set to train an in-domain translation model is cumbersome. This paper proposes an unsupervised domain adaptation approach to translate search queries without using any parallel corpus. We use an open-domain translation model (trained on a public corpus) and adapt it to the query data using only monolingual queries from the two languages. In addition, fine-tuning with a small labeled set further improves the result. For demonstration, we show results for Hindi to English query translation and use the mBART-large-50 model as the baseline to improve upon. Experimental results show that, without using any parallel corpus, we obtain more than 20 BLEU points improvement over the baseline, while fine-tuning with a small 50k labeled set provides more than 27 BLEU points improvement over the baseline.
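
The results above are stated as differences in BLEU points between two systems scored against the same references. As a rough illustration of how such a difference is computed, here is a minimal sketch of unsmoothed corpus-level BLEU. It assumes whitespace tokenization and one reference per query; the abstract does not say which BLEU configuration the authors used, so these are assumptions. The function names and the example queries are hypothetical and are not taken from the paper's data.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Assumption: whitespace tokenization, which may differ from the paper's setup.
    """
    matched = [0] * max_n  # clipped n-gram matches, summed over the corpus
    total = [0] * max_n    # hypothesis n-gram counts, summed over the corpus
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Counter intersection keeps the minimum count per n-gram (clipping).
            matched[n - 1] += sum((h_counts & r_counts).values())
            total[n - 1] += sum(h_counts.values())
    # Without smoothing, any n-gram order with zero matches gives a score of 0.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # The brevity penalty applies only when the hypotheses are shorter overall.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


# Hypothetical English references and two sets of system outputs.
references = ["red cotton saree for women", "cheap mobile phone under 10000"]
baseline_out = ["red color cotton saree women", "cheap phone mobile below 10000"]
adapted_out = ["red cotton saree for women", "cheap mobile phone below 10000"]

baseline_bleu = corpus_bleu(baseline_out, references)
adapted_bleu = corpus_bleu(adapted_out, references)
print(f"BLEU improvement: {adapted_bleu - baseline_bleu:.2f} points")
```

Search queries are short, so unsmoothed 4-gram BLEU can fall to zero on small samples. A published evaluation would more likely use a standard tool with a documented tokenizer and smoothing setting.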

