Examining Multilingual Embedding Models Cross-Lingually Through LLM-Generated Adversarial Examples
This work addresses a gap in how multilingual embedding models are evaluated, which is relevant to both researchers and practitioners. The contribution is incremental, however, since it extends existing evaluation methods with one new task.
The paper tackles the problem of evaluating cross-lingual semantic search models by introducing CLSD, a lightweight task using LLM-generated adversarial examples, and finds that retrieval models benefit from English pivoting while bitext mining models excel in direct cross-lingual settings.
The evaluation of cross-lingual semantic search models is often limited to existing datasets from tasks such as information retrieval and semantic textual similarity. We introduce Cross-Lingual Semantic Discrimination (CLSD), a lightweight evaluation task that requires only parallel sentences and a Large Language Model (LLM) to generate adversarial distractors. CLSD measures an embedding model's ability to rank the true parallel sentence above semantically misleading but lexically similar alternatives. As a case study, we construct CLSD datasets for German-French in the news domain. Our experiments show that models fine-tuned for retrieval tasks benefit from pivoting through English, whereas bitext mining models perform best in direct cross-lingual settings. A fine-grained similarity analysis further reveals that embedding models differ in their sensitivity to linguistic perturbations. We release our code and datasets under AGPL-3.0: https://github.com/impresso/cross_lingual_semantic_discrimination
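The ranking idea behind CLSD can be illustrated with a minimal sketch. It is not the released implementation: the function names, the use of cosine similarity, the top-1 accuracy metric, and the two-dimensional toy vectors are all assumptions made for illustration. In practice the vectors would come from the embedding model under evaluation.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# One example: (source embedding, true parallel embedding, distractor embeddings).
# This layout is a hypothetical choice for the sketch, not the dataset format.
Example = Tuple[Vector, Vector, List[Vector]]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def true_parallel_ranked_first(source: Vector, target: Vector,
                               distractors: List[Vector]) -> bool:
    """True if the parallel sentence scores strictly above every distractor."""
    target_score = cosine(source, target)
    return all(target_score > cosine(source, d) for d in distractors)


def top1_accuracy(examples: List[Example]) -> float:
    """Fraction of examples where the true parallel sentence is ranked first."""
    if not examples:
        return 0.0
    hits = sum(true_parallel_ranked_first(s, t, ds) for s, t, ds in examples)
    return hits / len(examples)


# Toy vectors standing in for sentence embeddings (assumed values).
toy_examples: List[Example] = [
    # The true parallel is closest to the source: counted as a hit.
    ([1.0, 0.0], [0.9, 0.1], [[0.5, 0.5], [0.0, 1.0]]),
    # A distractor is closer than the true parallel: counted as a miss.
    ([0.0, 1.0], [0.6, 0.4], [[0.1, 0.9]]),
]

print(top1_accuracy(toy_examples))
```

Under the same assumptions, an English-pivot setting could be approximated by embedding an English translation of the source sentence in place of the source itself before scoring.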