Mar 8, 2024

Tracing the Roots of Facts in Multilingual Language Models: Independent, Shared, and Transferred Knowledge

arXiv:2403.05189v1110 citations | h-index: 4 | EACL
Originality: Synthesis-oriented
AI Analysis

The paper addresses the challenge of keeping factual knowledge consistent in low-resource languages for multilingual AI applications, but its contribution is incremental because it builds on existing probing methods.

The study investigated how multilingual language models (ML-LMs) acquire and represent factual knowledge, identifying three patterns—language-independent, cross-lingual shared, and transferred—using the mLAMA dataset and neuron analysis, but did not report specific numerical results.

Acquiring factual knowledge for language models (LMs) in low-resource languages poses a serious challenge, thus resorting to cross-lingual transfer in multilingual LMs (ML-LMs). In this study, we ask how ML-LMs acquire and represent factual knowledge. Using the multilingual factual knowledge probing dataset, mLAMA, we first conducted a neuron investigation of ML-LMs (specifically, multilingual BERT). We then traced the roots of facts back to the knowledge source (Wikipedia) to identify the ways in which ML-LMs acquire specific facts. We finally identified three patterns of acquiring and representing facts in ML-LMs: language-independent, cross-lingual shared and transferred, and devised methods for differentiating them. Our findings highlight the challenge of maintaining consistent factual knowledge across languages, underscoring the need for better fact representation learning in ML-LMs.
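
The abstract's point about inconsistent factual knowledge across languages can be made concrete with a small measurement. The sketch below is not the paper's method. It assumes that probing (for example with mLAMA) has already produced, for each language, the set of fact IDs the model predicted correctly, and it computes the Jaccard overlap of those sets for every language pair. The names `jaccard` and `pairwise_overlap`, the fact IDs, and the toy outcomes are all hypothetical and for illustration only.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; defined here as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(correct_by_lang: dict) -> dict:
    """Overlap of correctly predicted fact IDs for every language pair.

    Keys of the result are (lang1, lang2) tuples in sorted order.
    """
    return {
        (l1, l2): jaccard(correct_by_lang[l1], correct_by_lang[l2])
        for l1, l2 in combinations(sorted(correct_by_lang), 2)
    }


# Hypothetical toy outcomes (not real results): the fact IDs that a
# probe answered correctly in each language.
toy = {
    "en": {"f1", "f2", "f3"},
    "fr": {"f1", "f2"},
    "ja": {"f3"},
}

overlaps = pairwise_overlap(toy)
for pair, score in overlaps.items():
    print(pair, round(score, 3))
```

A low overlap for a language pair would indicate that the model recalls largely different facts in the two languages, which is one simple way to quantify the consistency problem the abstract describes.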

Code Implementations: 1 repo
