Comparative Study of Zero-Shot Cross-Lingual Transfer for Bodo POS and NER Tagging Using Gemini 2.0 Flash Thinking Experimental Model
This work addresses the challenge of enabling NLP tasks like POS and NER tagging for low-resource languages such as Bodo, though it is incremental as it applies existing methods to a new language.
This study tackled the problem of limited NLP resources for the low-resource language Bodo by comparing two zero-shot cross-lingual transfer methods using Gemini 2.0 Flash Thinking Experiment for POS and NER tagging, finding that prompt-based transfer performed better, especially for NER.
Named Entity Recognition (NER) and Part-of-Speech (POS) tagging are critical tasks for Natural Language Processing (NLP), yet their availability for low-resource languages (LRLs) like Bodo remains limited. This article presents a comparative empirical study investigating the effectiveness of Google's Gemini 2.0 Flash Thinking Experiment model for zero-shot cross-lingual transfer of POS and NER tagging to Bodo. We explore two distinct methodologies: (1) direct translation of English sentences to Bodo followed by tag transfer, and (2) prompt-based tag transfer on parallel English-Bodo sentence pairs. Both methods leverage the machine translation and cross-lingual understanding capabilities of Gemini 2.0 Flash Thinking Experiment to project English POS and NER annotations onto Bodo text in CONLL-2003 format. Our findings reveal the capabilities and limitations of each approach, demonstrating that while both methods show promise for bootstrapping Bodo NLP, prompt-based transfer exhibits superior performance, particularly for NER. We provide a detailed analysis of the results, highlighting the impact of translation quality, grammatical divergences, and the inherent challenges of zero-shot cross-lingual transfer. The article concludes by discussing future research directions, emphasizing the need for hybrid approaches, few-shot fine-tuning, and the development of dedicated Bodo NLP resources to achieve high-accuracy POS and NER tagging for this low-resource language.