Comparing LLM prompting with Cross-lingual transfer performance on Indigenous and Low-resource Brazilian Languages
This paper addresses NLP performance disparities for low-resource and Indigenous languages. The contribution is incremental, since it compares existing methods on new data.
The study evaluates Large Language Models on part-of-speech labeling for low-resource languages and finds that they perform worse on these than on high-resource languages; an error analysis explains the failures.
Large Language Models (LLMs) are transforming NLP for a variety of tasks. However, how well LLMs perform NLP tasks for low-resource languages (LRLs) remains less explored. In line with the goals of the AmericasNLP workshop, we focus on 12 LRLs from Brazil, 2 LRLs from Africa, and 2 high-resource languages (HRLs), English and Brazilian Portuguese. Our results indicate that LLMs perform worse on part-of-speech (POS) labeling for LRLs than for HRLs. We explain the reasons behind this failure and provide an error analysis through examples observed in our dataset.
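The LRL versus HRL comparison described above can be illustrated with a minimal sketch. It assumes token-level accuracy as the metric, which the abstract does not specify. The function name `accuracy_by_group`, the group labels, and the tag sequences are hypothetical and made up for illustration; they are not data or results from the study.

```python
from collections import defaultdict


def accuracy_by_group(examples):
    """Micro-averaged token-level POS accuracy per language group.

    `examples` is an iterable of (group, gold_tags, predicted_tags) triples.
    Assumption: accuracy is the evaluation metric; the source does not say.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, gold, predicted in examples:
        if len(gold) != len(predicted):
            raise ValueError("gold and predicted tag sequences must align")
        # Count tokens whose predicted tag equals the gold tag.
        correct[group] += sum(g == p for g, p in zip(gold, predicted))
        total[group] += len(gold)
    # Skip groups with no tokens to avoid division by zero.
    return {g: correct[g] / total[g] for g in total if total[g] > 0}


# Toy, invented sentences: one per group, for illustration only.
examples = [
    ("HRL", ["DET", "NOUN", "VERB"], ["DET", "NOUN", "VERB"]),
    ("LRL", ["NOUN", "VERB", "PART", "NOUN"], ["NOUN", "NOUN", "PART", "VERB"]),
]
scores = accuracy_by_group(examples)
print(scores)
```

Micro-averaging over tokens is one possible design choice; averaging per sentence or per language first would weight short sentences and small languages differently.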