LG | Dec 10, 2025

Local LLM Ensembles for Zero-shot Portuguese Named Entity Recognition

João Lucas Luz Lima Sarcinelli, Diego Furtado Silva

arXiv:2512.10043v14.1 | h-index: 1 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses scalable, low-resource, and zero-shot NER for Portuguese, offering a practical solution for domains with limited annotated data.

The paper tackles Named Entity Recognition (NER) for lower-resource languages such as Portuguese by proposing a three-step ensemble pipeline built from locally run LLMs. The ensemble outperforms individual LLMs on four out of five Portuguese NER datasets.

Large Language Models (LLMs) excel in many Natural Language Processing (NLP) tasks through in-context learning but often under-perform in Named Entity Recognition (NER), especially for lower-resource languages like Portuguese. While open-weight LLMs enable local deployment, no single model dominates all tasks, motivating ensemble approaches. However, existing LLM ensembles focus on text generation or classification, leaving NER under-explored. In this context, this work proposes a novel three-step ensemble pipeline for zero-shot NER using similarly capable, locally run LLMs. Our method outperforms individual LLMs in four out of five Portuguese NER datasets by leveraging a heuristic to select optimal model combinations with minimal annotated data. Moreover, we show that ensembles obtained on different source datasets generally outperform individual LLMs in cross-dataset configurations, potentially eliminating the need for annotated data for the current task. Our work advances scalable, low-resource, and zero-shot NER by effectively combining multiple small LLMs without fine-tuning. Code is available at https://github.com/Joao-Luz/local-llm-ner-ensemble.
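The abstract does not spell out the three steps of the pipeline or the selection heuristic, so the following is only a generic illustration of the two ideas it names: combining entity predictions from several models, and picking a model combination using a small annotated sample. It is a minimal sketch that assumes majority voting over exact `(start, end, label)` spans and a brute-force search scored by mean per-sentence F1. The names `vote`, `f1`, `best_combination`, and the `model_a` to `model_d` identifiers are hypothetical and are not taken from the paper or its repository.

```python
from __future__ import annotations

from collections import Counter
from itertools import combinations

# An entity is (start_token, end_token, label); this representation is an assumption.
Entity = tuple[int, int, str]


def vote(predictions: list[set[Entity]], min_votes: int | None = None) -> set[Entity]:
    """Keep entities predicted by at least `min_votes` models (default: strict majority)."""
    if min_votes is None:
        min_votes = len(predictions) // 2 + 1
    counts = Counter(entity for pred in predictions for entity in pred)
    return {entity for entity, n in counts.items() if n >= min_votes}


def f1(pred: set[Entity], gold: set[Entity]) -> float:
    """Exact-match F1 between predicted and gold entity sets."""
    if not pred and not gold:
        return 1.0
    true_positives = len(pred & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def best_combination(
    model_preds: dict[str, list[set[Entity]]],
    gold: list[set[Entity]],
    size: int = 3,
) -> tuple[str, ...]:
    """Pick the `size`-model subset whose voted output scores the highest
    mean per-sentence F1 on a small annotated sample (brute force)."""

    def score(names: tuple[str, ...]) -> float:
        per_sentence = [
            f1(vote([model_preds[name][i] for name in names]), gold[i])
            for i in range(len(gold))
        ]
        return sum(per_sentence) / len(per_sentence)

    return max(combinations(sorted(model_preds), size), key=score)


# Toy data: one annotated sentence and four hypothetical models.
gold = [{(0, 2, "PER"), (5, 6, "LOC")}]
model_preds = {
    "model_a": [{(0, 2, "PER"), (5, 6, "LOC")}],
    "model_b": [{(0, 2, "PER")}],
    "model_c": [{(0, 2, "PER"), (5, 6, "ORG")}],
    "model_d": [{(5, 6, "LOC")}],
}

chosen = best_combination(model_preds, gold, size=3)
ensemble_output = vote([model_preds[name][0] for name in chosen])
```

In this toy setup, no single model other than `model_a` recovers both gold entities, while a voted subset can. Once a combination has been chosen on one dataset, the same subset could in principle be reused on another dataset without labels, which is the kind of cross-dataset setting the abstract describes.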
