LG · Dec 10, 2025

Local LLM Ensembles for Zero-shot Portuguese Named Entity Recognition

arXiv:2512.10043v1 · h-index: 1 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses scalable, low-resource, and zero-shot NER for Portuguese, offering a practical solution for domains with limited annotated data.

The paper tackles Named Entity Recognition (NER) for lower-resource languages like Portuguese by proposing a three-step ensemble pipeline of locally run LLMs, which outperforms individual LLMs on four out of five Portuguese NER datasets.

Large Language Models (LLMs) excel in many Natural Language Processing (NLP) tasks through in-context learning but often under-perform in Named Entity Recognition (NER), especially for lower-resource languages like Portuguese. While open-weight LLMs enable local deployment, no single model dominates all tasks, motivating ensemble approaches. However, existing LLM ensembles focus on text generation or classification, leaving NER under-explored. In this context, this work proposes a novel three-step ensemble pipeline for zero-shot NER using similarly capable, locally run LLMs. Our method outperforms individual LLMs in four out of five Portuguese NER datasets by leveraging a heuristic to select optimal model combinations with minimal annotated data. Moreover, we show that ensembles obtained on different source datasets generally outperform individual LLMs in cross-dataset configurations, potentially eliminating the need for annotated data for the current task. Our work advances scalable, low-resource, and zero-shot NER by effectively combining multiple small LLMs without fine-tuning. Code is available at https://github.com/Joao-Luz/local-llm-ner-ensemble.
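
The abstract does not spell out the three steps of the pipeline or the selection heuristic, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that each model's zero-shot output has already been parsed into a set of entity tuples, that predictions are combined by simple majority voting, and that the model combination is chosen by exhaustively scoring subsets against a small annotated sample. The names `vote`, `f1`, and `select_ensemble` are hypothetical and do not come from the paper or its repository.

```python
from collections import Counter
from itertools import combinations

# Assumption: an entity is a tuple (sentence_id, start, end, label),
# and each model's predictions over the sample are a set of such tuples.


def vote(predictions, min_votes):
    """Keep every entity proposed by at least `min_votes` of the models."""
    counts = Counter(entity for pred in predictions for entity in pred)
    return {entity for entity, n in counts.items() if n >= min_votes}


def f1(predicted, gold):
    """Exact-match, entity-level F1 between two sets of entities."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def select_ensemble(model_outputs, gold, size):
    """Score every combination of `size` models on a small annotated sample.

    model_outputs maps a model name to its set of predicted entities.
    Returns the best-scoring tuple of model names and its F1.
    """
    best_names, best_score = None, -1.0
    for names in combinations(sorted(model_outputs), size):
        merged = vote([model_outputs[n] for n in names], min_votes=size // 2 + 1)
        score = f1(merged, gold)
        if score > best_score:
            best_names, best_score = names, score
    return best_names, best_score
```

In this sketch, majority voting filters out entities that only one model proposes, which is one plausible reason an ensemble of similarly capable models can beat each member. Exhaustive subset search is only practical for a handful of models; the paper's actual heuristic may differ.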
