CL, AI
Oct 15, 2025

Stable LLM Ensemble: Interaction between Example Representativeness and Diversity

arXiv:2510.13143v1
h-index: 4
Originality: Incremental advance
AI Analysis

This work aims to improve the accuracy and robustness of one-shot LLM ensembles for practitioners. It is an incremental advance that builds on existing ensemble and prompting methods.

The study examines how sensitive one-shot LLM predictions are to the chosen example and to the diversity among ensemble members, varying two factors: example representativeness and output diversity (sampling temperature). Centroid-based selection of a representative example, combined with a higher temperature, outperformed random selection by +7.6% in macro-F1 and -10.5% in RMSE. It also exceeded 5-shot prompting by +21.1% in macro-F1 and -24.0% in RMSE.
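The two metrics quoted above can be sketched as follows. This is a minimal illustration using the standard definitions of macro-F1 (the unweighted mean of per-class F1) and RMSE; the function names and toy labels are assumptions for illustration and are not taken from the paper, whose evaluation code is not shown here.

```python
import math

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)

def rmse(y_true, y_pred):
    """Root mean squared error between numeric targets and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical toy labels (not data from the paper)
y_true = [1, 1, 2, 2]
y_pred = [1, 2, 2, 2]
print(macro_f1(y_true, y_pred), rmse(y_true, y_pred))
```

Higher macro-F1 and lower RMSE both indicate improvement, which is why the reported gains carry opposite signs.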

Large language models (LLMs) have achieved remarkable results in a wide range of domains. However, the accuracy and robustness of one-shot LLM predictions remain highly sensitive to the choice of example and to the diversity among ensemble members. This study systematically investigates the effects of example representativeness (one-shot strategy) and output diversity (sampling temperature) on LLM ensemble performance. Two one-shot strategies are compared, centroid-based representative examples (proposed) and randomly sampled examples (baseline), and the sampling temperature is also varied. The proposed approach with a higher temperature setting significantly outperforms random selection by +7.6% (macro-F1) and -10.5% (RMSE). Furthermore, the proposed model exceeds 5-shot prompting by +21.1% (macro-F1) and -24.0% (RMSE). Our findings demonstrate that combining representative example selection with increased temperature provides the appropriate level of diversity to the ensemble. This work highlights the practical importance of both example selection and controlled diversity in designing effective one-shot LLM ensembles.
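One plausible reading of centroid-based representative selection and ensemble aggregation is sketched below. It assumes the candidate examples have already been embedded as numeric vectors, that "representative" means closest to the mean vector under Euclidean distance, and that ensemble outputs are combined by majority vote or averaging. The abstract specifies none of these details, so every function name and value here is hypothetical. The LLM calls themselves are omitted: in practice each ensemble member would be one sampled completion of the same one-shot prompt at the chosen temperature.

```python
import math
from collections import Counter

def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def select_representative(vectors):
    """Index of the vector closest (Euclidean distance) to the centroid."""
    center = centroid(vectors)
    return min(range(len(vectors)), key=lambda i: math.dist(vectors[i], center))

def majority_vote(labels):
    """Most frequent label among ensemble members (ties go to the first seen)."""
    return Counter(labels).most_common(1)[0][0]

def mean_score(scores):
    """Average of numeric ensemble outputs."""
    return sum(scores) / len(scores)

# Hypothetical 2-D embeddings of candidate one-shot examples
candidates = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.2], [5.0, 5.0]]
rep_index = select_representative(candidates)

# Hypothetical outputs from five samples drawn at a higher temperature
sampled_labels = ["B", "A", "B", "B", "C"]
print(rep_index, majority_vote(sampled_labels))
```

A random-selection baseline would replace `select_representative` with a uniformly random index, which makes the comparison in the abstract easy to reproduce in outline.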

