CL LG Jun 19, 2024

On the Utility of Domain-Adjacent Fine-Tuned Model Ensembles for Few-shot Problems

arXiv:2406.13720v2
Originality: Incremental advance
AI Analysis

This work addresses data scarcity in domain-specific applications for practitioners using large language models. The advance is incremental, since it builds on existing fine-tuning and ensembling techniques.

The paper tackles few-shot learning when domain-specific data is scarce. It proposes DAFT-E, a framework that ensembles domain-adjacent fine-tuned models. In zero-shot settings the ensemble achieves accuracy close to that of the single best model; in few-shot settings it outperforms any single domain-adjacent model while requiring less domain-specific fine-tuning data.

Large Language Models (LLMs) have been observed to perform well on a wide range of downstream tasks when fine-tuned on domain-specific data. However, such data may not be readily available in many applications, motivating zero-shot or few-shot approaches using domain-adjacent models. While several fine-tuned models for various tasks are available, finding an appropriate domain-adjacent model for a given task is often not straightforward. In this paper, we study DAFT-E, a framework that utilizes an Ensemble of Domain-Adjacent Fine-Tuned Foundation Models for few-shot problems. We show that for zero-shot problems, this ensembling method provides an accuracy performance close to that of the single best model. With few-shot problems, this performance improves further, at which point DAFT-E can outperform any single domain-adjacent model while requiring much less data for domain-specific fine-tuning.
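To make the ensembling idea concrete, here is a minimal sketch in Python. It is not the paper's implementation. It assumes each domain-adjacent model already returns a class-probability vector for an input. The function names (`average_ensemble`, `fit_accuracy_weights`, `weighted_ensemble`) and the accuracy-based weighting rule are hypothetical choices made for illustration. The zero-shot case averages the model outputs uniformly; the few-shot case uses a handful of labeled examples to weight each model.

```python
def argmax(values):
    """Index of the largest entry in a list."""
    return max(range(len(values)), key=lambda i: values[i])


def average_ensemble(model_probs):
    """Zero-shot style ensemble: uniform average of per-model class probabilities.

    model_probs[m] is model m's probability vector for a single input.
    """
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    return [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]


def fit_accuracy_weights(few_shot_probs, labels):
    """Few-shot style weighting (illustrative assumption, not the paper's rule).

    few_shot_probs[m][i] is model m's probability vector for labeled example i.
    Each model gets a smoothed accuracy as its weight, so no weight is ever zero.
    """
    weights = []
    for per_example in few_shot_probs:
        correct = sum(1 for p, y in zip(per_example, labels) if argmax(p) == y)
        weights.append((correct + 1) / (len(labels) + 2))
    return weights


def weighted_ensemble(model_probs, weights):
    """Weighted average of per-model class probabilities for a single input."""
    total = sum(weights)
    n_classes = len(model_probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, model_probs)) / total
        for c in range(n_classes)
    ]
```

With no labeled data, `average_ensemble` needs only the models' outputs. Once a few labeled examples are available, `fit_accuracy_weights` can favor the models that happen to transfer well to the target task, which is the intuition behind an ensemble overtaking any single domain-adjacent model.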

