CL · AI · LG · May 15, 2023

Small Models are Valuable Plug-ins for Large Language Models

arXiv:2305.08848v1 · 87 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of efficiently improving LLM performance for researchers and practitioners with hardware constraints. The contribution appears incremental, since it builds on existing in-context learning methods.

The paper tackles the difficulty of tuning large language models (LLMs) with limited hardware and data. It proposes Super In-Context Learning (SuperICL), which pairs locally fine-tuned smaller models with black-box LLMs. The authors report performance on supervised tasks beyond that of state-of-the-art fine-tuned models, and say the method addresses the instability of in-context learning.

Large language models (LLMs) such as GPT-3 and GPT-4 are powerful, but their weights are often not publicly available and their immense size makes them difficult to tune on common hardware. As a result, effectively tuning these models with large-scale supervised data can be challenging. As an alternative, In-Context Learning (ICL) can only use a small number of supervised examples due to context length limits. In this paper, we propose Super In-Context Learning (SuperICL), which allows black-box LLMs to work with locally fine-tuned smaller models, resulting in superior performance on supervised tasks. Our experiments demonstrate that SuperICL can improve performance beyond state-of-the-art fine-tuned models while addressing the instability problem of in-context learning. Furthermore, SuperICL can enhance the capabilities of smaller models, such as multilinguality and interpretability.
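
One way to picture how a black-box LLM could "work with" a locally fine-tuned model is to have the small model annotate every in-context example and the test input, then hand the combined text to the LLM. The sketch below only builds such a prompt. The prompt layout, the field names, `build_super_icl_prompt`, and `toy_plugin` are all hypothetical illustrations, not the authors' released code, and no LLM API is called.

```python
from typing import Callable, List, Tuple

# Hypothetical plug-in interface: maps an input text to
# (predicted_label, confidence). In practice this would wrap a
# locally fine-tuned classifier; that wiring is assumed, not shown.
Plugin = Callable[[str], Tuple[str, float]]


def format_block(text: str, pred: str, conf: float, gold: str = "") -> str:
    """Render one example; the final 'Label:' is left empty for the test input."""
    lines = [
        f"Input: {text}",
        f"Plug-in prediction: {pred} (confidence: {conf:.2f})",
        f"Label: {gold}".rstrip(),
    ]
    return "\n".join(lines)


def build_super_icl_prompt(
    demos: List[Tuple[str, str]],  # (text, gold_label) pairs
    test_input: str,
    plugin: Plugin,
) -> str:
    """Assemble a prompt in which every example carries the plug-in's output."""
    blocks = []
    for text, gold in demos:
        pred, conf = plugin(text)
        blocks.append(format_block(text, pred, conf, gold))
    # The test input gets the plug-in annotation but no gold label,
    # so the LLM is asked to produce the final label itself.
    pred, conf = plugin(test_input)
    blocks.append(format_block(test_input, pred, conf))
    return "\n\n".join(blocks)


def toy_plugin(text: str) -> Tuple[str, float]:
    """Stand-in for a fine-tuned small model (keyword rule, made-up confidences)."""
    if "good" in text.lower():
        return ("positive", 0.90)
    return ("negative", 0.60)


if __name__ == "__main__":
    demos = [("A good film.", "positive"), ("Not my thing.", "negative")]
    print(build_super_icl_prompt(demos, "Surprisingly good.", toy_plugin))
```

The resulting string would be sent to the black-box LLM as an ordinary completion prompt. Because the small model sees the full supervised training set while the LLM sees only a handful of annotated examples, the context-length limit mentioned above constrains only the demonstrations, not the supervision the plug-in has absorbed.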

Code Implementations: 1 repo