CL
Nov 16, 2023

OrchestraLLM: Efficient Orchestration of Language Models for Dialogue State Tracking

Microsoft
arXiv:2311.09758v3
38 citations
h-index: 27
Originality: Incremental advance
AI Analysis

This work targets efficiency for NLP practitioners by reducing the computational cost of dialogue systems. It is an incremental advance, since it builds on prior SLM/LLM routing approaches.

The paper tackles the high computational cost of large language models (LLMs) in dialogue state tracking. It proposes a routing framework that sends each instance to either a small language model (SLM) or an LLM according to their complementary strengths, which improves performance over an LLM-only setup and cuts computational costs by over 50%.

Large language models (LLMs) have revolutionized the landscape of Natural Language Processing systems, but are computationally expensive. To reduce the cost without sacrificing performance, previous studies have explored various approaches to harness the potential of Small Language Models (SLMs) as cost-effective alternatives to their larger counterparts. Driven by findings that SLMs and LLMs exhibit complementary strengths in a structured knowledge extraction task, this work presents a novel SLM/LLM routing framework designed to improve computational efficiency and enhance task performance. First, exemplar pools are created to represent the types of contexts where each LM provides a more reliable answer, leveraging a sentence embedding fine-tuned so that context similarity is close to dialogue state similarity. Then, during inference, the k-nearest exemplars to the testing instance are retrieved, and the instance is routed according to majority vote. In dialogue state tracking tasks, the proposed routing framework enhances performance substantially compared to relying solely on LLMs, while reducing the computational costs by over 50%.
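The inference step described in the abstract (retrieve the k nearest exemplars, then route by majority vote) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `cosine`, `route`, and `EXEMPLARS` are hypothetical, the 2-D vectors stand in for the fine-tuned sentence embeddings, and both the cosine similarity measure and the tie-breaking rule (prefer the cheaper SLM) are assumptions the abstract does not specify.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def route(query_vec, exemplars, k=3):
    """Return 'SLM' or 'LLM' by majority vote over the k nearest exemplars.

    exemplars: list of (embedding, label) pairs, where the label names the
    model whose pool the exemplar belongs to.
    """
    ranked = sorted(exemplars, key=lambda ex: cosine(query_vec, ex[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    # Assumption: on a tie, fall back to the cheaper model.
    return "SLM" if votes["SLM"] >= votes["LLM"] else "LLM"


# Toy exemplar pools; real embeddings would come from the sentence encoder.
EXEMPLARS = [
    ([1.0, 0.0], "SLM"),
    ([0.9, 0.1], "SLM"),
    ([0.0, 1.0], "LLM"),
    ([0.1, 0.9], "LLM"),
    ([0.2, 0.8], "LLM"),
]

print(route([1.0, 0.05], EXEMPLARS, k=3))
print(route([0.0, 1.0], EXEMPLARS, k=3))
```

With an odd k and two pools, ties cannot occur, so the tie-breaking rule only matters for even k or when more than two models are orchestrated.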
