CL | AI | May 27, 2025

Automatic Transmission for LLM Tiers: Optimizing Cost and Accuracy in Large Language Models

arXiv:2505.20921v2 | 1 citation | h-index: 8 | ACL
Originality: Incremental advance
AI Analysis

This paper addresses cost optimization for LLM users in NLP applications, though the advance is incremental because it builds on existing tiered model systems.

The paper tackles the challenge of selecting optimal LLM tiers for subtasks to balance cost and accuracy, introducing the LLM-AT framework that automatically chooses tiers without training, achieving superior performance and cost reduction in experiments.

LLM providers typically offer multiple LLM tiers that vary in performance and price. As NLP tasks become more complex and modularized, selecting a suitable LLM tier for each subtask is a key challenge in balancing cost and performance. To address this problem, we introduce the LLM Automatic Transmission (LLM-AT) framework, which automatically selects LLM tiers without training. LLM-AT consists of a Starter, a Generator, and a Judge. The Starter selects the initial LLM tier expected to solve the given question, the Generator produces a response using the LLM of the selected tier, and the Judge evaluates the validity of the response. If the response is invalid, LLM-AT iteratively upgrades to a higher-tier model, generates a new response, and re-evaluates until a valid response is obtained. Additionally, we propose an accuracy estimator, which enables suitable initial tier selection without training. Given an input question, the accuracy estimator estimates the expected accuracy of each LLM tier by computing the valid response rate across the top-k most similar queries from past inference records. Experiments demonstrate that LLM-AT achieves superior performance while reducing costs, making it a practical solution for real-world applications.
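
The Starter, Generator, and Judge loop described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation. The names `Record`, `estimate_accuracy`, `pick_start_tier`, and `llm_at` are hypothetical, as are the cosine similarity measure, the `threshold` rule for choosing the starting tier, and the fallback of returning the top tier's response when no response is judged valid. The `generate` and `judge` callables stand in for real LLM calls.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Record:
    """One past inference: a query embedding and, per tier, whether the response was valid."""
    embedding: Sequence[float]
    valid_by_tier: dict[str, bool]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity is an assumption; the abstract only says "similar queries".
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def estimate_accuracy(query_emb, history: list[Record], tiers: list[str], k: int = 3) -> dict[str, float]:
    """Per tier, the valid-response rate over the top-k most similar past queries."""
    nearest = sorted(history, key=lambda r: cosine(query_emb, r.embedding), reverse=True)[:k]
    estimates = {}
    for tier in tiers:
        outcomes = [r.valid_by_tier[tier] for r in nearest if tier in r.valid_by_tier]
        estimates[tier] = sum(outcomes) / len(outcomes) if outcomes else 0.0
    return estimates


def pick_start_tier(estimates: dict[str, float], tiers: list[str], threshold: float = 0.5) -> str:
    """Starter (assumed rule): cheapest tier whose estimate meets the threshold, else the top tier."""
    for tier in tiers:  # tiers are ordered from cheapest to most capable
        if estimates[tier] >= threshold:
            return tier
    return tiers[-1]


def llm_at(question: str, query_emb, history: list[Record], tiers: list[str],
           generate: Callable[[str, str], str], judge: Callable[[str, str], bool],
           k: int = 3, threshold: float = 0.5) -> tuple[str, str]:
    """Generate at the starting tier, then upgrade tier by tier until the judge accepts."""
    estimates = estimate_accuracy(query_emb, history, tiers, k)
    start = tiers.index(pick_start_tier(estimates, tiers, threshold))
    response = ""
    for tier in tiers[start:]:
        response = generate(tier, question)   # Generator
        if judge(question, response):         # Judge
            return tier, response
    # Assumed fallback: no tier produced a valid response, so return the last attempt.
    return tiers[-1], response
```

In this sketch, a question whose nearest past queries were mostly answered correctly by a cheap tier starts at that tier, and more expensive tiers are called only when the judge rejects a response.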

Code Implementations: 1 repo
