Budget-Aware Routing for Long Clinical Text
For practitioners deploying LLMs in clinical settings, this work provides a practical framework to reduce costs and latency while maintaining quality, though the findings are incremental and domain-specific.
The paper addresses the high token cost of running large language models on long clinical texts. It frames context selection as choosing a subset of document units under a strict token budget, proposes RCD, a monotone submodular objective that balances relevance, coverage, and diversity, and introduces a routing heuristic that adapts to the budget regime. Experiments show that the best selection strategy depends on the budget and task: positional heuristics perform best at low budgets on extractive tasks, while diversity-aware methods such as MMR improve LLM generation.
A key challenge for large language models is token cost per query and overall deployment cost. Clinical inputs are long, heterogeneous, and often redundant, while downstream tasks are short and high stakes. We study budgeted context selection, where a subset of document units is chosen under a strict token budget so an off-the-shelf generator can meet fixed cost and latency constraints. We cast this as a knapsack-constrained subset selection problem with two design choices: unitization, which defines how the document is segmented, and selection, which determines which units are kept. We propose RCD, a monotone submodular objective that balances relevance, coverage, and diversity. We compare sentence, section, window, and cluster-based unitization, and introduce a routing heuristic that adapts to the budget regime. Experiments on MIMIC discharge notes, Cochrane abstracts, and L-Eval show that optimal strategies depend on the evaluation setting. Positional heuristics perform best at low budgets in extractive tasks, while diversity-aware methods such as MMR improve LLM generation. Selector choice matters more than unitization, with cluster-based grouping reducing performance and other schemes behaving similarly. ROUGE saturates for LLM summaries, while BERTScore better reflects quality differences. We release our code at https://github.com/stone-technologies/ACL_budget_paper.
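To make the budgeted selection setting concrete, the following is a minimal sketch of greedy selection under a token budget. It is not the paper's RCD objective, whose exact form is not given here. It uses MMR instead, which the abstract names as one of the diversity-aware selectors. The function names (`bow`, `cosine`, `select_units`), the trade-off parameter `lam`, the bag-of-words similarity, and the whitespace token count are all illustrative assumptions, not taken from the released code.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector; whitespace tokens stand in for a real tokenizer (assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_units(units, query, budget, lam=0.7):
    """Greedy MMR selection of document units under a token budget.

    Returns the indices of the kept units in document order.
    """
    vecs = [bow(u) for u in units]
    costs = [len(u.split()) for u in units]  # token cost per unit (assumption)
    q = bow(query)
    relevance = [cosine(v, q) for v in vecs]

    selected, used = [], 0
    remaining = set(range(len(units)))
    while remaining:
        # Only units that still fit in the remaining budget are candidates.
        feasible = [i for i in sorted(remaining) if used + costs[i] <= budget]
        if not feasible:
            break

        def mmr(i):
            # Penalize similarity to the closest unit that is already selected.
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(feasible, key=mmr)
        selected.append(best)
        used += costs[best]
        remaining.remove(best)
    return sorted(selected)
```

Setting `lam=1.0` reduces the rule to relevance-only ranking, while lower values increasingly penalize units that repeat content already selected. A unitization step (sentence, section, window, or cluster) would produce the `units` list before this function is called.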