AI · Dec 25, 2024

PRISM: Efficient Long-Range Reasoning With Short-Context LLMs

arXiv:2412.18914v3 · 3 citations · h-index: 17 · EMNLP
Originality: Highly original
AI Analysis

PRISM addresses the inefficiency of short-context LLMs on long-range tasks, offering a scalable option for applications that need token-efficient reasoning.

The paper introduces PRISM, an in-context method for long-range reasoning with short-context LLMs. It uses structured schemas to outperform baselines with 4x shorter contexts and to cut costs by up to 54%.

Long-range tasks demand reasoning over long inputs. However, existing solutions are limited, e.g., long-context models require large compute budgets, parameter-efficient fine-tuning (PEFT) needs training data, and retrieval-augmented generation (RAG) entails complex task-specific designs. Though in-context approaches overcome many of these issues, methods with short-context LLMs are inefficient, trading context for processing more tokens. We introduce PRISM, a highly token-efficient in-context method based on structured schemas that outperforms baselines on diverse tasks with 4x shorter contexts. This approach produces concise outputs and efficiently leverages key-value (KV) caches to reduce costs by up to 54%. PRISM scales down to tiny contexts without increasing costs or sacrificing quality, and generalizes to new tasks with minimal effort by generating schemas from task descriptions.
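The abstract does not spell out the mechanics, so the following is only a minimal sketch of the general idea it describes: a short-context model reads a long input chunk by chunk and emits concise, schema-shaped outputs instead of long free-form text. Everything named here is an assumption for illustration and not taken from the paper: the flat `Memory` schema, the functions `split_into_chunks` and `process_long_input`, and `toy_update`, which is a deterministic stand-in for an LLM call.

```python
from typing import Callable, Dict, List

# Hypothetical schema (an assumption, not PRISM's actual schema):
# a flat map from a tracked key to an integer count.
Memory = Dict[str, int]


def split_into_chunks(tokens: List[str], chunk_size: int) -> List[List[str]]:
    """Split a long token list into consecutive chunks of at most chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def process_long_input(
    tokens: List[str],
    chunk_size: int,
    propose_update: Callable[[Memory, List[str]], Memory],
) -> Memory:
    """Read the input one chunk at a time, keeping only a small structured memory.

    propose_update stands in for a short-context LLM call: it sees the current
    memory plus one chunk and returns a concise update that follows the schema.
    """
    memory: Memory = {}
    for chunk in split_into_chunks(tokens, chunk_size):
        update = propose_update(memory, chunk)
        # Merge the update into memory rather than rewriting memory from scratch.
        for key, delta in update.items():
            memory[key] = memory.get(key, 0) + delta
    return memory


def toy_update(memory: Memory, chunk: List[str]) -> Memory:
    """Toy replacement for the LLM: count tracked words seen in this chunk."""
    tracked = {"error", "warning"}
    update: Memory = {}
    for token in chunk:
        if token in tracked:
            update[token] = update.get(token, 0) + 1
    return update


if __name__ == "__main__":
    text = "ok error ok warning error ok ok error".split()
    print(process_long_input(text, chunk_size=3, propose_update=toy_update))
```

In this sketch the per-chunk output is a small dictionary rather than a rewritten summary, which illustrates why schema-constrained outputs can stay concise even when the chunk size shrinks. How PRISM itself defines schemas, generates them from task descriptions, or reuses KV caches is not shown here.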

