AI · Dec 25, 2024

PRISM: Efficient Long-Range Reasoning With Short-Context LLMs

Dulhan Jayalath, James Bradley Wendt, Nicholas Monath, Sandeep Tata, Beliz Gunel

arXiv:2412.18914v37.34 citations · h-index: 17 · EMNLP

Originality: Highly original

AI Analysis

PRISM addresses the inefficiency of short-context LLMs on long-range tasks, offering a scalable approach for applications that require token-efficient reasoning.

The paper tackles long-range reasoning with large language models by introducing PRISM, an in-context method built on structured schemas that outperforms baselines on diverse tasks with 4x shorter contexts and reduces costs by up to 54%.

Long-range tasks demand reasoning over long inputs. However, existing solutions are limited, e.g., long-context models require large compute budgets, parameter-efficient fine-tuning (PEFT) needs training data, and retrieval-augmented generation (RAG) entails complex task-specific designs. Though in-context approaches overcome many of these issues, methods with short-context LLMs are inefficient, trading context for processing more tokens. We introduce PRISM, a highly token-efficient in-context method based on structured schemas that outperforms baselines on diverse tasks with 4x shorter contexts. This approach produces concise outputs and efficiently leverages key-value (KV) caches to reduce costs by up to 54%. PRISM scales down to tiny contexts without increasing costs or sacrificing quality, and generalizes to new tasks with minimal effort by generating schemas from task descriptions.
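The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea of reading a long input in short chunks while carrying a schema-constrained memory between steps. Every name here (`chunk`, `process_long_input`, `propose_update`, `toy_update`, the `entities` field) is hypothetical and not taken from the paper, and the toy update function stands in for an actual LLM call. The sketch appends to memory rather than rewriting it, on the assumption that a stable, growing prefix is what makes KV-cache reuse possible.

```python
from typing import Callable, Dict, List

# Hypothetical memory layout: each schema field maps to a list of short entries.
Memory = Dict[str, List[str]]


def chunk(tokens: List[str], size: int) -> List[List[str]]:
    """Split a token list into consecutive pieces of at most `size` tokens."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def process_long_input(
    tokens: List[str],
    chunk_size: int,
    schema_keys: List[str],
    propose_update: Callable[[List[str], Memory], Memory],
) -> Memory:
    """Read the input chunk by chunk, accumulating a structured memory.

    `propose_update` is a stand-in for a short-context LLM call that sees one
    chunk plus the current memory and returns new entries per schema field.
    """
    memory: Memory = {key: [] for key in schema_keys}
    for piece in chunk(tokens, chunk_size):
        update = propose_update(piece, memory)
        for key, values in update.items():
            if key not in memory:
                continue  # ignore fields that are not part of the schema
            # Append only: earlier entries are never rewritten (assumption).
            memory[key].extend(values)
    return memory


def toy_update(piece: List[str], memory: Memory) -> Memory:
    """Toy replacement for the LLM: record capitalized words not yet seen."""
    seen = set(memory["entities"])
    new_entries: List[str] = []
    for word in piece:
        if word[:1].isupper() and word not in seen:
            seen.add(word)
            new_entries.append(word)
    return {"entities": new_entries}


if __name__ == "__main__":
    text = "Alice met Bob in Paris and later Alice called Bob".split()
    print(process_long_input(text, 4, ["entities"], toy_update))
```

In this toy run each step sees only four tokens plus the compact memory, which is the sense in which a short context can still cover a long input. A real system would replace `toy_update` with a model call and a task-specific schema.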
