Quest
Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
Long-context / context-window extension · first seen Jun 16, 2024
Superseded: cited as a baseline and beaten by newer methods
2 papers critique it · 2 beat it on benchmarks
What papers say
Verbatim critique sentences, each from a paper that cites Quest as a baseline.
“Despite the relatively low overhead, Quest lacks sophisticated design in the retrieval strategy, thus suffers from noticeable performance degradation.”
Source: A²ATS: Retrieval-Based KV Cache Reduction via Windowed Rotary Position Embedding and Query-Aware Vector Quantization
“Quest [tang2024quest] maintains high accuracy but at the cost of substantial memory usage due to the need to cache the entire KV cache, eventually leading to OOM on long sequences.”
Source: LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models
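Both critiques follow from how Quest selects KV cache entries. The following is a minimal sketch of the page-level criterion described in the Quest paper. It assumes keys are grouped into fixed-size pages, that each page stores per-channel min and max key values, and that the top-K pages by upper-bound score are attended. The function names (`page_metadata`, `page_upper_bound`, `select_pages`, `kv_cache_bytes`), the page size, and the pure-Python layout are hypothetical illustrations, not Quest's actual CUDA implementation. The memory helper assumes Llama-3.1-8B's published configuration (32 layers, 8 KV heads, head dimension 128) stored in FP16.

```python
import heapq
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def page_metadata(keys: List[Vector], page_size: int) -> List[Tuple[List[float], List[float]]]:
    """Hypothetical helper: split cached keys into pages and record per-channel min/max."""
    meta = []
    for start in range(0, len(keys), page_size):
        page = keys[start:start + page_size]
        dims = len(page[0])
        mins = [min(k[d] for k in page) for d in range(dims)]
        maxs = [max(k[d] for k in page) for d in range(dims)]
        meta.append((mins, maxs))
    return meta


def page_upper_bound(query: Vector, mins: Vector, maxs: Vector) -> float:
    # q_d * k_d is linear in k_d, so on [min_d, max_d] it peaks at one endpoint.
    # Summing the larger endpoint product upper-bounds q . k for every key in the page.
    return sum(max(q * lo, q * hi) for q, lo, hi in zip(query, mins, maxs))


def select_pages(query: Vector, meta, top_k: int) -> List[int]:
    """Return indices of the top_k pages by upper-bound score, in page order.

    Only these pages are read for attention, but every page stays resident in
    memory, which is the source of the memory-footprint critique above.
    """
    scores = [page_upper_bound(query, lo, hi) for lo, hi in meta]
    return sorted(heapq.nlargest(top_k, range(len(scores)), key=scores.__getitem__))


def kv_cache_bytes(num_tokens: int, num_layers: int, num_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    # Factor 2 covers both keys and values; bytes_per_elem=2 assumes FP16/BF16.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * num_tokens


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0], [2.0, 2.0], [1.0, 1.0]]
    meta = page_metadata(keys, page_size=2)
    print(select_pages([1.0, 1.0], meta, top_k=2))  # [0, 2]
    # Full cache for a 128K-token context under the assumed Llama-3.1-8B config.
    print(kv_cache_bytes(131072, 32, 8, 128) / 2**30, "GiB")  # 16.0 GiB
```

In the toy example the page bounds are 2, 0, and 4, so pages 2 and 0 are kept. Selection reduces how much of the cache is read per decoding step, but the full cache (about 16 GiB per 128K-token sequence under the stated assumptions) must still be held, which matches LaCache's OOM observation.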
Beaten on benchmarks
Head-to-head results in which a newer method reports beating Quest. Values are copied from each source paper's tables; verify them against the cited paper.
- A²ATS: Retrieval-Based KV Cache Reduction via Windowed Rotary Position Embedding and Query-Aware Vector Quantization
  A²ATS beats Quest · Accuracy [Llama-3.1-8B-Instruct, Sparsity ~0.060]: 86.6 vs 80.7
  A²ATS beats Quest · Accuracy [MegaBeam-Mistral-7B-512K, Sparsity ~0.062]: 86.3 vs 78.4
- Long-Context Modeling with Dynamic Hierarchical Sparse Attention for On-Device LLMs
  DHSA beats Quest · Avg. [Llama-3.1-8B-Instruct (4-bit)]: 31.8 vs 30.9
  DHSA beats Quest · 32K [32K tokens]: 76.2 vs 73.9
  DHSA beats Quest · 48K [48K tokens]: 71.5 vs 68.2
Newer alternatives
Recent methods in the same sub-problem, not yet superseded in the knowledge base.