Loki (KV-cache compression): superseded, i.e. cited as a baseline and beaten by newer methods. 2 papers critique it and 2 beat it on benchmarks; it ranks #45 of 234 most-superseded. Sub-problem: cluster led by Palu. Newer alternatives in the same sub-problem include ArborKV, RDKV, EchoKV, VQKV, and Self-Indexing KVCache.

Superseded baseline · #45 of 234 most-superseded

Loki

Loki: Low-rank Keys for Efficient Sparse Attention

KV-cache compression · first seen Jun 4, 2024

superseded — cited as a baseline and beaten by newer methods

2 papers critique it · 2 beat it on benchmarks

What papers say

Verbatim critique sentences, each from a paper that cites Loki as a baseline.

“Approaches like SparQ Attention ribar2024sparqattentionbandwidthefficientllm, AQUA Attention s2025aquaattentionquerymagnitudes, and Loki singhania2024lokilowrankkeysefficient prioritize computational savings over memory reduction.”
— SWAN: Sparse Winnowed Attention for Reduced Inference Memory via Decompression-Free KV-Cache Compression
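To make the SWAN distinction concrete, here is a rough per-head byte count for one decode step under a Loki-style scheme, where every key and value stays resident but scoring reads only a reduced slice of each key. The settings (fp16, head dimension 128, reduced key dimension 32, keeping 1/8 of a 32K-token context) and the function names are hypothetical illustrations, not values taken from the Loki paper.

```python
# Rough per-head byte accounting for one decode step. All settings are
# hypothetical (not from the Loki paper): fp16 cache, head_dim d,
# reduced key dim r, and k tokens kept for exact attention.
BYTES_FP16 = 2


def kv_cache_bytes(n_tokens: int, d: int) -> int:
    """Resident keys + values for one head; top-k selection leaves this unchanged.
    Dense attention also reads exactly this many bytes per step."""
    return 2 * n_tokens * d * BYTES_FP16


def lowrank_topk_read_bytes(n_tokens: int, d: int, r: int, k: int) -> int:
    """Bytes read when scoring uses r dims of every key, then exact attention
    reads the full key and value of only the k selected tokens."""
    return (n_tokens * r + 2 * k * d) * BYTES_FP16


n, d, r = 32_768, 128, 32
k = n // 8
resident = kv_cache_bytes(n, d)
read = lowrank_topk_read_bytes(n, d, r, k)
print(resident)         # 16777216 bytes held (same as dense)
print(read)             # 4194304 bytes read this step
print(read / resident)  # 0.25
```

Under these assumptions the data read per step drops to a quarter of dense attention while the resident cache is unchanged, which is the compute-versus-memory trade-off the SWAN sentence describes.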
“Another representative work is Loki singhania2024loki, which performs the PCA transformation of KV cache and then selects % tokens based on attention scores computed in low-dimensional space for sparse attention. However, this method still suffers from the increasing KV cache size.”
— SALS: Sparse Attention in Latent Space for KV cache Compression
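The mechanism in the SALS sentence can be sketched in NumPy for a single query: rotate keys and query into a PCA basis of the keys, score every token using only the first r principal dimensions, keep the k highest-scoring tokens, and run exact softmax attention over those. This is a minimal sketch under stated assumptions, not Loki's implementation: the basis is fitted on the same synthetic keys it is applied to (the real method may calibrate differently), there is no batching, multi-head handling, or GPU kernel, and the function names and the settings r=16, k=64 are hypothetical.

```python
import numpy as np


def pca_basis(calib_keys: np.ndarray) -> np.ndarray:
    """Orthogonal (d, d) basis of key principal directions, columns by decreasing variance.
    Requires at least d calibration rows. Centering here is an assumption."""
    centered = calib_keys - calib_keys.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt.T


def dense_attention(q, keys, values):
    """Reference single-query softmax attention over all tokens."""
    s = keys @ q / np.sqrt(q.shape[-1])
    w = np.exp(s - s.max())
    return (w / w.sum()) @ values


def loki_style_attention(q, keys, values, basis, r, k):
    """Hypothetical low-rank top-k sparse attention for one query."""
    d = q.shape[-1]
    q_rot = q @ basis
    k_rot = keys @ basis  # a real cache would store keys already rotated
    approx = k_rot[:, :r] @ q_rot[:r]  # cheap scores from the first r dims
    idx = np.argpartition(approx, -k)[-k:]  # indices of the k largest approx scores
    # The basis is orthogonal, so these equal keys[idx] @ q up to float error.
    exact = (k_rot[idx] @ q_rot) / np.sqrt(d)
    w = np.exp(exact - exact.max())
    return (w / w.sum()) @ values[idx]


rng = np.random.default_rng(0)
n, d = 512, 64
# Synthetic keys whose per-dimension scale decays, so a few directions dominate
# (an assumption meant to mimic low-rank keys).
keys = rng.standard_normal((n, d)) * np.linspace(3.0, 0.1, d)
values = rng.standard_normal((n, d))
q = rng.standard_normal(d)
basis = pca_basis(keys)
out = loki_style_attention(q, keys, values, basis, r=16, k=64)
print(np.abs(out - dense_attention(q, keys, values)).max())
```

With r = d and k = n the sketch reduces to dense attention. Every rotated key still sits in memory, which is the growing KV cache size that SALS points out.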

Beaten on benchmarks

Head-to-head results where a newer method reports beating Loki. Values are copied from the source paper's tables — verify against the cited paper.

ShadowKV beats Loki · Avg (RULER) [Llama-3-8B-1M]
86.88 vs 9.33
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
ShadowKV beats Loki · Avg (LongBench) [Llama-3-8B-1M]
39.94 vs 15.78
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
ShadowKV beats Loki · Avg (RULER) [GLM-4-9B-1M]
85.62 vs 28.57
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
ShadowKV beats Loki · Avg (LongBench) [GLM-4-9B-1M]
47.89 vs 32.35
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
ShadowKV beats Loki · Avg (RULER) [Llama-3.1-8B]
83.57 vs 35.52
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
ShadowKV beats Loki · Avg (LongBench) [Llama-3.1-8B]
48.13 vs 27.02
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
SALS beats Loki · Avg [LongBench tasks, token sparse methods]
32.26 vs 31.95
SALS: Sparse Attention in Latent Space for KV cache Compression
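The score gaps are easier to compare side by side. The snippet below only subtracts the values copied in the entries above, keyed by source paper; the dictionary layout and the function name are illustrative. Gaps from different papers come from different models, tasks and settings, so they should not be read as a ranking between ShadowKV and SALS.

```python
# (source paper, metric, setting) -> (newer method's score, Loki's score),
# copied from the entries above.
RESULTS = {
    ("ShadowKV", "Avg (RULER)", "Llama-3-8B-1M"): (86.88, 9.33),
    ("ShadowKV", "Avg (LongBench)", "Llama-3-8B-1M"): (39.94, 15.78),
    ("ShadowKV", "Avg (RULER)", "GLM-4-9B-1M"): (85.62, 28.57),
    ("ShadowKV", "Avg (LongBench)", "GLM-4-9B-1M"): (47.89, 32.35),
    ("ShadowKV", "Avg (RULER)", "Llama-3.1-8B"): (83.57, 35.52),
    ("ShadowKV", "Avg (LongBench)", "Llama-3.1-8B"): (48.13, 27.02),
    ("SALS", "Avg", "LongBench tasks, token sparse methods"): (32.26, 31.95),
}


def margins(results):
    """Absolute point gap (newer minus Loki), rounded to two decimals."""
    return {key: round(new - loki, 2) for key, (new, loki) in results.items()}


for (paper, metric, setting), gap in margins(RESULTS).items():
    print(f"{paper:8} {metric:16} {setting:38} +{gap:.2f}")
```

ShadowKV's reported gaps range from 15.54 to 77.55 points, while the SALS gap is 0.31 points.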

Newer alternatives

Recent methods in the same sub-problem, not yet superseded in the knowledge base, include ArborKV, RDKV, EchoKV, VQKV, and Self-Indexing KVCache.