LG · AI · May 11

Block-Based Double Decoders

arXiv:2605.18807 · 40.5
Predicted impact top 62% in LG · last 90 days
Originality Highly original
AI Analysis

This work targets practitioners who want the fast inference of encoder-decoder models without the training inefficiency that has kept those models from scaling.

Block-based double decoders combine decoder-only training efficiency with encoder-decoder inference efficiency: they outperform encoder-decoders in scaling-law experiments and cut KV-cache memory and per-token compute by at least 2/3.

Encoder-decoder models offer substantial inference-time savings over decoder-only models, but their pretraining objectives suffer from sparse supervision and dynamic sequence lengths, keeping them out of practice at scale. We propose block-based double decoders, a novel transformer architecture that utilizes doubly-causal block-based attention masks to train with full loss supervision and static sequence packing, combining decoder-only training efficiency with encoder-decoder inference efficiency. In scaling law experiments, block-based double decoders strongly outperform encoder-decoders and closely track decoder-only models across scales. At inference time, they cut KV-cache memory and per-token compute by at least 2/3 without sacrificing prefill caching or other existing inference optimizations available to decoder-only models.
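To put the "at least 2/3" KV-cache figure in concrete terms, the sketch below computes the cache size of a standard decoder-only transformer and the upper bound left after a 2/3 reduction. The model configuration is hypothetical and not taken from the paper, the helper name `kv_cache_bytes` is made up for illustration, and the code assumes the reduction is measured against a decoder-only baseline, which the abstract implies but does not state outright.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """KV-cache size in bytes for a standard decoder-only transformer.

    Hypothetical helper for illustration; not from the paper.
    """
    # Factor of 2: each layer stores one key tensor and one value tensor.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# Hypothetical configuration (an assumption, not a model from the paper):
# 32 layers, 8 KV heads of dimension 128, 8192 cached tokens, 16-bit values.
baseline = kv_cache_bytes(num_layers=32, num_kv_heads=8,
                          head_dim=128, seq_len=8192)

# "At least 2/3" saved means at most 1/3 of the baseline remains.
reduced_upper_bound = baseline / 3

MIB = 2 ** 20
print(f"decoder-only baseline: {baseline / MIB:.1f} MiB")
print(f"after a 2/3 cut (upper bound): {reduced_upper_bound / MIB:.1f} MiB")
```

Under these assumed numbers the baseline cache is 1024 MiB per sequence, so a reduction of at least 2/3 leaves roughly 341 MiB or less. The same ratio applies at any sequence length, since cache size grows linearly with the number of cached tokens.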

