CL · Mar 19, 2024

Efficient Encoder-Decoder Transformer Decoding for Decomposable Tasks

arXiv:2403.13112v3 · 3 citations
Originality: Incremental advance
AI Analysis

The paper addresses efficiency limits on deploying NLP models in specialized domains, though the advance is incremental because it builds on existing encoder-decoder architectures.

The paper tackles the high computational cost of transformer-based NLP models by introducing a new encoder-decoder configuration that improves efficiency for structured output and decomposable tasks, achieving up to 4.6x speed-up with comparable or better performance.

Transformer-based NLP models are powerful but have high computational costs that limit deployment. Finetuned encoder-decoder models are popular in specialized domains and can outperform larger, more general decoder-only models such as GPT-4. We introduce a new configuration for encoder-decoder models that improves efficiency on structured output and decomposable tasks, where multiple outputs are required for a single shared input. Our method, prompt-in-decoder (PiD), encodes the input once and decodes the outputs in parallel. This boosts both training and inference efficiency in two ways: it avoids duplicate input encoding, and it increases the operational intensity of the decoding process (the ratio of the number of arithmetic operations to the number of memory accesses) by sharing the input key-value cache. We achieve a computation reduction that scales roughly with the number of subtasks, gaining up to 4.6x speed-up over state-of-the-art models for dialogue state tracking, summarization, and question-answering tasks, with comparable or better performance.
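The encode-once idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: `ToyEncoder`, `prompt_in_encoder`, `prompt_in_decoder`, the 100-token input, and the five one-token prompts are all hypothetical names and sizes chosen for illustration. The sketch only counts how many tokens pass through the encoder. It does not model decoding cost, and it does not reproduce the reported speed-up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyEncoder:
    """Hypothetical stand-in for a transformer encoder; it only counts tokens."""
    tokens_encoded: int = 0

    def encode(self, tokens: List[str]) -> List[str]:
        self.tokens_encoded += len(tokens)
        # A real encoder would return hidden states, from which the
        # cross-attention key-value cache is built. Here we echo the tokens.
        return list(tokens)


def prompt_in_encoder(encoder: ToyEncoder, document: List[str],
                      prompts: List[List[str]]) -> List[List[str]]:
    """Baseline: every subtask prompt is concatenated with the input and re-encoded."""
    return [encoder.encode(prompt + document) for prompt in prompts]


def prompt_in_decoder(encoder: ToyEncoder, document: List[str],
                      prompts: List[List[str]]) -> List[Tuple[List[str], List[str]]]:
    """PiD-style: encode the shared input once and hand each prompt to the decoder side."""
    shared = encoder.encode(document)  # computed a single time
    # Every subtask references the same encoding object, so a real decoder
    # could share one input key-value cache across the parallel decodes.
    return [(prompt, shared) for prompt in prompts]


# Assumed sizes, for illustration only.
document = ["tok"] * 100                      # one shared 100-token input
prompts = [[f"slot{i}"] for i in range(5)]    # five subtasks, one-token prompts

baseline_enc = ToyEncoder()
prompt_in_encoder(baseline_enc, document, prompts)

pid_enc = ToyEncoder()
pid_batches = prompt_in_decoder(pid_enc, document, prompts)

print(baseline_enc.tokens_encoded)  # 5 * (1 + 100) = 505
print(pid_enc.tokens_encoded)       # 100
```

Under these assumed sizes the baseline pushes roughly five times as many tokens through the encoder, which matches the abstract's claim that the saving grows with the number of subtasks. Actual wall-clock gains also depend on decoder-side effects such as the shared key-value cache, which this sketch does not measure.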

Code Implementations: 1 repo
