LG · AI · CL · ML · Oct 23, 2018

Area Attention

arXiv:1810.10126v7 · 26 citations · Has Code
Originality: Incremental advance
AI Analysis

Area attention addresses the need for more flexible attention granularity in tasks such as translation and captioning, though it is an incremental improvement over existing attention mechanisms.

The paper tackles the fixed granularity of existing attention mechanisms by proposing area attention, which learns to attend to groups of adjacent items in memory. It improves on strong baselines in neural machine translation and image captioning.

Existing attention mechanisms are trained to attend to individual items in a collection (the memory) with a predefined, fixed granularity, e.g., a word token or an image grid. We propose area attention: a way to attend to areas in the memory, where each area contains a group of items that are structurally adjacent, e.g., spatially for a 2D memory such as images, or temporally for a 1D memory such as natural language sentences. Importantly, the shape and the size of an area are dynamically determined via learning, which enables a model to attend to information with varying granularity. Area attention can easily work with existing model architectures such as multi-head attention for simultaneously attending to multiple areas in the memory. We evaluate area attention on two tasks: neural machine translation (both character and token-level) and image captioning, and improve upon strong (state-of-the-art) baselines in all the cases. These improvements are obtainable with a basic form of area attention that is parameter free.
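To make the idea concrete, here is a minimal NumPy sketch of parameter-free area attention over a 1D memory. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how an area's key and value are formed, so the sketch assumes the key of an area is the mean of its item keys and the value is the sum of its item values. The function name `area_attention_1d` and the `max_area_width` parameter are hypothetical.

```python
import numpy as np


def area_attention_1d(query, keys, values, max_area_width=3):
    """Attend over all contiguous areas of a 1D memory.

    query:  (d,)    single query vector
    keys:   (n, d)  one key per memory item
    values: (n, dv) one value per memory item

    Assumption (not stated in the abstract): an area's key is the mean
    of its item keys, and an area's value is the sum of its item values.
    """
    n = keys.shape[0]
    area_keys, area_values, spans = [], [], []

    # Enumerate every contiguous span of width 1..max_area_width.
    for width in range(1, max_area_width + 1):
        for start in range(0, n - width + 1):
            end = start + width
            area_keys.append(keys[start:end].mean(axis=0))
            area_values.append(values[start:end].sum(axis=0))
            spans.append((start, end))

    area_keys = np.stack(area_keys)      # (num_areas, d)
    area_values = np.stack(area_values)  # (num_areas, dv)

    # Dot-product scores followed by a numerically stable softmax
    # taken over areas rather than over individual items.
    scores = area_keys @ query
    scores = scores - scores.max()
    weights = np.exp(scores) / np.exp(scores).sum()

    output = weights @ area_values       # (dv,)
    return output, weights, spans
```

With `max_area_width=1` every area is a single item, so the sketch reduces to ordinary softmax dot-product attention. Larger widths let the softmax put weight on multi-item spans without adding any learned parameters.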

Code Implementations: 2 repos
