NE · May 24, 2018

Hyperbolic Attention Networks

arXiv:1805.09786v1 · 268 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of modeling data with complex structure in deep learning. Its contribution is an incremental advance: it extends the use of hyperbolic geometry from network parameters to network activations.

The authors tackle the problem of modeling hierarchical and power-law data by introducing hyperbolic attention networks, which re-express soft attention using hyperbolic geometry on activations. They report improved generalization on neural machine translation, graph learning, and visual question answering, while keeping the representations compact.

We introduce hyperbolic attention networks to endow neural networks with enough capacity to match the complexity of data with hierarchical and power-law structure. A few recent approaches have successfully demonstrated the benefits of imposing hyperbolic geometry on the parameters of shallow networks. We extend this line of work by imposing hyperbolic geometry on the activations of neural networks. This allows us to exploit hyperbolic geometry to reason about embeddings produced by deep networks. We achieve this by re-expressing the ubiquitous mechanism of soft attention in terms of operations defined for hyperboloid and Klein models. Our method shows improvements in terms of generalization on neural machine translation, learning on graphs and visual question answering tasks while keeping the neural representations compact.
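
To make "re-expressing soft attention in terms of operations defined for hyperboloid and Klein models" concrete, here is a minimal NumPy sketch. It assumes that attention weights come from a softmax over negative hyperboloid distances, and that values are aggregated with the Einstein midpoint in Klein coordinates. The lifting map `to_hyperboloid`, the temperature `beta`, and all function names are illustrative assumptions, not details stated in the abstract above.

```python
import numpy as np

def to_hyperboloid(x):
    """Lift Euclidean points (..., n) onto the unit hyperboloid in R^(n+1).
    Assumed lifting: x -> (sqrt(1 + |x|^2), x). Other maps are possible."""
    x0 = np.sqrt(1.0 + np.sum(x * x, axis=-1, keepdims=True))
    return np.concatenate([x0, x], axis=-1)

def hyperboloid_distance(a, b):
    """d(a, b) = arccosh(-<a, b>_M), with Minkowski signature (-, +, ..., +)."""
    inner = -a[..., 0] * b[..., 0] + np.sum(a[..., 1:] * b[..., 1:], axis=-1)
    # Clamp to the arccosh domain to guard against rounding error.
    return np.arccosh(np.maximum(-inner, 1.0))

def hyperboloid_to_klein(h):
    """Project hyperboloid points to the Klein ball: spatial part / time part."""
    return h[..., 1:] / h[..., :1]

def einstein_midpoint(weights, klein_points):
    """Weighted Einstein midpoint of points in the Klein ball."""
    # Lorentz factor of each point; finite because |point| < 1.
    gamma = 1.0 / np.sqrt(1.0 - np.sum(klein_points ** 2, axis=-1))
    w = weights * gamma
    return (w[:, None] * klein_points).sum(axis=0) / w.sum()

def hyperbolic_attention(query, keys, values, beta=1.0):
    """query: (n,), keys: (m, n), values: (m, n).
    Returns (weights of shape (m,), aggregated point in Klein coordinates)."""
    q = to_hyperboloid(query)
    k = to_hyperboloid(keys)
    dist = hyperboloid_distance(q[None, :], k)   # (m,)
    logits = -beta * dist                        # closer keys score higher
    weights = np.exp(logits - logits.max())      # numerically stable softmax
    weights /= weights.sum()
    v = hyperboloid_to_klein(to_hyperboloid(values))
    return weights, einstein_midpoint(weights, v)

rng = np.random.default_rng(0)
keys = rng.normal(size=(4, 3))
values = rng.normal(size=(4, 3))
weights, summary = hyperbolic_attention(keys[2], keys, values)
```

In this sketch the query coincides with the third key, so that key lies at distance zero and receives the largest weight. The aggregated point stays inside the unit ball because the Einstein midpoint is a convex combination of Klein points.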