CL · Sep 4, 2019

Towards Better Modeling Hierarchical Structure for Self-Attention with Ordered Neurons

arXiv:1909.01562v2 · 1007 citations
AI Analysis

This work addresses the need to better model hierarchical structure in neural networks for natural language processing. The contribution is incremental: it extends existing hybrid models with a specific RNN variant.

The paper asks why hybrids of self-attention and recurrent neural networks outperform either architecture alone. It proposes strengthening such hybrids with Ordered Neurons LSTM to better model hierarchical structure, and reports improved performance on machine translation benchmarks along with benefits on targeted linguistic evaluation and logical inference tasks.

Recent studies have shown that a hybrid of self-attention networks (SANs) and recurrent neural networks (RNNs) outperforms both individual architectures, while not much is known about why the hybrid models work. With the belief that modeling hierarchical structure is an essential complement between SANs and RNNs, we propose to further enhance the strength of hybrid models with an advanced variant of RNNs, Ordered Neurons LSTM (ON-LSTM), which introduces a syntax-oriented inductive bias to perform tree-like composition. Experimental results on the benchmark machine translation task show that the proposed approach outperforms both individual architectures and a standard hybrid model. Further analyses on targeted linguistic evaluation and logical inference tasks demonstrate that the proposed approach indeed benefits from a better modeling of hierarchical structure.
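
The abstract does not spell out how ON-LSTM obtains its tree-like bias. As a rough illustration only, the sketch below follows the gate formulation from the original ON-LSTM proposal (Shen et al., 2019) rather than anything stated on this page: a cumulative softmax ("cumax") produces monotone "master" gates, so neurons later in the ordering retain information longer than earlier ones. The function names and the pure-Python list representation are assumptions made for readability, not the authors' code.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cumax(xs):
    # Cumulative softmax: running sum of softmax(xs), rising towards 1.
    out, running = [], 0.0
    for p in softmax(xs):
        running += p
        out.append(running)
    return out

def master_gates(forget_logits, input_logits):
    # Master forget gate is non-decreasing across the neuron ordering;
    # master input gate is its mirror image and is non-increasing.
    f_master = cumax(forget_logits)
    i_master = [1.0 - v for v in cumax(input_logits)]
    return f_master, i_master

def cell_update(c_prev, c_cand, f, i, f_master, i_master):
    # Standard LSTM gates (f, i) apply only where the master gates overlap;
    # outside the overlap the master gates alone decide to keep or overwrite.
    c_new = []
    for cp, cc, fg, ig, fm, im in zip(c_prev, c_cand, f, i, f_master, i_master):
        overlap = fm * im
        f_hat = fg * overlap + (fm - overlap)
        i_hat = ig * overlap + (im - overlap)
        c_new.append(f_hat * cp + i_hat * cc)
    return c_new
```

In this sketch the logits would come from the usual learned linear maps of the input and previous hidden state, which are omitted here. When the master forget gate is 1 and the master input gate is 0 for a neuron, that neuron simply copies its previous cell value, which is how longer-lived, higher-level information is preserved.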

