CL FL LG · Oct 9, 2020

How Can Self-Attention Networks Recognize Dyck-n Languages?

arXiv:2010.04303v11001 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of learning hierarchical structure in formal languages, a problem relevant to natural language processing. The advance is incremental, since it builds on prior work on self-attention and formal language recognition.

The paper studies the recognition of Dyck-n languages with self-attention networks. It shows that a variant with a starting symbol (SA+) reaches 58.82% accuracy on D2 for long sequences and generalizes better than the variant without it (SA-), which breaks down completely on those sequences.

We focus on the recognition of Dyck-n ($\mathcal{D}_n$) languages with self-attention (SA) networks, a task that has been deemed difficult for these networks. We compare the performance of two variants of SA, one with a starting symbol (SA$^+$) and one without (SA$^-$). Our results show that SA$^+$ is able to generalize to longer sequences and deeper dependencies. For $\mathcal{D}_2$, we find that SA$^-$ completely breaks down on long sequences, whereas the accuracy of SA$^+$ is 58.82$\%$. We find the attention maps learned by SA$^+$ to be amenable to interpretation and compatible with a stack-based language recognizer. Surprisingly, the performance of SA networks is on par with LSTMs, which provides evidence of the ability of SA to learn hierarchies without recursion.
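For readers unfamiliar with the task, the following is a minimal sketch of the kind of stack-based recognizer the abstract alludes to. It is not the paper's model or code. The bracket alphabet `PAIRS` and the function names `is_dyck` and `max_depth` are assumptions chosen for illustration, with two bracket types standing in for $\mathcal{D}_2$.

```python
# Hypothetical alphabet for D_2: two bracket types. The symbols used in the
# paper's experiments may differ; this mapping is only for illustration.
PAIRS = {")": "(", "]": "["}


def is_dyck(seq, pairs=PAIRS):
    """Return True if seq is a well-nested string over the given bracket pairs."""
    openers = set(pairs.values())
    stack = []
    for ch in seq:
        if ch in openers:
            # Opening bracket: remember it until its partner arrives.
            stack.append(ch)
        elif ch in pairs:
            # Closing bracket: it must match the most recent unmatched opener.
            if not stack or stack.pop() != pairs[ch]:
                return False
        else:
            # Symbol outside the alphabet.
            return False
    # Every opener must have been closed.
    return not stack


def max_depth(seq, pairs=PAIRS):
    """Return the maximum nesting depth of a well-formed string."""
    openers = set(pairs.values())
    depth = best = 0
    for ch in seq:
        if ch in openers:
            depth += 1
            best = max(best, depth)
        elif ch in pairs:
            depth -= 1
    return best


if __name__ == "__main__":
    for s in ["([])[]", "([)]", "(("]:
        print(repr(s), is_dyck(s))
```

Here `max_depth` corresponds to what the abstract calls dependency depth: a string such as `([()])` nests three levels deep, so a recognizer must track three pending openers at once.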

