CL · LG · ML · Dec 23, 2014

Grammar as a Foreign Language

arXiv:1412.7449v3 · 940 citations
Originality: Highly original
AI Analysis

This paper addresses the problem that the most accurate parsers in natural language processing are domain-specific, complex, and inefficient, and offers a domain-agnostic, data-efficient alternative.

The paper tackles syntactic constituency parsing. It shows that an attention-enhanced sequence-to-sequence model achieves state-of-the-art results on the most widely used constituency parsing dataset when trained on a large synthetic corpus annotated by existing parsers, and that it matches the performance of standard parsers when trained only on a small human-annotated dataset. The parser processes over a hundred sentences per second with an unoptimized CPU implementation.

Syntactic constituency parsing is a fundamental problem in natural language processing and has been the subject of intensive research and engineering for decades. As a result, the most accurate parsers are domain specific, complex, and inefficient. In this paper we show that the domain agnostic attention-enhanced sequence-to-sequence model achieves state-of-the-art results on the most widely used syntactic constituency parsing dataset, when trained on a large synthetic corpus that was annotated using existing parsers. It also matches the performance of standard parsers when trained only on a small human-annotated dataset, which shows that this model is highly data-efficient, in contrast to sequence-to-sequence models without the attention mechanism. Our parser is also fast, processing over a hundred sentences per second with an unoptimized CPU implementation.
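For a sequence-to-sequence model to act as a parser, the parse tree has to be expressed as a flat token sequence that the decoder can emit. The abstract above does not spell out the encoding, so the following is only a minimal sketch under stated assumptions: a depth-first traversal that emits labelled opening and closing bracket tokens. The tuple-based tree representation and the names `linearize` and `delinearize` are hypothetical and chosen for illustration.

```python
from typing import List, Tuple, Union

# Assumed representation: a tree is either a word (str) or a
# (label, children) tuple, where children is a list of trees.
Tree = Union[str, Tuple[str, list]]


def linearize(tree: Tree) -> List[str]:
    """Flatten a tree into bracket tokens by depth-first traversal."""
    if isinstance(tree, str):
        return [tree]
    label, children = tree
    tokens = ["(" + label]
    for child in children:
        tokens.extend(linearize(child))
    tokens.append(")" + label)
    return tokens


def delinearize(tokens: List[str]) -> Tree:
    """Rebuild the tree from bracket tokens; raise ValueError if malformed.

    Assumes no word itself starts with "(" or ")".
    """
    # A dummy root frame collects the single top-level tree.
    stack = [("ROOT", [])]
    for tok in tokens:
        if tok.startswith("("):
            stack.append((tok[1:], []))
        elif tok.startswith(")"):
            if len(stack) == 1:
                raise ValueError("unmatched closing bracket: " + tok)
            label, children = stack.pop()
            if label != tok[1:]:
                raise ValueError("mismatched brackets: " + label + " vs " + tok[1:])
            stack[-1][1].append((label, children))
        else:
            stack[-1][1].append(tok)
    if len(stack) != 1 or len(stack[0][1]) != 1:
        raise ValueError("sequence does not encode exactly one tree")
    return stack[0][1][0]


if __name__ == "__main__":
    tree = ("S", [("NP", ["John"]), ("VP", ["sleeps"])])
    print(" ".join(linearize(tree)))
```

Under this sketch, training pairs would consist of a sentence as the source sequence and the output of `linearize` as the target sequence. `delinearize` also doubles as a well-formedness check on decoder output, since a model emitting free-form tokens is not guaranteed to produce balanced brackets.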

Code Implementations: 7 repos

Data from Papers with Code (CC-BY-SA-4.0)

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
