CL, Aug 11, 2017

Automatic Identification of AltLexes using Monolingual Parallel Corpora

arXiv:1708.03541v1, 1086 citations
Originality: Incremental advance
AI Analysis

This paper addresses a specific bottleneck in discourse parsing that matters to NLP researchers, but the advance is incremental because it builds on existing resources and corpora.

The paper tackles the problem of identifying discourse relations signaled by markers outside standard connective inventories (AltLexes). It proposes a method that combines parallel text-simplification corpora with lexical resources, and reports the automatic discovery of 91 AltLexes.

The automatic identification of discourse relations is still a challenging task in natural language processing. Discourse connectives, such as "since" or "but", are the most informative cues for identifying explicit relations; however, discourse parsers typically use a closed inventory of such connectives. As a result, discourse relations signaled by markers outside these inventories (i.e. AltLexes) are not detected as effectively. In this paper, we propose a novel method that leverages parallel corpora from text simplification, together with lexical resources, to automatically identify alternative lexicalizations that signal discourse relations. When applied to the Simple Wikipedia and Newsela corpora along with WordNet and the PPDB, the method allowed the automatic discovery of 91 AltLexes.
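
The abstract does not spell out the procedure, so the following is only a rough sketch of the general idea of mining a monolingual parallel corpus for AltLex candidates, not the authors' method. It assumes sentence pairs are already aligned, uses a toy connective inventory (`CONNECTIVES`) and a hypothetical helper (`candidate_altlexes`), and aligns tokens with Python's standard `difflib`. The WordNet and PPDB filtering mentioned above is left out entirely.

```python
from difflib import SequenceMatcher

# Toy closed inventory of connectives (an assumption for illustration only).
CONNECTIVES = {"because", "since", "but", "although"}


def candidate_altlexes(pairs):
    """Return (connective, candidate phrase) tuples from aligned sentence pairs.

    A candidate is produced when one sentence contains a known connective,
    its aligned counterpart contains none, and a token-level alignment shows
    the span holding the connective was replaced by another span.
    """
    candidates = []
    for a, b in pairs:
        # Try both directions: either side may hold the explicit connective.
        for src, tgt in ((a, b), (b, a)):
            s, t = src.lower().split(), tgt.lower().split()
            if CONNECTIVES & set(t):
                continue  # target already uses a known connective
            matcher = SequenceMatcher(None, s, t, autojunk=False)
            for tag, i1, i2, j1, j2 in matcher.get_opcodes():
                if tag != "replace":
                    continue
                # Keep the replacement only if the replaced span has a connective.
                for conn in CONNECTIVES.intersection(s[i1:i2]):
                    candidates.append((conn, " ".join(t[j1:j2])))
    return candidates


pairs = [
    ("the road flooded because the river rose",
     "the road flooded due to the fact that the river rose"),
]
print(candidate_altlexes(pairs))
# → [('because', 'due to the fact that')]
```

In practice a candidate list like this would be noisy, which is presumably where lexical resources such as WordNet and the PPDB come in; how the paper uses them is not stated in the abstract.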

